wdecoster / nanocomp

Comparison of multiple long read datasets
MIT License

Compare reads in two FASTQ files based on ReadID #77

Closed LogCrab closed 7 months ago

LogCrab commented 7 months ago

Hi @wdecoster, I am interested in the pairwise identity between corresponding reads in two FASTQ files, both basecalled from the same FAST5 files but with different software, such as Guppy or Dorado. The goal is to compare how similar the output of different basecallers is. I think it is theoretically possible, because the read ID is in the header of each FASTQ record. But after an extensive search, I found no software that can do the job. Would it be possible to add this function to NanoComp? BTW, any plans to rewrite NanoComp in Rust? Have a nice day!

wdecoster commented 7 months ago

That is a highly specific task, and I am reluctant to support it in NanoComp. Would your preferred output be a table with one column for the read ID and one for the pairwise identity? I guess you could iterate over both FASTQ files, use mappy (https://github.com/lh3/minimap2/tree/master/python) to align each read to its counterpart, and then use the NM tag to get the identity?
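A minimal sketch of that suggestion, not a tested tool. It assumes both FASTQ files use identical read IDs, that the first file fits in memory, and that identity is defined as `1 - NM / blen` (mismatches plus gap bases over the alignment block length), which is only one of several possible definitions. The helper names (`pair_reads`, `identity_from_hit`, `best_identity`, `read_fastq`) and the `map-ont` preset are illustrative choices, not part of NanoComp or mappy.

```python
import sys


def identity_from_hit(nm, blen):
    """Identity as 1 - NM / alignment block length (assumed definition)."""
    return 1.0 - nm / blen if blen > 0 else 0.0


def pair_reads(reads_a, reads_b):
    """Yield (read_id, seq_a, seq_b) for read IDs present in both inputs.

    Each input is an iterable of (read_id, sequence) tuples. The first
    input is loaded into a dict, so it must fit in memory.
    """
    seqs_a = dict(reads_a)
    for read_id, seq_b in reads_b:
        if read_id in seqs_a:
            yield read_id, seqs_a[read_id], seq_b


def best_identity(seq_a, seq_b, preset="map-ont"):
    """Align seq_b against seq_a; return identity of the longest hit, or None."""
    import mappy as mp  # imported lazily so the helpers above work without mappy

    aligner = mp.Aligner(seq=seq_a, preset=preset)  # index built from one read
    hits = list(aligner.map(seq_b))
    if not hits:
        return None  # no alignment found between the two basecalls
    best = max(hits, key=lambda h: h.blen)
    return identity_from_hit(best.NM, best.blen)


def read_fastq(path):
    """Yield (read_id, sequence) tuples from a FASTQ file via mappy."""
    import mappy as mp

    for name, seq, _qual in mp.fastx_read(path):
        yield name, seq


def main(fastq_a, fastq_b):
    # Print a two-column, tab-separated table: read ID and pairwise identity.
    print("read_id\tidentity")
    for read_id, seq_a, seq_b in pair_reads(read_fastq(fastq_a), read_fastq(fastq_b)):
        ident = best_identity(seq_a, seq_b)
        value = "NA" if ident is None else f"{ident:.4f}"
        print(f"{read_id}\t{value}")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as `pairwise_identity.py`, it would be run as `python pairwise_identity.py guppy.fastq dorado.fastq > identity.tsv`. Building one index per read is slow for large runs, so this is a starting point rather than an optimized solution.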

LogCrab commented 7 months ago

Thank you for your instruction.