voutcn / megahit

Ultra-fast and memory-efficient (meta-)genome assembler
http://www.ncbi.nlm.nih.gov/pubmed/25609793
GNU General Public License v3.0

How many reads were used / not used in the assembly? #247

Closed tseemann closed 4 years ago

tseemann commented 4 years ago

We were wondering if megahit can output some statistics about how many of the input reads it actually used in the assembly, and how many it didn't use?

This would be useful to know, and to help QC the data provided. We understand we can align the original reads back to the contigs/graph but were wondering if megahit can tell us instead.

CC: @milot-mirdita
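For reference, the fallback mentioned above (aligning the reads back to the contigs) boils down to counting mapped and unmapped records in the resulting alignment. The sketch below is not part of megahit. It assumes the alignment is already available as SAM text from whichever aligner was used, and the function name `count_mapped_reads` is hypothetical. It relies only on the standard SAM flag bits: 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary).

```python
def count_mapped_reads(sam_lines):
    """Count primary SAM records that are mapped vs. unmapped.

    sam_lines: any iterable of SAM text lines (e.g. an open file).
    For paired-end data each mate is its own record, so the totals
    are per mate, not per pair.
    """
    mapped = 0
    unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Skip secondary (0x100) and supplementary (0x800) records
        # so that each read is counted only once.
        if flag & 0x900:
            continue
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped
```

A read that maps to a contig was not necessarily used to build it, so this gives a proxy for read usage rather than an exact count.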

voutcn commented 4 years ago

I don't think we can provide a meaningful stat, since reads are decomposed into k-mers. Perhaps I can provide the number of reads that do not contribute any solid k-mers at the smallest k. But even solid k-mers can be removed by the graph cleaning process.
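To make the proposed statistic concrete, here is a minimal sketch of counting reads that contribute no solid k-mers at a given k. It is not megahit's implementation. It assumes that "solid" means a canonical k-mer seen at least `min_count` times across all reads, and that k-mers containing `N` are ignored. The function names and the `min_count` parameter are hypothetical, and everything is held in memory, which is only practical for toy inputs.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(read, k):
    """Yield the canonical k-mers of a read, skipping any that contain N."""
    read = read.upper()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if "N" not in kmer:
            yield canonical(kmer)


def count_reads_without_solid_kmers(reads, k, min_count=2):
    """Count reads in which no k-mer reaches min_count occurrences overall."""
    reads = list(reads)  # the reads are traversed twice
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return sum(
        1
        for read in reads
        if not any(counts[kmer] >= min_count for kmer in kmers(read, k))
    )
```

As the comment above notes, this would only be a lower bound on unused reads: a read whose k-mers are all solid can still end up unrepresented if those k-mers are removed during graph cleaning.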

tseemann commented 4 years ago

@voutcn thanks for considering this question, I thought that might be the case! Thank you.