lmrodriguezr / nonpareil

Estimate metagenomic coverage and sequence diversity
http://enve-omics.ce.gatech.edu/nonpareil/

Infinite loop when subsampling more reads than available with -T kmer #20

Closed lmrodriguezr closed 7 years ago

lmrodriguezr commented 7 years ago

When using -T kmer with a number of requested query reads (-X) greater than the number of reads in the entire set, Nonpareil gets stuck in an infinite loop (e.g., when using -s test/test.fasta). It should instead fail with an informative error or (even better) adjust -X to the total number of reads in the set.
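The two behaviours proposed above (fail with an informative error, or adjust the request to the size of the set) could look roughly like the following. This is only an illustrative sketch in Python, not Nonpareil's actual code: the function name `resolve_query_count` and its `clamp` parameter are hypothetical.

```python
def resolve_query_count(requested, total_reads, clamp=True):
    """Return a usable number of query reads.

    Hypothetical helper for illustration only; not part of Nonpareil.
    requested   -- number of query reads asked for (the -X value)
    total_reads -- number of reads actually present in the input set
    clamp       -- if True, reduce the request to total_reads;
                   if False, raise an informative error instead
    """
    if total_reads <= 0:
        raise ValueError("the read set is empty")
    if requested <= total_reads:
        # The request can be satisfied as-is.
        return requested
    if clamp:
        # Adjust the request down to everything that is available.
        return total_reads
    raise ValueError(
        f"requested {requested} query reads, "
        f"but only {total_reads} reads are available"
    )
```

Either branch guarantees that a later sampling loop is never asked for more distinct reads than exist, which is the condition that makes the loop unable to finish.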

gunturus commented 7 years ago

I fixed the bug. Before running the algorithm, Nonpareil now checks the number of sequences in the metagenome and makes sure it is at least 10 times the number of query sequences. I set the lower limit at 10 times the number of query sequences because we pick sequences at random: if the file has only 1010 sequences and the number of query sequences is 1000, it would be hard to find, by random chance, sequences that haven't been used yet. I don't think this case will occur outside of testing. I am assuming users will have at least 100000 sequences in their metagenome.
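To make the reasoning behind the 10x limit concrete, here is a rough Python sketch. It assumes query reads are chosen by drawing random indices and discarding any index already used; that is how the comment above reads, but it has not been checked against the Nonpareil source. Both function names are hypothetical.

```python
def has_enough_reads(total_reads, query_reads, factor=10):
    """Pre-check described above: require at least `factor` times
    as many reads in the set as query reads requested."""
    return total_reads >= factor * query_reads


def expected_draws(total_reads, query_reads):
    """Expected number of random draws needed to collect `query_reads`
    distinct reads out of `total_reads`, when already-used reads are
    rejected and drawn again.

    After i distinct reads have been collected, a draw hits an unused
    read with probability (total_reads - i) / total_reads, so the next
    new read takes total_reads / (total_reads - i) draws on average.
    """
    if query_reads > total_reads:
        # This is the situation from the issue: the loop could never finish.
        raise ValueError("cannot pick more distinct reads than exist")
    return sum(total_reads / (total_reads - i) for i in range(query_reads))


# The pre-check rejects the 1010-read example and accepts a large set.
print(has_enough_reads(1010, 1000))
print(has_enough_reads(100000, 1000))

# Compare the expected sampling effort with and without the 10x margin.
print(expected_draws(1010, 1000))
print(expected_draws(10000, 1000))
```

Under this assumption, the last few query reads are the expensive ones: when only a handful of unused reads remain, most draws are rejected, and the 10x margin keeps the rejection rate low throughout.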