Closed nick-youngblut closed 4 years ago
Any thoughts on this issue? It's still a problem.
I'm guessing the problem originates somewhere within FastqReader::getRandomSeq.
@nick-youngblut I have downloaded SRR061182. It seems the 15th base is always 'N' in all the sequences. Nonpareil keeps sampling until it finds a viable sequence, and in this case there are no good sequences, so it runs forever.
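For anyone who wants to check their own data for the same pattern, here is a rough sketch of how one might look for positions that are 'N' in every read. The function name positionsAlwaysN is made up for this example and is not part of nonpareil. It also assumes the sequence lines have already been loaded into memory, so FASTQ parsing is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of nonpareil): returns the 1-based positions
// at which every read long enough to reach that position carries an 'N'.
std::vector<std::size_t> positionsAlwaysN(const std::vector<std::string>& reads) {
    std::size_t maxLen = 0;
    for (const auto& r : reads) maxLen = std::max(maxLen, r.size());

    std::vector<std::size_t> covered(maxLen, 0);  // reads reaching position i
    std::vector<std::size_t> nCount(maxLen, 0);   // of those, reads with 'N' at i
    for (const auto& r : reads) {
        for (std::size_t i = 0; i < r.size(); ++i) {
            ++covered[i];
            if (r[i] == 'N' || r[i] == 'n') ++nCount[i];
        }
    }

    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < maxLen; ++i) {
        if (covered[i] > 0 && nCount[i] == covered[i]) result.push_back(i + 1);
    }
    return result;
}
```

If a file like the one described above were loaded this way, position 15 would be expected to appear in the result.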
@gunturus thanks for checking that! Maybe it would be helpful to add a break condition so that the loop exits after a user-specified number of tries?
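A minimal sketch of what such a break condition could look like. This is not nonpareil's actual code: the names isGoodRead, pickRandomReads and maxTries are invented for illustration, and the "good read" test here simply assumes the k-mer is taken from the first k bases and must contain only A, C, G or T.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Assumed criterion (hypothetical): the first k bases contain only A, C, G, T.
bool isGoodRead(const std::string& seq, std::size_t k) {
    if (seq.size() < k) return false;
    for (std::size_t i = 0; i < k; ++i) {
        const char c = seq[i];
        if (c != 'A' && c != 'C' && c != 'G' && c != 'T') return false;
    }
    return true;
}

// Draws reads at random (with replacement) until `wanted` good ones are found
// or `maxTries` draws have been made. Returns true on success; on failure,
// `out` holds whatever was collected before the budget ran out.
bool pickRandomReads(const std::vector<std::string>& reads,
                     std::size_t wanted, std::size_t k, std::size_t maxTries,
                     std::mt19937& rng, std::vector<std::string>& out) {
    out.clear();
    if (wanted == 0) return true;
    if (reads.empty()) return false;

    std::uniform_int_distribution<std::size_t> dist(0, reads.size() - 1);
    for (std::size_t tries = 0; tries < maxTries; ++tries) {
        const std::string& candidate = reads[dist(rng)];
        if (isGoodRead(candidate, k)) {
            out.push_back(candidate);
            if (out.size() == wanted) return true;
        }
    }
    return false;  // budget exhausted: report instead of looping forever
}
```

With a cap like this, the caller can print an error such as "could not find enough usable reads" rather than hanging when no read passes the check.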
Solved by c155bd134809339fe4ed994975912b6742646d3b
I'm running Nonpareil v3.303, and for one particular sample in a publicly available metagenome (accession SRS015133), nonpareil runs "Picking N random sequences" forever (I've let it run for many hours). The other samples in that bioproject work just fine.
My command:
Even if I use -X 100, the command runs forever. If I append reads from another sample, nonpareil runs successfully. It appears that nonpareil runs a while loop to randomly select sequences and breaks out of the loop once enough "good" sequences are found. However, in this case, nonpareil is never able to find enough "good" sequences in sample SRS015133. Reducing the kmer size (-k) to 14 allows nonpareil to finish successfully, but raising the length any higher causes an infinite loop. Why is nonpareil having a problem with a kmer length of 24? I don't see any cutoffs for sequence quality.
Here's the first couple of reads from the fastq (No. of reads = 1 mil):