marbl / harvest


ignoring sequences ending with the same numbers as the reference genome #21

Open aldertzomer opened 8 years ago

aldertzomer commented 8 years ago

I noticed this odd little bug in parsnp v1.2. When running parsnp with the -c and -d options and a reference whose filename ends in digits, any genome whose filename matches the trailing characters of the reference filename gets excluded.

For example, with the reference "H4476.fasta", the genomes 6.fasta, 76.fasta and 476.fasta are silently ignored (they are not listed in the ini files). When I rename these three genomes to bla6bla.fasta, bla76bla.fasta and bla476bla.fasta, they do get included. I'm assuming this is some sort of bug in the code that excludes the reference sequence from being selected as a query genome.
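The pattern above is what one would expect if the reference were filtered out of the query list with a suffix (or substring) test instead of an exact filename comparison. A minimal Python sketch of that hypothesis follows. The function names `suffix_filter` and `exact_filter` are made up for illustration, and none of this is taken from the parsnp source.

```python
def suffix_filter(reference, genomes):
    """Hypothetical buggy filter: drops every genome whose filename
    is a suffix of the reference filename (assumed behaviour, not parsnp code)."""
    return [g for g in genomes if not reference.endswith(g)]


def exact_filter(reference, genomes):
    """Filter that drops only the reference itself, by exact name."""
    return [g for g in genomes if g != reference]


genomes = [
    "H4476.fasta",
    "6.fasta",
    "76.fasta",
    "476.fasta",
    "bla6bla.fasta",
    "other.fasta",
]

# The suffix test also removes 6.fasta, 76.fasta and 476.fasta.
print(suffix_filter("H4476.fasta", genomes))
# The exact comparison removes only H4476.fasta.
print(exact_filter("H4476.fasta", genomes))
```

Under this assumption, the renamed files (bla6bla.fasta and so on) survive because they are no longer suffixes of "H4476.fasta".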

treangen commented 8 years ago

hi aldertzomer,

> I'm assuming this is some sort of bug in the code that excludes the reference sequence from being selected as a query genome.

thanks for opening this issue. this is exactly what is happening. A fix is on the way; as a temporary workaround, rename the query genomes, as you've suggested.
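Until the fix lands, the renaming workaround can be scripted. The sketch below is a hypothetical helper, not part of parsnp: it assumes a genome is at risk whenever its filename is contained in the reference filename, and it pads the name on both sides in the same way as the bla6bla.fasta example above. The function names and the `pad` parameter are illustrative only.

```python
import os


def colliding_queries(reference, genome_dir):
    """Return filenames in genome_dir that are contained in the reference
    filename (assumed trigger for the exclusion), excluding the reference."""
    ref_name = os.path.basename(reference)
    return sorted(
        name
        for name in os.listdir(genome_dir)
        if name != ref_name and name in ref_name
    )


def pad_names(genome_dir, names, pad="bla"):
    """Rename each file from <stem><ext> to <pad><stem><pad><ext>.
    Returns a dict mapping old names to new names."""
    renamed = {}
    for name in names:
        stem, ext = os.path.splitext(name)
        new_name = f"{pad}{stem}{pad}{ext}"
        os.rename(
            os.path.join(genome_dir, name),
            os.path.join(genome_dir, new_name),
        )
        renamed[name] = new_name
    return renamed
```

Calling `colliding_queries("H4476.fasta", genome_dir)` first, and inspecting the result before passing it to `pad_names`, keeps the renaming step explicit and easy to undo from the returned mapping.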