marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to few hours. Input can be both draft assemblies and finished genomes, and output includes variant (SNP) calls, core genome phylogeny and multi-alignments. Parsnp leverages contextual information provided by multi-alignments surrounding SNP sites for filtration/cleaning, in addition to existing tools for recombination detection/filtration and phylogenetic reconstruction.
Other
123 stars 25 forks source link

parsnp adding SNP and different SNP calls between runs #112

Open robfenton opened 2 years ago

robfenton commented 2 years ago

We have 41 complete, closed, Salmonella sequences (40 unique). For a reference to get annotations, we exported one (1956) as both a genbank file, and a fasta file. Looking at the output in gingr (see screenshots) there have been several nucleotides changed from what is in the genbank and fasta.

After verifying the sequences involved, we re-ran the analysis with no changes, and received a slightly different output (see screenshots, isolate 09578-19-1).

1956_Chromosome.gb contains the GenBank record for the chromosome, and 1956.fasta contains both the chromosome and plasmid sequences. The chromosome sequences from both files is identical.

Using 1.7.2 installed via conda Ubuntu 20.04 LTS in VirtualBox on Windows 10. ginger 1.3 from the Linux64-v1.3 tarball GenBank and fasta files all produced by Geneious 2022.1.1 Annotated with PGAP (2021-01-11.build5132)

The only options used were -p 8, -g genbankfile.gb, and -d ./oursequences

Screenshot from 2022-04-28 11-05-09 Screenshot from 2022-04-28 15-23-40

bkille commented 2 years ago

Hi @robfenton! Thank you for opening an issue and also for detailing the version, command, and OS. Just to clarify, the only difference I am seeing is that sequence 09578-19-1.fasta seems to be missing all of the variants in the first figure.

In order to to track this bug now, we'll first need to determine if it came from parsnp, harvest-tools, or gingr. Would you be able to share the .xmfa output of your analysis? You're welcome to email the output to brycekille@gmail.com as opposed to posting it here if you'd prefer.

robfenton commented 2 years ago

That's not quite it. 1956.fasta and 1956_Chromosome.gb.fna show different sequences, even though they're supposed to be exactly the same.

I'll double check with my boss on Monday and send both .xmfa output files if I can.

Thanks!