marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be draft assemblies, finished genomes, or a mix of both, and output includes variant (SNP) calls, a core-genome phylogeny, and multi-alignments. Parsnp uses the contextual information that multi-alignments provide around SNP sites for filtering and cleaning, and it relies on existing tools for recombination detection/filtering and phylogenetic reconstruction.

gzipped files #155

Status: Open. jsgounot opened this issue 3 weeks ago.

jsgounot commented 3 weeks ago

Hi,

Would it be possible for parsnp to accept gzip-compressed input files? When running parsnp on many genomes that are already compressed, this would save processing time and disk space.
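For the Python side, a minimal sketch of what such support might look like, assuming the wrapper reads FASTA files itself: `open_fasta` is a hypothetical helper (not part of parsnp) that detects gzip by its two-byte magic number rather than by file extension and returns a text handle either way. The compiled core would still need its own change, as noted below.

```python
import gzip
from typing import IO

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_fasta(path: str) -> IO[str]:
    """Hypothetical helper: open a FASTA file for text reading,
    transparently decompressing it if it is gzipped.

    Detection uses the gzip magic number, so a gzipped file named
    'genome.fa' (no .gz suffix) is still handled correctly.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        # "rt" makes gzip.open return decoded text, like plain open().
        return gzip.open(path, "rt")
    return open(path, "r")
```

Callers can then use `with open_fasta("genome.fna.gz") as fh:` exactly as they would for an uncompressed file. Sniffing the magic bytes avoids relying on naming conventions, which vary across genome collections.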

I initially thought I could just update the main Python script, but it seems the core algorithm needs to change as well. While looking into this, I noticed that sequence lengths are miscalculated in several places for multi-FASTA files. For example, here the header lines of the other records are counted in the reference length calculation.
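A minimal sketch of a per-record length calculation that never counts header lines, assuming record IDs are unique within a file. `fasta_record_lengths` is a hypothetical name for illustration and is not taken from parsnp's code.

```python
from typing import Dict, Iterable, Optional


def fasta_record_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: return the sequence length of each record
    in a multi-FASTA stream.

    Header lines ('>...') only start a new record and contribute
    nothing to any length; only sequence lines are counted.
    Assumes record IDs are unique (a repeated ID would reset its count).
    """
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID = first whitespace-delimited token of the header.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence line: add its residues to the current record only.
            lengths[current] += len(line)
    return lengths
```

The total reference length is then `sum(lengths.values())`. Keeping per-record lengths rather than a single running total also makes it easy to spot when one record's length has absorbed another record's header.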

Regards, JS