gziped files

marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be both draft assemblies and finished genomes, and output includes variant (SNP) calls, core genome phylogeny and multi-alignments. Parsnp leverages contextual information provided by multi-alignments surrounding SNP sites for filtration/cleaning, in addition to existing tools for recombination detection/filtration and phylogenetic reconstruction.

Hi,

Would it be possible for parsnp to handle gzip-compressed files? This would save some processing and disk space when running parsnp on a lot of genomes that are already compressed.
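A minimal sketch of what transparent reading could look like on the Python side, using only the standard library. The helper name `open_maybe_gzip` is hypothetical and not part of parsnp; it only covers the Python wrapper, not the core aligner.

```python
import gzip

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(path):
    """Open a FASTA file in text mode, decompressing on the fly if it is gzipped.

    Hypothetical helper for illustration; detection relies on the gzip magic
    bytes rather than on the file extension.
    """
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt")
    return open(path, "r")
```

Checking the magic bytes instead of a `.gz` suffix means that compressed files with unusual names are still handled, and uncompressed files keep working unchanged.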

I initially thought I could just update the main Python script, but it seems that a change needs to be made in the core algorithm as well. While looking into this, I noticed that the sequence lengths are miscalculated in several places for multi-FASTA files, like here, where other headers are used in the reference length calculation.
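For comparison, a minimal sketch of a length calculation for multi-FASTA input that never counts header lines towards sequence length. The function name `fasta_record_lengths` is hypothetical and this is not parsnp's code; it assumes plain FASTA where every record starts with a `>` line.

```python
def fasta_record_lengths(lines):
    """Return a list of (record_name, sequence_length) tuples.

    Hypothetical helper for illustration. Header lines (starting with '>')
    only start a new record; their characters are never added to a length.
    """
    records = []  # mutable [name, length] pairs, in file order
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            # Record name is the first whitespace-delimited token of the header.
            records.append([fields[0] if fields else "", 0])
        elif records:
            # Sequence line: count residues for the current record only.
            records[-1][1] += len(line)
    return [(name, length) for name, length in records]
```

The total reference length is then the sum of the per-record lengths, and any iterable of lines works as input, including an open file handle.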

Regards, JS

gziped files #155