marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be draft assemblies, finished genomes, or both, and output includes variant (SNP) calls, a core genome phylogeny, and multi-alignments. Parsnp leverages contextual information provided by multi-alignments surrounding SNP sites for filtration/cleaning, in addition to existing tools for recombination detection/filtration and phylogenetic reconstruction.

Parsnp skips genome without aligning all #122

Closed emmannaemeka closed 8 months ago

emmannaemeka commented 1 year ago

Hello, I am trying to align my genomes. Parsnp identifies that there are 30 fasta files in the folder but only aligns 7. How can I change this?

15:39:18 - INFO - |--Parsnp 1.7.4--|

Ref /Users/emmannaemeka/Documents/benson/phy/parsnp/ref/bongori.fasta
15:40:25 - INFO -

SETTINGS:
|-refgenome: /Users/emmannaemeka/Documents/benson/phy/parsnp/ref/bongori.fasta
|-genomes:
    /Users/emmannaemeka/Documents/benson/phy/parsnp/gen/Oranienburg_BCW_3977.fasta
    /Users/emmannaemeka/Documents/benson/phy/parsnp/gen/Molade_PNCS013226.fasta
    ...30 more file(s)...
    /Users/emmannaemeka/Documents/benson/phy/parsnp/gen/Oranienburg_CFSAN039536.fasta
    /Users/emmannaemeka/Documents/benson/phy/parsnp/gen/22-120333_S16.fasta
|-aligner: muscle
|-outdir: benson_new_ref_phy
|-OS: Darwin
|-threads: 1

15:40:25 - INFO - <>
15:40:25 - INFO - No genbank file provided for reference annotations, skipping..
15:40:28 - INFO - Recruiting genomes...
15:43:49 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
15:46:12 - INFO - Reconstructing core genome phylogeny...
15:46:17 - INFO - Aligned 7 genomes in 6.97 minutes
15:46:17 - INFO - Parsnp finished! All output available in benson_new_ref_phy

pushtimeta commented 1 year ago

From the Parsnp FAQ:

How are the genomes selected from the input directory? Not all of the genomes I wanted included are in the tree!

By default, parsnp calculates the MUMi distance between the reference and each of the genomes in the genome directory. All genomes with a MUMi distance <= 0.01 are included; all others are discarded. To force all genomes present in the genome directory to be included, simply add '-c' as a command-line parameter.

Link: https://harvest.readthedocs.io/en/latest/content/parsnp/faq.html
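
To make the selection rule in the FAQ answer concrete, here is a minimal Python sketch. It is not Parsnp's actual code: the function names, file names, and MUMi distances are made up for illustration, and only the 0.01 cutoff and the '-c' flag come from the FAQ text. The '-r', '-d', and '-o' options are assumed to be the usual reference, genome directory, and output directory flags, so check `parsnp -h` for your version.

```python
from typing import Dict, List

# Cutoff quoted in the FAQ answer; everything else here is illustrative.
MUMI_THRESHOLD = 0.01


def select_genomes(mumi_to_ref: Dict[str, float],
                   force_all: bool = False,
                   threshold: float = MUMI_THRESHOLD) -> List[str]:
    """Return the genomes that would be kept for alignment.

    mumi_to_ref maps a genome file name to its MUMi distance from the
    reference. With force_all=True (the effect described for '-c'),
    the distance filter is skipped and every genome is kept.
    """
    if force_all:
        return sorted(mumi_to_ref)
    return sorted(name for name, dist in mumi_to_ref.items()
                  if dist <= threshold)


def build_parsnp_command(reference: str, genome_dir: str, outdir: str,
                         force_all: bool = False) -> List[str]:
    """Assemble a parsnp argument list (hypothetical helper).

    Assumes -r/-d/-o are the reference, genome directory, and output
    directory options; '-c' is the flag named in the FAQ.
    """
    cmd = ["parsnp", "-r", reference, "-d", genome_dir, "-o", outdir]
    if force_all:
        cmd.append("-c")
    return cmd


# Made-up distances, purely to show the filter's behaviour.
example = {
    "genome_a.fasta": 0.004,
    "genome_b.fasta": 0.010,
    "genome_c.fasta": 0.350,  # too divergent from the reference
}

print(select_genomes(example))
print(select_genomes(example, force_all=True))
print(" ".join(build_parsnp_command("ref.fasta", "genomes/", "out",
                                    force_all=True)))
```

In the run above, 7 aligned genomes would mean that only those 7 passed the distance filter against the bongori reference. Forcing inclusion with '-c' keeps every genome, but distant genomes are likely to shrink the shared core alignment, so picking a reference closer to the rest of the set may be the better fix.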