marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be draft assemblies or finished genomes, and output includes variant (SNP) calls, a core-genome phylogeny and multi-alignments. Parsnp uses the contextual information that multi-alignments provide around SNP sites for filtering and cleaning, and relies on existing tools for recombination detection/filtering and phylogenetic reconstruction.
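To make "core-genome SNP calls" concrete, here is a minimal Python sketch of how core SNP columns could be picked out of a single multi-alignment block. It assumes the block is a dict of equal-length gapped strings and treats a column as "core" only when no sequence has a gap (`-`) or an `N`; the function name `core_snp_columns` and that rule are illustrative assumptions, not Parsnp's actual implementation (which also uses the surrounding alignment context for filtering).

```python
from typing import Dict, List


def core_snp_columns(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based column indices that are core SNP sites.

    Illustrative rule (assumption): a column is core when every sequence has a
    nucleotide there (no gap '-' and no ambiguous 'N'), and it is a SNP when
    at least two different nucleotides appear in it.
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    snps = []
    for col in range(length):
        bases = {s[col].upper() for s in seqs}
        if bases & {"-", "N"}:
            continue  # gap or ambiguous base: column is not part of the core
        if len(bases) > 1:
            snps.append(col)  # all genomes present and they disagree
    return snps


# Made-up three-genome block for illustration.
block = {
    "ref":     "ACGTACGT",
    "genomeA": "ACGTTCGT",
    "genomeB": "AC-TACGA",
}
print(core_snp_columns(block))  # [4, 7]
```

Column 2 is skipped because `genomeB` has a gap there, so only the fully shared columns 4 and 7 are reported.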

Aligned regions cover less than 1% of reference genome, something is not right #162

Closed. Arthurdyu closed this issue 3 weeks ago.

Arthurdyu commented 1 month ago

13:17:08 - INFO - |--Parsnp 2.0.5--|
13:17:08 - INFO -
SETTINGS:
|-refgenome:  /share/home/wanglamei/Data/t10sal/fasta10/GCA_000003645.1_ASM364v1_genomic.fasta
|-genomes:
    /share/home/wanglamei/Data/t10sal/fasta10/GCA_000008165.1_ASM816v1_genomic.fasta
    /share/home/wanglamei/Data/t10sal/fasta10/GCA_000008005.1_ASM800v1_genomic.fasta
    ...6 more file(s)...
    /share/home/wanglamei/Data/t10sal/fasta10/GCA_000003955.1_ASM395v1_genomic.fasta
    /share/home/wanglamei/Data/t10sal/fasta10/GCA_000008445.1_ASM844v1_genomic.fasta
|-aligner:    muscle
|-outdir:     /share/home/wanglamei/ParSNP/P_2024_09_25_131708925604
|-OS:         Linux
|-threads:    10

13:17:08 - INFO - No genbank file provided for reference annotations, skipping..
13:17:08 - WARNING - File /share/home/wanglamei/Data/t10sal/fasta10/GCA_000008425.1_ASM842v1_genomic.fasta is 1.25x shorter than reference genome!
13:17:09 - INFO - Too few genomes to run partitions of size >50. Running all genomes at once.
13:17:09 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
13:18:14 - CRITICAL - Aligned regions cover less than 1% of reference genome, something is not right
Adjust params and rerun. If issue persists please submit a GitHub issue

real    1m5.678s
user    1m5.894s
sys     0m1.797s
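The CRITICAL line reports that the aligned regions cover less than 1% of the reference. As a hedged illustration of what such a check measures, the sketch below computes the fraction of a reference covered by the union of aligned intervals, merging overlaps so no base is counted twice. The interval input, the function name `reference_coverage` and the constant `MIN_COVERAGE` are hypothetical; Parsnp's own code may compute coverage differently.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical threshold mirroring the "less than 1%" message in the log.
MIN_COVERAGE = 0.01


def reference_coverage(intervals: Iterable[Tuple[int, int]], ref_length: int) -> float:
    """Fraction of the reference covered by the union of half-open [start, end) intervals."""
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    covered = 0
    run_start: Optional[int] = None
    run_end: Optional[int] = None
    for start, end in sorted(intervals):
        if run_end is None or start > run_end:
            # Disjoint from the current run: count the finished run, start a new one.
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlapping or touching: extend the current run instead of double-counting.
            run_end = max(run_end, end)
    if run_end is not None:
        covered += run_end - run_start
    return covered / ref_length


# Made-up example: three aligned blocks on a 50 kb reference.
cov = reference_coverage([(0, 100), (50, 150), (400, 450)], ref_length=50_000)
print(f"{cov:.2%}")  # 0.40%
if cov < MIN_COVERAGE:
    print("coverage below threshold; check the inputs before rerunning")
```

The two overlapping blocks merge into 150 bp, plus 50 bp for the third block, giving 200 / 50,000 = 0.4%, which would trip a 1% threshold.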

Arthurdyu commented 1 month ago

My commands are as follows:

dir="/share/home/wanglamei/Data/t10sal/fasta10"
time parsnp -r ${dir}/GCA_000003645.1_ASM364v1_genomic.fasta -d ${dir} -p 10 -c -x

bkille commented 1 month ago

Hi @Arthurdyu,

Do you run into the same issue if you omit the -c flag? My guess is that one (or more) of the input assemblies is throwing off the core-genome identification.

Best, Bryce
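Bryce's guess is that one divergent or incomplete assembly can throw off core-genome identification, and the log already warns that one input is 1.25x shorter than the reference. One rough way to screen inputs before rerunning is to compare each assembly's total length with the reference's. This is a minimal sketch under stated assumptions: the helper names (`fasta_length`, `lengths_in_dir`, `flag_length_outliers`) and the 1.2-fold cutoff are hypothetical, not Parsnp's threshold, and length is only a crude proxy for the MUMi-based genome recruitment that, per Parsnp's help text for `-c`, is skipped when `-c` is given.

```python
import glob
import os
from typing import Dict


def fasta_length(path: str) -> int:
    """Total sequence length across all records in a FASTA file (header lines skipped)."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def lengths_in_dir(directory: str, pattern: str = "*.fasta") -> Dict[str, int]:
    """Map each matching file name in `directory` to its total FASTA length."""
    return {
        os.path.basename(path): fasta_length(path)
        for path in sorted(glob.glob(os.path.join(directory, pattern)))
    }


def flag_length_outliers(lengths: Dict[str, int], ref_length: int,
                         max_ratio: float = 1.2) -> Dict[str, float]:
    """Return {name: ratio} for assemblies whose length differs from the reference
    by more than `max_ratio`-fold in either direction (hypothetical cutoff)."""
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    flagged: Dict[str, float] = {}
    for name, length in lengths.items():
        if length == 0:
            flagged[name] = float("inf")  # empty file: certainly suspicious
            continue
        ratio = max(length, ref_length) / min(length, ref_length)
        if ratio > max_ratio:
            flagged[name] = round(ratio, 2)
    return flagged


# Made-up lengths for illustration (not the genomes from this issue).
example = {"asm_a.fasta": 4_800_000, "asm_b.fasta": 3_900_000}
print(flag_length_outliers(example, ref_length=4_900_000))  # {'asm_b.fasta': 1.26}
```

On real data you would pass `lengths_in_dir(...)` for the input directory and `fasta_length(...)` of the reference FASTA; any flagged file is a candidate to inspect or drop before rerunning with `-c`.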

Arthurdyu commented 1 month ago

Thanks a lot, @bkille! What I actually want is to filter out recombinant regions. In the old version this required -x, but I'm not sure whether it can still be used that way in the new version. Here is how -x is described in each version:

Version 1.2:
'''
-x = : enable filtering of SNPs located in PhiPack identified regions of recombination? (default: NO)
'''

Version 2.0.5:
'''
Misc:
  --skip-phylogeny      Do not generate phylogeny from core SNPs
  --validate-input      Use Biopython to validate input files
  --use-fasttree        Use fasttree instead of RaxML
  --vcf                 Generate VCF file.
  --no-maf              Do not generage MAF file (XMFA only)
  -p THREADS, --threads THREADS
                        Number of threads to use
  -P MAX_PARTITION_SIZE, --max-partition-size MAX_PARTITION_SIZE
                        Max partition size (limits memory usage)
  -v, --verbose         Verbose output
  -x, --xtrafast
  -i INIFILE, --inifile INIFILE, --ini-file INIFILE
  -e, --extend
  --no-recruit
  -V, --version         show program's version number and exit
'''

bkille commented 3 weeks ago

Thanks for opening #163 (and sorry for forgetting about this one). Feel free to close this issue if the only remaining problem is the recombination filter.