marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be both draft assemblies and finished genomes, and output includes variant (SNP) calls, core genome phylogeny and multi-alignments. Parsnp leverages contextual information provided by multi-alignments surrounding SNP sites for filtration/cleaning, in addition to existing tools for recombination detection/filtration and phylogenetic reconstruction.

The following command failed: >>$ /var/miniconda3/envs/parsnp/bin/bin/parsnp_core #100

Closed by bakersjc 2 years ago

bakersjc commented 2 years ago

Hi,

I am trying to run parsnp to align ~2300 bacterial genomes. However, I am getting a critical error resulting in failure of Parsnp_aligner.ini. The error log shows the warning "Assuming DNA (see -seqtype option), invalid letters found:" and, later, the error "Invalid seq type", which I believe is what terminates the run. I have run a script to check my genomes for letters other than ACTGUN and removed any genomes containing other letters. I am running parsnp with a .fna reference genome, and the genomes to be aligned are a mixture of .fna and .fasta files. Is there a way to identify which sequence(s) are causing the "Invalid seq type" error?
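Editor's sketch: the alphabet check described in this comment could look roughly like the following. It reports, per FASTA record, any characters outside ACGTUN. The function names are hypothetical, and the check only inspects the alphabet; it makes no claim about what parsnp_core itself treats as an invalid sequence type.

```python
from collections import Counter
from pathlib import Path

# Alphabet taken from the check described in the comment (assumption: no
# other IUPAC ambiguity codes are wanted).
ALLOWED = set("ACGTUN")


def invalid_letters_in_lines(lines, allowed=ALLOWED):
    """Map each FASTA header to a Counter of characters outside `allowed`."""
    problems = {}
    header = None  # stays None for sequence lines that precede any header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            continue
        bad = Counter(ch for ch in line.upper() if ch not in allowed)
        if bad:
            problems.setdefault(header, Counter()).update(bad)
    return problems


def scan_directory(directory, suffixes=(".fna", ".fasta")):
    """Yield (file name, header, bad-letter counts) for each offending record."""
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() in suffixes:
            with open(path) as handle:
                for header, bad in invalid_letters_in_lines(handle).items():
                    yield path.name, header, dict(bad)
```

Calling `list(scan_directory("genomes/"))` (a hypothetical directory name) would list every record with unexpected characters, which narrows the search to specific files and contigs rather than whole batches.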

bkille commented 2 years ago

Hi @bakersjc, thanks for opening an issue!

Could you provide the command used to run parsnp as well as the entire output?

-Bryce

bakersjc commented 2 years ago

@bkille parsnp -r -d <directory containing genomes in .fna and .fasta format> -p 80 -o (output directory) -x -v -c

Output reads:

15:29:01 - INFO - <<Parsnp started>>
15:29:01 - INFO - No genbank file provided for reference annotations, skipping..
15:31:22 - DEBUG - Sorting reference replicons
15:31:22 - DEBUG - Writing .ini file
15:31:22 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
15:31:22 - DEBUG - /var/miniconda3/envs/parsnp/bin/bin/parsnp_core C_diff_parsnp_8/parsnpAligner.ini
17:34:49 - CRITICAL - The following command failed:
      >>$ /var/miniconda3/envs/parsnp/bin/bin/parsnp_core C_diff_parsnp_8/parsnpAligner.ini
      Please veryify input data and restart Parsnp.
      If the problem persists please contact the Parsnp development team.

Then the filenames of all the sequences, followed by:

        Finished processing input sequences, elapsed time: 258 seconds

                 compressed suffix graph construction elapsed time: 4 seconds

                 MUM anchor search elapsed time: 4222 seconds

        Finished recursive MUM search, elapsed time: 2490 seconds

        Finished filtering spurious matches, elapsed time: 0 seconds

        LCBs created, elapsed time: 0 seconds

      STDERR:

*****************************************************

parsnpAligner:: rapid whole genome SNP typing

*****************************************************

ParSNP: Preparing to construct global multiple alignment framework

Preparing to verify and process input sequences...
Searching for initial MUM anchors...

        Constructing compressed suffix graph...
        Performing initial search for exact matches in the sequences...
Performing recursive MUM search between MUM anchors...
Filtering spurious matches...
Creating and verifying final LCBs...
Writing output files & aligning LCBs...
mkdir: cannot create directory ‘C_diff_parsnp_8/blocks/’: File exists

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

*** ERROR ***  Invalid seq type

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 
bkille commented 2 years ago

@bakersjc sorry for the delay, was away on holiday. Could you by any chance attach the full output of parsnp? That might help elucidate some issues.

Best, Bryce

bakersjc commented 2 years ago

@bkille Attached is the full parsnp output: parsnpAligner.STDERR.docx. Thanks!

bakersjc commented 2 years ago

@bkille some more information: I tried splitting my input files into batches to figure out which file was causing the error, but every batch still fails with "The following command failed: >$ /var/miniconda3/envs/parsnp/bin/bin/parsnp_core C_diff_parsnp_Merged/parsnpAligner.ini Please veryify input data and restart Parsnp.", so I am not sure what is causing it. I have also tried supplying the reference genome in both .fna and .fasta format, with the same result each time. Any help would be greatly appreciated. Thanks!

bakersjc commented 2 years ago

@bkille I have solved the issue - my fasta reference file contained both a chromosome and a plasmid, and it appears that parsnp did not support this. Using a .gbk file fixed the problem.
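Editor's sketch: to check whether a FASTA reference holds more than one record (for example a chromosome plus a plasmid, as described here), something like the following lists each record and its length. The function names are hypothetical, and the sketch only inspects the file; it does not reproduce how parsnp handles multi-record references.

```python
def record_lengths(lines):
    """Return [(header, sequence length)] for each FASTA record, in file order."""
    records = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            records.append([line[1:], 0])
        elif line and records:
            # Sequence line belonging to the most recent header.
            records[-1][1] += len(line)
    return [(header, length) for header, length in records]


def describe_reference(fasta_path):
    """Print one line per record and flag references with several records."""
    with open(fasta_path) as handle:
        lengths = record_lengths(handle)
    for header, length in lengths:
        print(f"{length:>12}  {header}")
    if len(lengths) > 1:
        print(f"{len(lengths)} records found: this reference is multi-record.")
    return lengths
```

A reference that prints more than one line here contains several sequences, which is the situation reported in this comment.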

bkille commented 2 years ago

Ahh I see. Well I'm glad you were able to figure it out and thanks for letting me know!

-Bryce

valery-shap commented 2 years ago

Hello @bakersjc and @bkille, I have the same issue. Could you please say a bit more about plasmids and the chromosome in the reference file? Should the reference file contain only the chromosome contig? Does parsnp distinguish replicon from non-replicon contigs? And what happens if the reference is a short-read-only assembly with no circular, full-length plasmids that could simply be dropped? @bakersjc, did only the .gbk file with the chromosome contig alone work, or did a FASTA file with a single contig work too? Valery. Update: I made a reference FASTA file containing only the single chromosome contig and I still have this issue. Does parsnp work only with the symbols ACTG?