marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be both draft assemblies and finished genomes, and output includes variant (SNP) calls, core genome phylogeny and multi-alignments. Parsnp leverages contextual information provided by multi-alignments surrounding SNP sites for filtration/cleaning, in addition to existing tools for recombination detection/filtration and phylogenetic reconstruction.

issue with parsnp/1.5.4 #91

Open lanorvege opened 3 years ago

lanorvege commented 3 years ago

Hi, I'm currently trying to use parsnp v1.5.4 on a Slurm cluster (with raxml v8.2.12, PhiPack/1.0, harvest-tools/1.3, FastTree/2.1.11), and I keep getting an error when running parsnp on my 13 bacterial genomes. Any idea how to fix this? Here is what I see in my terminal window:

srun -c 4 parsnp -d test_parsnp/ -r! -o test_parsnp/output_parsnp -v -c -p 32 -P 128000
|--Parsnp 1.5.4--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
13:56:05 - INFO -

SETTINGS:
|-refgenome: autopick
|-genomes:
test_parsnp/EERA844_lgtfilt.fasta
test_parsnp/Ec046_lgtfilt.fasta
...12 more file(s)...
test_parsnp/EERA890_lgtfilt.fasta
test_parsnp/Ec125_lgtfilt.fasta
|-aligner: muscle
|-outdir: test_parsnp/output_parsnp
|-OS: Linux
|-threads: 32

13:56:05 - INFO - <>
13:56:05 - INFO - No genbank file provided for reference annotations, skipping..
13:56:05 - DEBUG - Sorting reference replicons
13:56:05 - DEBUG - Writing .ini file
13:56:05 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
13:56:05 - DEBUG - /opt/gensoft/exe/parsnp/1.5.4/bin/parsnp_core test_parsnp/output_parsnp/parsnpAligner.ini
14:03:22 - CRITICAL - The following command failed:

$ /opt/gensoft/exe/parsnp/1.5.4/bin/parsnp_core test_parsnp/output_parsnp/parsnpAligner.ini
Please veryify input data and restart Parsnp. If the problem persists please contact the Parsnp development team.

  STDOUT:
  0

Ec046_lgtfilt.fasta.ref,Len:5089403,GC:50.7909 ... Finished processing input sequences, elapsed time: 3 seconds

             compressed suffix graph construction elapsed time: 0 seconds

             MUM anchor search elapsed time: 10 seconds

             compressed suffix graph construction elapsed time: 0 seconds

... Finished recursive MUM search, elapsed time: 1 seconds

    Finished filtering spurious matches, elapsed time: 0 seconds

    LCBs created, elapsed time: 0 seconds

  STDERR:

parsnpAligner:: rapid whole genome SNP typing


ParSNP: Preparing to construct global multiple alignment framework

Preparing to verify and process input sequences...
Searching for initial MUM anchors...

    Constructing compressed suffix graph...
    Performing initial search for exact matches in the sequences...

... Performing recursive MUM search between MUM anchors...
Filtering spurious matches...
Creating and verifying final LCBs...
Writing output files & aligning LCBs...

ERROR TreeFromSeqVect_UPGMA, CLUSTER_6 not supported

ERROR TreeFromSeqVect_UPGMA, CLUSTER_6 not supported

ERROR TreeFromSeqVect_UPGMA, CLUSTER_6 not supported

srun: error: task 0: Exited with exit code 2

valery-shap commented 2 years ago

Hello @lanorvege, I have the same issue. Have you solved it? Valery

Update: the issue was solved by changing -p to a much lower value than the number of CPUs the node has. The node has 94 CPUs; I set 30 CPUs for parsnp instead of all of them.
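For anyone who wants to script this workaround, here is a minimal Python sketch that caps the parsnp thread count below the number of CPUs available and builds the command line. It only illustrates the workaround reported above; it is not a confirmed fix for the underlying error. The function names and the 0.5 fraction are hypothetical choices (the thread only reports that 30 of 94 CPUs worked). `SLURM_CPUS_PER_TASK` is the variable Slurm sets when `--cpus-per-task` is given, and the parsnp flags are the ones already used in this thread.

```python
import os


def detect_cpus():
    """Return the CPU count usable by this process (Slurm allocation first)."""
    slurm = os.environ.get("SLURM_CPUS_PER_TASK", "")
    if slurm.isdigit() and int(slurm) > 0:
        return int(slurm)
    if hasattr(os, "sched_getaffinity"):  # Linux only
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def cap_threads(requested, available, fraction=0.5):
    """Cap the requested thread count at a fraction of the available CPUs.

    The default fraction is an arbitrary assumption, not a documented limit.
    """
    cap = max(1, int(available * fraction))
    return min(requested, cap)


def build_parsnp_cmd(ref, genomes_dir, out_dir, threads):
    """Build the parsnp command using only flags that appear in this thread."""
    return ["parsnp", "-r", ref, "-d", genomes_dir,
            "-o", out_dir, "-c", "-p", str(threads)]


if __name__ == "__main__":
    threads = cap_threads(requested=32, available=detect_cpus())
    # Print the command instead of running it, so this stays a dry run.
    print(" ".join(build_parsnp_cmd("ref.fasta", "genomes_dir", "output_dir", threads)))
```

Printing the command rather than executing it keeps the sketch safe to try on a login node; the resulting line can be pasted after `srun` once the thread count looks sensible for the allocation.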

bkille commented 2 years ago

@valery-shap to clarify, you were observing the

*** ERROR *** TreeFromSeqVect_UPGMA, CLUSTER_6 not supported

issue as well, but resolved it by lowering the number of threads used?

valery-shap commented 2 years ago

No, I don't have this issue now; the run finished successfully. It seems that all of my problems (except one) were caused by this. The errors were:

  1. A reference with 5 contigs (1 chromosome and 4 plasmids). I got:

    Traceback (most recent call last):
    File "../bin/parsnp", line 1328, in <module>
    if block_spos < chr_spos:
    TypeError: '<' not supported between instances of 'int' and 'list'

    I changed the reference to a single contig containing only the chromosome and got another error:

  2. mkdir: cannot create directory ‘../blocks/’: File exists
     10 seqs, max length 59, avg length 59

WARNING Assuming DNA (see -seqtype option), invalid letters found:

WARNING Assuming DNA (see -seqtype option), invalid letters found:

WARNING Assuming DNA (see -seqtype option), invalid letters found:

WARNING Assuming DNA (see -seqtype option), invalid letters found:

WARNING Assuming DNA (see -seqtype option), invalid letters found:

and variants like this:

mkdir: cannot create directory ‘/blocks/’: File exists
 10 seqs, max length 127, avg  length 127

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 
*** buffer overflow detected ***: /miniconda3/envs/parsnp/bin/bin/parsnp_core terminated

and I also got variants with the CLUSTER_6 error:

mkdir: cannot create directory ‘/blocks/’: File exists
 10 seqs, max length 59, avg  length 59

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

Alignment not completed, cannot save.

*** ERROR ***  TreeFromSeqVect_UPGMA, CLUSTER_6 not supported

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 
*** buffer overflow detected ***: /home/miniconda3/envs/parsnp/bin/bin/parsnp_core terminated

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

*** WARNING *** Assuming DNA (see -seqtype option), invalid letters found: 

I found the files containing K, M, Y symbols and removed them, but I still got this error.
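A minimal Python sketch for locating such files: it counts every sequence character other than A, C, G, T in FASTA input. This thread does not establish that these symbols cause the errors (the error persisted after the files were removed), so this is only a way to find the genomes that trigger the "invalid letters found" warning. The function names and the `genomes_dir` path are hypothetical.

```python
from collections import Counter
import glob


def count_non_acgt(fasta_lines):
    """Count sequence characters other than A, C, G, T (case-insensitive)."""
    counts = Counter()
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        for ch in line.upper():
            if ch not in "ACGT":
                counts[ch] += 1
    return counts


def scan_file(path):
    """Return the non-ACGT counts for one FASTA file."""
    with open(path) as handle:
        return count_non_acgt(handle)


if __name__ == "__main__":
    # Hypothetical directory name; adjust the pattern to your own layout.
    for path in sorted(glob.glob("genomes_dir/*.fasta")):
        counts = scan_file(path)
        if counts:
            print(path, dict(counts))
```

N is reported alongside the IUPAC ambiguity codes (K, M, Y, W and so on) so that the two cases can be told apart in the output.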

Then I made a test directory with 5 clean genomes and ran parsnp on another server, directly in the terminal without Slurm. With 30 CPUs it worked. I then added the genome containing N symbols, and it still worked. When I set 70 CPUs (all the CPUs of that server), I got the error: *** ERROR *** TreeFromSeqVect_UPGMA, CLUSTER_6 not supported. I saw the same error on the Slurm server when I set -p to all the CPUs of the node.

Now I have the final output folder with all the output files, using the changed reference (a single chromosome) and including all genomes (also those with K, Y, W symbols). I can see those genomes in the parsnp log file (the Len and GC lines), but I can't find them on the tree. It all looks very strange. I used parsnp about a year ago with a hundred plasmids (only sequences of one particular plasmid) on a laptop, and it worked excellently!
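For reference, a minimal Python sketch of how a multi-contig reference could be reduced to a single record before passing it to -r. It assumes the chromosome is the longest record in the file, which holds for a typical chromosome-plus-plasmids assembly but should be checked by hand. The function and file names are hypothetical, and this only reproduces the workaround described above, not a fix for the TypeError itself.

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def longest_record(records):
    """Return the record with the longest sequence (assumed to be the chromosome)."""
    return max(records, key=lambda rec: len(rec[1]))


def write_fasta(record, path, width=80):
    """Write one record to a FASTA file, wrapping the sequence at `width` columns."""
    header, seq = record
    with open(path, "w") as out:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    with open("ref.fasta") as handle:
        chromosome = longest_record(read_fasta(handle))
    write_fasta(chromosome, "ref_chromosome_only.fasta")
```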

In the beginning I also had this issue on the Slurm server, which has 750 GB of RAM:

*** MAX MEMORY 4 MB EXCEEDED***
Memory allocated so far 16004 MB, physical RAM 680 MB
Use -maxmb <n> option to increase limit, where <n> is in MB.

There is no such flag in parsnp. The command was always: parsnp -r ref.fasta -d genomes_dir -o output_dir -c -x -p (different values). I also tried setting -P, but in the end it worked without it. The parsnp version is 1.5.6, installed from conda.

Valery