marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be both draft assemblies and finished genomes, and output includes variant (SNP) calls, core genome phylogeny and multi-alignments. Parsnp leverages contextual information provided by multi-alignments surrounding SNP sites for filtration/cleaning, in addition to existing tools for recombination detection/filtration and phylogenetic reconstruction.

Error command harvest failed #89

Open valeriaR89 opened 3 years ago

valeriaR89 commented 3 years ago

Dear developers, I'm trying to use your tool for core SNP analysis on native Ubuntu 20.04. The tutorial works fine, but when I run my own samples I get the error message below. Everything before the error seems to work (the reference and the genomes were recognised).


command line: parsnp -g /storage/riferimento/reference.gbk -d /storage/AssemblyPlino/ -v

|--Parsnp v1.2--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest


SETTINGS:
|-refgenome: /storage/riferimento/reference.gbk.fna
|-aligner: libMUSCLE
|-seqdir: /storage/AssemblyPlino/
|-outdir: /home/user/P_2020_12_23_110653922704
|-OS: Linux
|-threads: 32


<>

-->Reading Genome (asm, fasta) files from /storage/listeria_latina/caso_goretti/prova_snp/AssemblyPlino/..
  |->[OK]
-->Reading Genbank file(s) for reference (.gbk) /storage/listeria_latina/caso_goretti/prova_snp/riferimento/reference.gbk..
  |->[OK]
-->Calculating MUMi..

[...]

ParSNP: Preparing to construct global multiple alignment framework

Preparing to verify and process input sequences... Searching for initial MUM anchors...

    Constructing compressed suffix graph...
    Performing initial search for exact matches in the sequences...

Performing recursive MUM search between MUM anchors...
Filtering spurious matches...
Creating and verifying final LCBs...
Writing output files & aligning LCBs...
Parsnp: Finished core genome alignment
  |->[OK]
-->Running PhiPack on LCBs to detect recombination..
  |->[SKIP]
ERROR The following command failed:

/tmp/_MEIHlF43E/harvest -q -o /home/P_2020_12_23_110653922704/parsnp.ggr -x /home/P_2020_12_23_110653922704/parsnp.xmfa -g /storage/riferimento/reference.gbk
Please veryify input data and restart Parsnp.
If the problem persists please contact the Parsnp development team.
ERROR


If I run the failed command manually, I get this result:


-bash: /tmp/_MEIHlF43E/harvest: No such file or directory


If I check the folder:


ls -l /tmp/_MEIHlF43E/
total 0
lrwxrwxrwx 1 user user 33 Dec 23 11:06 harvest -> /tmp/_MEIHlF43E/bin/harvest_linux
lrwxrwxrwx 1 user user 29 Dec 23 11:06 nucmer -> /tmp/_MEIHlF43E/MUMmer/nucmer
lrwxrwxrwx 1 user user 26 Dec 23 11:06 parsnp -> /tmp/_MEIHlF43E/bin/parsnp
lrwxrwxrwx 1 user user 33 Dec 23 11:06 phiprofile -> /tmp/_MEIHlF43E/bin/Profile_linux
lrwxrwxrwx 1 user user 34 Dec 23 11:06 show-coords -> /tmp/_MEIHlF43E/MUMmer/show-coords


but the folder /tmp/_MEIHlF43E/bin/ does not exist.
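
The directory listing shows symlinks whose targets live under a bin/ directory that is no longer there, which is why the shell reports "No such file or directory" for a path that appears to exist. A minimal sketch for spotting such dangling links in any directory follows. It assumes a POSIX shell and the readlink utility; the function name list_dangling_links is made up for illustration and is not part of Parsnp.

```sh
# List symlinks directly inside DIR whose targets cannot be resolved.
# Usage: list_dangling_links DIR
# (hypothetical helper, not shipped with Parsnp)
list_dangling_links() {
    for entry in "$1"/*; do
        # -L: the entry itself is a symlink
        # ! -e: following the link does not reach an existing file
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s -> %s\n' "$entry" "$(readlink "$entry")"
        fi
    done
}
```

For the temporary directory in this report, the call would be list_dangling_links /tmp/_MEIHlF43E, and every link pointing into the missing bin/ folder should be printed.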

How can I solve the problem? Thank you for your help!

Valeria

bkille commented 3 years ago

Hi @valeriaR89, thanks for using Parsnp and reporting this issue! Unfortunately Parsnp v1.2 is no longer supported, so I can't help much with the issue you are currently facing. However, I have seen similar errors that have been fixed by updating to the most recent version of Parsnp.

Parsnp v1.5.4 is available via conda (recommended) as well as through GitHub releases. If you have conda installed on your system, you can install Parsnp by first adding the bioconda channel and then running conda install -c bioconda parsnp

valeriaR89 commented 3 years ago

Thank you for your answer, and happy new year! I tried to install Parsnp via conda following your advice, but on my personal laptop (WSL2) I got the error message "failed with repodata from current_repodata.json, will retry with next repodata source." and it still asks me to install version 1.2.

On another machine (native Linux) I managed to install version 1.5.3 via conda (in the same way), but now I have the same problem as issue #88: the generated parsnp.snps.mblocks is empty and the run gets stuck. The other files seem to be OK.


The error message:

CRITICAL - The following command failed:

$ raxmlHPC-PTHREADS -m GTRCAT -p 12345 -T 8 -s /storage/output/parsnp.snps.mblocks -w /tmp/tmpa4avt5x5 -n OUTPUT
Please veryify input data and restart Parsnp.
If the problem persists please contact the Parsnp development team.

  STDOUT:
  Warning, you specified a working directory via "-w"

Keep in mind that RAxML only accepts absolute path names, not relative ones!

RAxML can't, parse the alignment file as phylip file it will now try to parse it as FASTA file

TOO FEW SPECIES


The output file list:

ls output/

-rw-rw-r-- 1 users 147 Jan 12 09:36 all.mumi
-rw-rw-r-- 1 users 1.9K Jan 12 09:36 all_mumi.ini
drwxrwsr-x 693 users 16K Jan 12 09:37 blocks/
-rw-rw-r-- 1 users 1.9K Jan 12 09:36 parsnpAligner.ini
-rw-rw-r-- 1 users 3.8K Jan 12 09:38 parsnpAligner.log
-rw-r--r-- 1 users 83 Jan 12 09:38 parsnp.ggr
-rw-rw-r-- 1 users 2.2K Jan 12 09:38 parsnp.rec
-rw-rw-r-- 1 users 0 Jan 12 09:38 parsnp.snps.mblocks
-rw-rw-r-- 1 users 35M Jan 12 09:38 parsnp.xmfa
-rw-rw-r-- 1 users 1.9K Jan 12 09:36 psnn.ini
drwxrwsr-x 2 users 4.0K Jan 12 09:36 tmp/
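
The zero-byte parsnp.snps.mblocks is easy to miss in a long listing. As a rough sketch, a small POSIX shell function can print only the empty regular files in an output directory; list_empty_files is a hypothetical name chosen here, not a Parsnp command.

```sh
# Print every zero-byte regular file directly inside DIR.
# Usage: list_empty_files DIR
# (hypothetical helper, not shipped with Parsnp)
list_empty_files() {
    for f in "$1"/*; do
        # -f: regular file (skips directories such as blocks/ and tmp/)
        # ! -s: file size is zero
        if [ -f "$f" ] && [ ! -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running list_empty_files output/ on the directory listed here would be expected to print only the parsnp.snps.mblocks path.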


Thank you

Valeria

bkille commented 3 years ago

@valeriaR89

I tried to install Parsnp via conda following your advice, but on my personal laptop (WSL2) I got the error message "failed with repodata from current_repodata.json, will retry with next repodata source." and it still asks me to install version 1.2.

This most likely means the bioconda channel is not set up in your conda configuration. Running the commands below should allow you to install the most recent Parsnp version via conda:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
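
To confirm that the channels were actually registered, one option is to inspect the output of conda config --show channels. The helper below is only a sketch: has_channel is a hypothetical name, and it assumes the usual list-style output in which each channel appears on its own line as "  - name".

```sh
# Read the output of `conda config --show channels` on stdin and
# succeed only if the named channel appears as a list item.
# Usage: conda config --show channels | has_channel bioconda
# (hypothetical helper; assumes lines of the form "  - bioconda")
has_channel() {
    grep -Eq "^[[:space:]]*-[[:space:]]+$1[[:space:]]*\$"
}
```

A typical check would be conda config --show channels | has_channel bioconda && echo "bioconda is configured".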

With respect to the error you mentioned, could you rerun the same command with the --verbose flag and attach the output here?

valeriaR89 commented 3 years ago

This is the output:

parsnp -g /storage/snp/riferimento/reference.gbk -d /storage/snp/Assembly/ -o /storage/snp/output -q /storage/snp/Assembly/seq1.fna --verbose -p 8

|--Parsnp 1.5.3--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
10:16:58 - INFO -


SETTINGS:
|-refgenome: /storage/snp/riferimento/reference.gbk
|-genomes: /storage/snp/Assembly/seq1.fna /storage/snp/Assembly/seq2.fna ...15 more file(s)... /storage/snp/Assembly/seq18.fna /storage/snp/Assembly/seq19.fna
|-aligner: muscle
|-outdir: /storage/snp/output
|-OS: Linux
|-threads: 8


10:16:58 - INFO - <>
10:16:58 - DEBUG - Writing .ini file
10:16:58 - INFO - Recruiting genomes...
10:16:58 - DEBUG - /home/miniconda3/bin/bin/parsnp_core /storage/snp/output/all_mumi.ini
10:17:26 - DEBUG - 0
reference.gbk.fna,Len:2944528,GC:37.981
seq1.fna,Len:2975839,GC:37.9406
seq2.fna,Len:2985834,GC:37.9547
seq3.fna,Len:2990161,GC:38.0781
seq4.fna,Len:2919659,GC:37.9151
seq5.fasta,Len:2847393,GC:37.954
seq6.fasta,Len:2874584,GC:37.9796
seq7.fasta,Len:2871705,GC:38.0356
seq8.fna,Len:2922961,GC:38.0189
seq9.fna,Len:2952686,GC:37.948
seq10.fasta,Len:2889168,GC:38.0001
seq11.fna,Len:3131413,GC:38.0227
seq12.fna,Len:2982833,GC:38.0007
seq13.fna,Len:2974866,GC:37.8808
seq14.fasta,Len:2881583,GC:37.9931
seq15.fna,Len:2961987,GC:37.9747
seq16.fna,Len:2920369,GC:37.9095
seq17.fasta,Len:2859587,GC:37.9685
seq18.fna,Len:2931744,GC:37.9568
seq19.fna,Len:2967347,GC:38.0059
Finished processing input sequences, elapsed time: 2 seconds

             Compressed suffix graph construction elapsed time: 0 seconds

/storage/snp/output /storage/snp/output/all.mumi MUMi pairwise distance calculation finished: 22 seconds

10:17:26 - DEBUG -


parsnpAligner:: rapid whole genome SNP typing


ParSNP: Preparing to construct global multiple alignment framework

Preparing to verify and process input sequences... Calculating mumi distances..

    Constructing compressed suffix graph...
    Calculting pairwise MUMi distances...

10:17:26 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
10:17:26 - DEBUG - /home/miniconda3/bin/bin/parsnp_core /storage/snp/output/parsnpAligner.ini
10:19:15 - DEBUG - 0
reference.gbk.fna,Len:2944528,GC:37.981
seq1.fna,Len:2975839,GC:37.9406
seq2.fna,Len:2985834,GC:37.9547
seq3.fna,Len:2990161,GC:38.0781
seq4.fna,Len:2919659,GC:37.9151
seq5.fasta,Len:2847393,GC:37.954
seq6.fasta,Len:2874584,GC:37.9796
seq7.fasta,Len:2871705,GC:38.0356
seq8.fna,Len:2922961,GC:38.0189
seq9.fna,Len:2952686,GC:37.948
seq10.fasta,Len:2889168,GC:38.0001
seq11.fna,Len:3131413,GC:38.0227
seq12.fna,Len:2982833,GC:38.0007
seq13.fna,Len:2974866,GC:37.8808
seq14.fasta,Len:2881583,GC:37.9931
seq15.fna,Len:2961987,GC:37.9747
seq16.fna,Len:2920369,GC:37.9095
seq17.fasta,Len:2859587,GC:37.9685
seq18.fna,Len:2931744,GC:37.9568
seq19.fna,Len:2967347,GC:38.0059
Finished processing input sequences, elapsed time: 2 seconds

             compressed suffix graph construction elapsed time: 4 seconds

             MUM anchor search elapsed time: 53 seconds

    Finished recursive MUM search, elapsed time: 15 seconds

    Finished filtering spurious matches, elapsed time: 0 seconds

    LCBs created, elapsed time: 20 seconds

    Output files updated, elapsed time: 12 seconds

    See log file for futher details. Total processing time: 109 seconds

10:19:15 - DEBUG -


parsnpAligner:: rapid whole genome SNP typing


ParSNP: Preparing to construct global multiple alignment framework

Preparing to verify and process input sequences... Searching for initial MUM anchors...

    Constructing compressed suffix graph...
    Performing initial search for exact matches in the sequences...

Performing recursive MUM search between MUM anchors...
Filtering spurious matches...
Creating and verifying final LCBs...
Writing output files & aligning LCBs...
Parsnp: Finished core genome alignment

10:19:15 - DEBUG - harvesttools -q -o /storage/snp/output/parsnp.ggr -x /storage/snp/output/parsnp.xmfa -g /storage/snp/riferimento/reference.gbk
10:19:15 - DEBUG -
10:19:15 - DEBUG -
10:19:15 - DEBUG - harvesttools -q -i /storage/snp/output/parsnp.ggr -S /storage/snp/output/parsnp.snps.mblocks
10:19:15 - DEBUG -
10:19:15 - DEBUG -
10:19:15 - INFO - Reconstructing core genome phylogeny...
10:19:15 - DEBUG - raxmlHPC-PTHREADS -m GTRCAT -p 12345 -T 8 -s /storage/snp/output/parsnp.snps.mblocks -w /tmp/tmplpw5ktct -n OUTPUT
10:19:15 - CRITICAL - The following command failed:

$ raxmlHPC-PTHREADS -m GTRCAT -p 12345 -T 8 -s /storage/snp/output/parsnp.snps.mblocks -w /tmp/tmplpw5ktct -n OUTPUT
Please veryify input data and restart Parsnp.
If the problem persists please contact the Parsnp development team.

  STDOUT:
  Warning, you specified a working directory via "-w"

Keep in mind that RAxML only accepts absolute path names, not relative ones!

RAxML can't, parse the alignment file as phylip file it will now try to parse it as FASTA file

TOO FEW SPECIES

  STDERR:

I noticed that in the latest run the "blocks" folder was also empty.
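
The "TOO FEW SPECIES" message is consistent with RAxML receiving an alignment that contains no sequences. A quick sanity check before the tree step is to count the records in the SNP alignment. The sketch below assumes the .mblocks file is multi-FASTA (RAxML's own message says it falls back to FASTA parsing) and that RAxML needs at least four taxa; that threshold is an assumption taken from RAxML's general behaviour, not something stated in this thread. Both function names are hypothetical.

```sh
# Count FASTA records (header lines starting with ">") in FILE.
# (hypothetical helper, not shipped with Parsnp or RAxML)
count_fasta_records() {
    # grep -c exits non-zero when the count is 0, so mask that status
    grep -c '^>' "$1" || true
}

# Succeed only if FILE holds at least four sequences.
# The threshold of four is an assumption about RAxML's minimum.
has_enough_taxa() {
    [ "$(count_fasta_records "$1")" -ge 4 ]
}
```

For example, has_enough_taxa /storage/snp/output/parsnp.snps.mblocks || echo "alignment too small for RAxML" would flag the empty file from this run before RAxML is started.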

bkille commented 3 years ago

@valeriaR89 Thanks for running that. Looking at the output, it's not immediately clear what's causing the issue. Would you mind running ls -lh /storage/snp/output ? I'd like to see which files are being created/populated.

valeriaR89 commented 3 years ago

ls -lh /storage/snp/output

-rw-rw-r--  1  users  219   Jan 19  08:48  all.mumi
-rw-rw-r--  1  users  2.5K  Jan 19  08:48  all_mumi.ini
drwxrwsr-x  2  users  4.0K  Jan 19  08:48  blocks
-rw-rw-r--  1  users  2.5K  Jan 19  08:48  parsnpAligner.ini
-rw-rw-r--  1  users  5.0K  Jan 19  08:50  parsnpAligner.log
-rw-rw-r--  1  users  0     Jan 19  08:50  parsnp.snps.mblocks
-rw-rw-r--  1  users  50M   Jan 19  08:50  parsnp.xmfa
-rw-rw-r--  1  users  2.5K  Jan 19  08:48  psnn.ini
drwxrwsr-x  2  users  4.0K  Jan 19  08:48  tmp

The folder "tmp" contains the reference sequence reference.gbk.fna, while the folder "blocks" is empty.

bkille commented 3 years ago

Hi,

It looks like the parsnp.ggr file is missing, which causes the .mblocks file to be empty and, in turn, the RAxML run to fail. If your intermediate files are still there, can you try running the command harvesttools -q -o /storage/snp/output/parsnp.ggr -x /storage/snp/output/parsnp.xmfa -g /storage/snp/riferimento/reference.gbk and see if the .ggr file appears? Please also attach any output from harvesttools.
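
When rerunning a single pipeline step by hand like this, it helps to keep everything the tool prints and to check the expected file afterwards, since a step can finish silently without writing its output. The wrapper below is a minimal sketch of that idea in POSIX shell; run_and_check and the log file name are hypothetical and not part of Parsnp or harvesttools.

```sh
# Run a command, keep its stdout/stderr in LOG, then confirm that
# EXPECTED exists and is non-empty.
# Usage: run_and_check EXPECTED LOG COMMAND [ARGS...]
# (hypothetical helper, not shipped with Parsnp or harvesttools)
run_and_check() {
    expected=$1
    log=$2
    shift 2
    status=0
    # Capture both output streams so they can be attached to a report
    "$@" >"$log" 2>&1 || status=$?
    if [ "$status" -ne 0 ]; then
        echo "command exited with status $status; see $log" >&2
        return 1
    fi
    if [ ! -s "$expected" ]; then
        echo "command finished but $expected is missing or empty; see $log" >&2
        return 1
    fi
    echo "ok: $expected"
}
```

With the paths from this thread, the call would look like run_and_check /storage/snp/output/parsnp.ggr harvesttools.log harvesttools -q -o /storage/snp/output/parsnp.ggr -x /storage/snp/output/parsnp.xmfa -g /storage/snp/riferimento/reference.gbk, and harvesttools.log could then be attached to the issue.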