jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Stopping in STEP5 #847

Open guianrey opened 3 weeks ago

guianrey commented 3 weeks ago

Hi jtamames

Could you help me with this issue? I don't understand what is happening. The database is correctly installed, and I am running this pipeline (SqueezeMeta v1.6.3, September 2023) on a cluster with 128 cores and 256 GB of RAM.

[52 seconds]: STEP5 -> HMMER/PFAM: 05.run_hmmer.pl
Running HMMER3 (Eddy 2009, Genome Inform 23, 205-11) for Pfam
Error running command: /home/guillermo.reyes/miniconda3/envs/SqueezeMeta/SqueezeMeta/bin/hmmer/hmmsearch --domtblout /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/intermediate/05.Mar_ingreso_M1.pfam.hmm -E 1e-10 --cpu 128 /home/guillermo.reyes/SqueezeDataBase/db/Pfam-A.hmm /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa > /dev/null 2>&1 at /home/guillermo.reyes/miniconda3/envs/SqueezeMeta/SqueezeMeta/scripts/05.run_hmmer.pl line 31.
Stopping in STEP5 -> 05.run_hmmer.pl. Program finished abnormally


I checked the installation with test_install.pl:

Scalar value @args[-1] better written as $args[-1] at /home/guillermo.reyes/miniconda3/envs/SqueezeMeta/bin/test_install.pl line 208.

Checking the OS
linux OK

Checking that tree is installed
tree --help OK

Checking that ruby is installed
ruby -h OK

Checking that java is installed
java -h OK

Checking that all the required perl libraries are available in this environment
perl -e 'use Term::ANSIColor' OK
perl -e 'use DBI' OK
perl -e 'use DBD::SQLite::Constants' OK
perl -e 'use Time::Seconds' OK
perl -e 'use Tie::IxHash' OK
perl -e 'use Linux::MemInfo' OK
perl -e 'use Getopt::Long' OK
perl -e 'use File::Basename' OK
perl -e 'use DBD::SQLite' OK
perl -e 'use Data::Dumper' OK
perl -e 'use Cwd' OK
perl -e 'use XML::LibXML' OK
perl -e 'use XML::Parser' OK
perl -e 'use Term::ANSIColor' OK

Checking that all the required python libraries are available in this environment
python3 -h OK
python3 -c 'import numpy' OK
python3 -c 'import scipy' OK
python3 -c 'import matplotlib' OK
python3 -c 'import dendropy' OK
python3 -c 'import pysam' OK
python3 -c 'import Bio.Seq' OK
python3 -c 'import pandas' OK
python3 -c 'import sklearn' OK
python3 -c 'import nose' OK
python3 -c 'import cython' OK
python3 -c 'import future' OK

Checking that all the required R libraries are available in this environment
R -h OK
R -e 'library(doMC)' OK
R -e 'library(ggplot2)' OK
R -e 'library(data.table)' OK
R -e 'library(reshape2)' OK
R -e 'library(pathview)' OK
R -e 'library(DASTool)' OK
R -e 'library(SQMtools)' OK

Checking binaries
spades.py OK
metabat2 OK
jgi_summarize_bam_contig_depths OK
samtools OK
bwa OK
minimap2 OK
diamond OK
hmmsearch OK
cd-hit-est OK
kmer-db OK
aragorn OK
mothur OK

Checking that SqueezeMeta is properly configured... checking database in /home/guillermo.reyes/SqueezeDataBase/db
nr.db OK
CheckM manifest OK
LCA_tax DB OK

All checks successful

fpusan commented 3 weeks ago

Your installation seems to be fine and we don't usually have problems with step 5.

You mentioned that you are running this on a cluster. Is there any chance your process ran out of time and was killed by the workload manager? The output you pasted says that step 5 started at 52 seconds, though, so maybe this was already a restart?

The hmmsearch binary seems to be loading well according to test_install.pl but I would still like to test it with your data. What is the output of running the following command?

/home/guillermo.reyes/miniconda3/envs/SqueezeMeta/SqueezeMeta/bin/hmmer/hmmsearch --domtblout /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/intermediate/05.Mar_ingreso_M1.pfam.hmm -E 1e-10 --cpu 128 /home/guillermo.reyes/SqueezeDataBase/db/Pfam-A.hmm /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa
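The reason for rerunning the command by hand is visible in the original error: 05.run_hmmer.pl runs hmmsearch with "> /dev/null 2>&1", so whatever hmmsearch printed before failing is thrown away. A minimal Python sketch of the same idea, assuming (as is usual for command-line tools) that hmmsearch writes its error messages to stderr: it runs a command, discards stdout (the full hmmsearch report can be very large), and keeps only the exit code and the last lines of stderr. The function name run_keep_stderr is hypothetical and not part of SqueezeMeta.

```python
import subprocess
import sys


def run_keep_stderr(cmd, tail_lines=20):
    """Run `cmd` (a list of arguments), discard its stdout, and return
    (exit code, last `tail_lines` lines of stderr)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,  # the main report is not needed for debugging
        stderr=subprocess.PIPE,     # keep error messages instead of sending them to /dev/null
        text=True,
    )
    tail = result.stderr.splitlines()[-tail_lines:]
    return result.returncode, tail


if __name__ == "__main__":
    # Everything after the script name is treated as the command to run.
    code, tail = run_keep_stderr(sys.argv[1:])
    print("exit code:", code)
    print("\n".join(tail))
```

For example, saved under a hypothetical name such as run_keep_stderr.py, it could be called as python3 run_keep_stderr.py followed by the full hmmsearch command above. If the error still does not appear, it may be going to stdout on that system, and running the command without any redirection, as suggested, remains the simplest check.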

guianrey commented 3 weeks ago

Hi fpusan,

Here is the output:

(SqueezeMeta) guillermo.reyes@dgx-node-0-0:~/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData$ /home/guillermo.reyes/miniconda3/envs/SqueezeMeta/SqueezeMeta/bin/hmmer/hmmsearch --domtblout /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/intermediate/05.Mar_ingreso_M1.pfam.hmm -E 1e-10 --cpu 128 /home/guillermo.reyes/SqueezeDataBase/db/Pfam-A.hmm /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa

hmmsearch :: search profile(s) against a sequence database
HMMER 3.1b2 (February 2015); http://hmmer.org/
Copyright (C) 2015 Howard Hughes Medical Institute.
Freely distributed under the GNU General Public License (GPLv3).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
query HMM file: /home/guillermo.reyes/SqueezeDataBase/db/Pfam-A.hmm
target sequence database: /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa
per-dom hits tabular output: /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/intermediate/05.Mar_ingreso_M1.pfam.hmm
sequence reporting threshold: E-value <= 1e-10
number of worker threads: 128
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query: 1-cysPrx_C [M=40]
Accession: PF10417.12
Description: C-terminal domain of 1-Cys peroxiredoxin
Parse failed (sequence file /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa):
Premature EOF in parsing FASTA name/description line

fpusan commented 3 weeks ago

It seems that the amino acid file is truncated. I don't think I've seen this happen before. Can you share /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa with us? Could you also share the /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/syslog file here?
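A quick way to look for the suspected truncation without opening a multi-gigabyte .faa file is to inspect only its end. The Python sketch below reads the last 64 KB and flags two symptoms consistent with a FASTA file that was cut short: no final newline, or a final line that is a header (starting with ">") with no sequence after it. The function names are hypothetical, not part of SqueezeMeta, and the check deliberately ignores the rest of the file, so an empty result does not prove the file is valid.

```python
import os
import sys


def read_tail(path, nbytes=65536):
    """Read at most the last `nbytes` bytes of a file without loading all of it."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        fh.seek(max(0, size - nbytes))
        return fh.read()


def tail_problems(tail):
    """Return a list of truncation symptoms found in the tail bytes (empty if none)."""
    if not tail:
        return ["file is empty"]
    problems = []
    if not tail.endswith(b"\n"):
        problems.append("file does not end with a newline; the last line may be cut off")
    # Last non-empty line of the file (trailing newlines stripped first).
    last_line = tail.rstrip(b"\r\n").rsplit(b"\n", 1)[-1]
    if last_line.startswith(b">"):
        problems.append("last line is a header with no sequence after it")
    return problems


def fasta_tail_problems(path, nbytes=65536):
    return tail_problems(read_tail(path, nbytes))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        found = fasta_tail_problems(path)
        print(path, "OK" if not found else "; ".join(found))
```

Run as, for example, python3 check_fasta_tail.py followed by the path to 03.Mar_ingreso_M1.faa (the script name is a placeholder).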

jtamames commented 3 weeks ago

Hello,

It looks like something is wrong with the predicted protein file, 03.Mar_ingreso_M1.faa. Could you send me the output of:

tail -n 20 /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa