jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

04.rundiamond.pl. Program finished abnormally : taxaError: Invalid parameter count for option '--out' #696

Closed. fconstancias closed this issue 1 year ago.

fconstancias commented 1 year ago

Dear SQM developers,

I am using SqueezeMeta v1.6.0 (September 2022), installed through conda and conda-pack, and got this error when running the pipeline.

I initially thought it was a memory issue and played with the -b parameter, but then realised there is a more specific error message: Invalid parameter count for option '--out'.

These are the commands I used:

#!/bin/bash
#SBATCH --job-name=SQM_f1r        
#Name of the job   
#SBATCH -n 10             
#SBATCH --mem-per-cpu=10G
#SBATCH --time=24:00:00
#SBATCH --output SQM_f1r.log

SAMPLE_LIST=/cluster/scratch/fconstan/AS/SQM_sample_list_ferm1.tsv
INPUT=/cluster/scratch/fconstan/AS/01_QC/
OUTPUT=/cluster/scratch/fconstan/AS/SQM/
EXT_ASSEMBLY=/cluster/scratch/fconstan/AS/02_ASSEMBLY/Ferm1/final.contigs.fa

mkdir -p ${OUTPUT}

source /cluster/home/fconstan/SqueezeMeta/bin/activate
conda-unpack

SqueezeMeta.pl -m coassembly -contiglen 750  \
-s ${SAMPLE_LIST} -extassembly ${EXT_ASSEMBLY} \
-f ${INPUT} \
-b 8 -t 10 -p ${OUTPUT}/ferm1

I also tried running the DIAMOND command outside the pipeline, which gave the same error:

#!/bin/bash
#SBATCH --job-name=SQM_diam_test.sh        
#Name of the job   
#SBATCH -n 1
#SBATCH --mem-per-cpu=1
#SBATCH --time=04:00:00
#SBATCH --output SQM_diam_test.log

source /cluster/home/fconstan/SqueezeMeta/bin/activate
conda-unpack

/cluster/home/fconstan/SqueezeMeta/SqueezeMeta/bin/diamond blastp -q /cluster/scratch/fconstan/AS/SQM/ferm1/results/03.ferm1.faa -p 1 -d /cluster/work/gdc/people/fconstan/db/SQM/db/nr.dmnd -e 0.001 --id 40 -f tab -b 1 -o /cluster/scratch/fconstan/AS/SQM/ferm1/intermediate/04.ferm1.nr.diamond 2>&1 /cluster/scratch/fconstan/AS/SQM/ferm1/temp/diamond.nr.log
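The error is consistent with how the shell parses `2>&1 file`: `2>&1` is a complete redirection on its own (stderr goes wherever stdout currently goes), so the word after it is not a redirection target but one more argument passed to the program. DIAMOND then presumably reads the log path as a second value after `-o`, hence the complaint about `--out`. The minimal sketch below shows the effect using a hypothetical stand-in function, `count_args`, instead of DIAMOND.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical stand-in for a CLI such as diamond:
# it only prints how many arguments it received.
count_args() {
  echo "$#"
}

log=$(mktemp)

# Form used in the command above: "2>&1" is already a complete redirection,
# so the following word becomes an extra positional argument.
buggy=$(count_args -o out.tsv 2>&1 "$log")

# Intended form: send stdout to the log, then point stderr at the same file.
count_args -o out.tsv > "$log" 2>&1
fixed=$(cat "$log")

echo "buggy form passed $buggy arguments"   # 3: -o, out.tsv and the log path
echo "fixed form passed $fixed arguments"   # 2: -o and out.tsv
rm -f "$log"
```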

I attached syslog.zip; below is the output of test_install.pl:

test_install.pl 

Checking the OS
    linux OK

Checking that tree is installed
    tree --help OK

Checking that ruby is installed
    ruby -h OK

Checking that java is installed
    java -h OK

Checking that all the required perl libraries are available in this environment
    perl -e 'use Term::ANSIColor' OK
    perl -e 'use DBI' OK
    perl -e 'use DBD::SQLite::Constants' OK
    perl -e 'use Time::Seconds' OK
    perl -e 'use Tie::IxHash' OK
    perl -e 'use Linux::MemInfo' OK
    perl -e 'use Getopt::Long' OK
    perl -e 'use File::Basename' OK
    perl -e 'use DBD::SQLite' OK
    perl -e 'use Data::Dumper' OK
    perl -e 'use Cwd' OK
    perl -e 'use XML::LibXML' OK
    perl -e 'use XML::Parser' OK
    perl -e 'use Term::ANSIColor' OK

Checking that all the required python libraries are available in this environment
    python3 -h OK
    python3 -c 'import numpy' OK
    python3 -c 'import scipy' OK
    python3 -c 'import matplotlib' OK
    python3 -c 'import dendropy' OK
    python3 -c 'import pysam' OK
    python3 -c 'import Bio.Seq' OK
    python3 -c 'import pandas' OK
    python3 -c 'import sklearn' OK
    python3 -c 'import nose' OK
    python3 -c 'import cython' OK
    python3 -c 'import future' OK

Checking that all the required R libraries are available in this environment
    R -h OK
    R -e 'library(doMC)' OK
    R -e 'library(ggplot2)' OK
    R -e 'library(data.table)' OK
    R -e 'library(reshape2)' OK
    R -e 'library(pathview)' OK
    R -e 'library(DASTool)' OK
    R -e 'library(SQMtools)' OK

Checking that SqueezeMeta is properly configured... checking database in /cluster/work/gdc/people/fconstan/db/SQM/db
    nr.db OK
    CheckM manifest OK
    LCA_tax DB OK

All checks successful

fconstancias commented 1 year ago

If I omit saving the log to diamond.nr.log, everything seems to work fine:

(SqueezeMeta) $ /cluster/home/fconstan/SqueezeMeta/SqueezeMeta/bin/diamond blastp -q /cluster/scratch/fconstan/AS/SQM/ferm1/results/03.ferm1.faa -p 1 -d /cluster/work/gdc/people/fconstan/db/SQM/db/nr.dmnd -e 0.001 --id 40 -f tab -b 1 -o /cluster/scratch/fconstan/AS/SQM/ferm1/intermediate/04.ferm1.nr.diamond 2>&1 /cluster/scratch/fconstan/AS/SQM/ferm1/temp/diamond.nr.log
Error: Invalid parameter count for option '--out'
(SqueezeMeta)$ /cluster/home/fconstan/SqueezeMeta/SqueezeMeta/bin/diamond blastp -q /cluster/scratch/fconstan/AS/SQM/ferm1/results/03.ferm1.faa -p 1 -d /cluster/work/gdc/people/fconstan/db/SQM/db/nr.dmnd -e 0.001 --id 40 -f tab -b 1 -o /cluster/scratch/fconstan/AS/SQM/ferm1/intermediate/04.ferm1.nr.diamond 
diamond v2.0.15.153 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 1
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: /cluster/scratch/fconstan/AS/SQM/ferm1/intermediate
#Target sequences to report alignments for: 25
Opening the database...  [0.059s]
Database: /cluster/work/gdc/people/fconstan/db/SQM/db/nr.dmnd (type: Diamond database, sequences: 493010185, letters: 190998596844)
Block size = 1000000000
Opening the input file...  [0.028s]
Opening the output file...  [0.001s]
Loading query sequences...  [0.561s]

Any idea what is happening here? Thanks a lot.

fpusan commented 1 year ago

What shell are you using? Is it bash?

fconstancias commented 1 year ago

I think I messed up my ~/.bash_profile on the cluster... :-/ It is not related to SQM, but I am not sure how to solve that!

fconstancias commented 1 year ago

Well, it is bash:

bash-4.2$ ps -p $$
  PID TTY          TIME CMD
19254 pts/2    00:00:00 bash
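The `ps -p $$` check above reports the shell on the login node. A minimal sketch of running the same check on a compute node is a tiny batch script, modelled on the SLURM scripts in this thread (the job name and log file name are placeholders). Note that under `sbatch` the interpreter is normally whatever the `#!` line names, so this mainly confirms that the shebang is honoured on the node.

```shell
#!/bin/bash
#SBATCH --job-name=shell_check
#SBATCH -n 1
#SBATCH --time=00:05:00
#SBATCH --output shell_check.log

# Name of the process interpreting this script. ps is tried first (same check as above);
# /proc/$$/comm is a Linux-only fallback in case ps is unavailable.
shell_name=$(ps -p $$ -o comm= 2>/dev/null || cat /proc/$$/comm)
echo "interpreter: ${shell_name}"

# BASH_VERSION is only set when the script is being run by bash.
echo "bash version: ${BASH_VERSION:-not running under bash}"
```

Submit it with `sbatch shell_check.sh` and read `shell_check.log` once the job has finished.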
fpusan commented 1 year ago

You can edit script 04 and omit the log redirection there for now so that the pipeline keeps working, but I will investigate this. Just in case, can you confirm that bash is also the shell being used inside the compute node? In any case I should find a shell-agnostic solution for this, but I want to pinpoint the root of the issue first.
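A shell-agnostic way to capture both streams is `> file 2>&1`: stdout is redirected to the file first, then stderr is duplicated onto it, and every POSIX shell parses it the same way. `&> file` is a bash extension that a plain `/bin/sh` may not understand. If script 04 builds the DIAMOND command as a single string and runs it with Perl's `system()`, that string is handed to `/bin/sh -c` rather than to the user's login shell, which is another reason to stick to POSIX syntax (whether script 04 works this way is an assumption here). A minimal sketch of a hypothetical wrapper, not part of SqueezeMeta, that applies this form:

```shell
#!/usr/bin/env bash
# run_logged LOGFILE CMD [ARGS...]
# Hypothetical helper, not part of SqueezeMeta: runs CMD with its arguments and
# writes both stdout and stderr to LOGFILE using the POSIX "> file 2>&1" form.
# The exit status of CMD is returned unchanged.
run_logged() {
  local log=$1
  shift
  "$@" > "$log" 2>&1
}

# Example with the DIAMOND call from this thread (relative paths for brevity):
#   run_logged temp/diamond.nr.log \
#     diamond blastp -q results/03.ferm1.faa -p 1 -d nr.dmnd \
#       -e 0.001 --id 40 -f tab -b 1 -o intermediate/04.ferm1.nr.diamond
```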

fconstancias commented 1 year ago

> can you confirm that bash is also the shell being used inside the computing node?

I am not sure how to check that. The version I am using predates this commit, and I noticed that redirecting the log this way does not work:


for SAMPLE in ` awk  ${SAMPLELIST}  '{print $1}' `; do ls -lh ${INPUT}${SAMPLE}_R1_trimmed.fastq.gz  2>&1 ${SAMPLE}_test.log ; done
ls: cannot access /cluster/scratch/fconstan/AS/01_QC/sample_name_R1_trimmed.fastq.gz: No such file or directory
-rw-r----- 1 fconstan fconstan-group 112 Jun  9 15:26 sample_name_test.log
ls: cannot access Al001_test.log: No such file or directory
-rw-r----- 1 fconstan fconstan-group 1.3G Jun  1 21:59 /cluster/scratch/fconstan/AS/01_QC/Al001_R1_trimmed.fastq.gz
ls: cannot access Al002_test.log: No such file or directory

While this does generate the log files as expected:

for SAMPLE in ` awk ${SAMPLELIST} '{print $1}' `; do ls -lh ${INPUT}${SAMPLE}_R1_trimmed.fastq.gz > ${SAMPLE}_test.log 2>&1 ; done
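The same redirection rule applies to the sample loop. Separately, awk expects the program text before the input file (`awk '{print $1}' file`), whereas both loops above pass `${SAMPLELIST}` first. A minimal sketch of the check as a hypothetical function, `check_samples`, assuming a tab-separated sample list with sample names in the first column (as in the SqueezeMeta samples file, where a sample can appear on one line per read file):

```shell
#!/usr/bin/env bash
# check_samples SAMPLELIST INPUT
# Hypothetical helper: for every sample in column 1 of SAMPLELIST, run ls on
# ${INPUT}${SAMPLE}_R1_trimmed.fastq.gz and write its output and errors to
# ${SAMPLE}_test.log in the current directory.
check_samples() {
  local samplelist=$1 input=$2 sample
  # Program first, file second; !seen[$1]++ prints each sample name only once.
  awk '!seen[$1]++ {print $1}' "$samplelist" | while read -r sample; do
    # Redirect stdout to the log first, then duplicate stderr onto it.
    ls -lh "${input}${sample}_R1_trimmed.fastq.gz" > "${sample}_test.log" 2>&1
  done
}

# Usage, with the variables from the loops above:
#   check_samples "${SAMPLELIST}" "${INPUT}"
```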

I have updated to the latest version of SqueezeMeta and I did not encounter the issue anymore.