nf-core / mag

Assembly and binning of metagenomes
https://nf-co.re/mag
MIT License

Error in executing PROKKA #601

[Open] anugos opened this issue 3 months ago

anugos commented 3 months ago

nf-core/mag v2.5.4-ge486bb2
Run Name: bq-mag19
nf-core/mag execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 2.

The full error message was:

Error executing process > 'NFCORE_MAG:MAG:PROKKA (MEGAHIT-MetaBAT2-group-10.14)'

Caused by: Process NFCORE_MAG:MAG:PROKKA (MEGAHIT-MetaBAT2-group-10.14) terminated with an error exit status (2)

Command executed:

prokka \
    --metagenome \
    --cpus 2 \
    --prefix MEGAHIT-MetaBAT2-group-10.14 \
    \
    \
    MEGAHIT-MetaBAT2-group-10.14.fa

cat <<-END_VERSIONS > versions.yml
"NFCORE_MAG:MAG:PROKKA":
    prokka: $(echo $(prokka --version 2>&1) | sed 's/^.*prokka //')
END_VERSIONS

Command exit status: 2

Command output: (empty)

Command error:
[21:24:50] Determined blastp version is 002012 from 'blastp: 2.12.0+'
[21:24:50] Looking for 'cmpress' - found /usr/local/bin/cmpress
[21:24:50] Determined cmpress version is 001001 from '# INFERNAL 1.1.4 (Dec 2020)'
[21:24:50] Looking for 'cmscan' - found /usr/local/bin/cmscan
[21:24:50] Determined cmscan version is 001001 from '# INFERNAL 1.1.4 (Dec 2020)'
[21:24:50] Looking for 'egrep' - found /bin/egrep
[21:24:50] Looking for 'find' - found /usr/bin/find
[21:24:50] Looking for 'grep' - found /bin/grep
[21:24:50] Looking for 'hmmpress' - found /usr/local/bin/hmmpress
[21:24:50] Determined hmmpress version is 003003 from '# HMMER 3.3.2 (Nov 2020); http://hmmer.org/'
[21:24:50] Looking for 'hmmscan' - found /usr/local/bin/hmmscan
[21:24:50] Determined hmmscan version is 003003 from '# HMMER 3.3.2 (Nov 2020); http://hmmer.org/'
[21:24:50] Looking for 'java' - found /usr/local/bin/java
[21:24:50] Looking for 'makeblastdb' - found /usr/local/bin/makeblastdb
[21:24:50] Determined makeblastdb version is 002012 from 'makeblastdb: 2.12.0+'
[21:24:50] Looking for 'minced' - found /usr/local/bin/minced
[21:24:50] Determined minced version is 004002 from 'minced 0.4.2'
[21:24:50] Looking for 'parallel' - found /usr/local/bin/parallel
[21:24:50] Determined parallel version is 20220222 from 'GNU parallel 20220222'
[21:24:50] Looking for 'prodigal' - found /usr/local/bin/prodigal
[21:24:50] Determined prodigal version is 002006 from 'Prodigal V2.6.3: February, 2016'
[21:24:50] Looking for 'prokka-genbank_to_fasta_db' - found /usr/local/bin/prokka-genbank_to_fasta_db
[21:24:50] Looking for 'sed' - found /bin/sed
[21:24:50] Looking for 'tbl2asn' - found /usr/local/bin/tbl2asn
[21:24:51] Determined tbl2asn version is 025007 from 'tbl2asn 25.7 arguments:'
[21:24:51] Using genetic code table 11.
[21:24:51] Loading and checking input file: MEGAHIT-MetaBAT2-group-10.14.fa
[21:24:51] Wrote 65 contigs totalling 205963 bp.
[21:24:51] Predicting tRNAs and tmRNAs
[21:24:51] Running: aragorn -l -gc11 -w MEGAHIT-MetaBAT2-group-10.14\/MEGAHIT-MetaBAT2-group-10.14.fna
[21:24:51] Found 0 tRNAs
[21:24:51] Predicting Ribosomal RNAs
[21:24:51] Running Barrnap with 2 threads
[21:24:51] Found 0 rRNAs
[21:24:51] Skipping ncRNA search, enable with --rfam if desired.
[21:24:51] Total of 0 tRNA + rRNA features
[21:24:51] Searching for CRISPR repeats
[21:24:51] Found 0 CRISPRs
[21:24:51] Predicting coding sequences
[21:24:51] Contigs total 205963 bp, so using meta mode
[21:24:51] Running: prodigal -i MEGAHIT-MetaBAT2-group-10.14\/MEGAHIT-MetaBAT2-group-10.14.fna -c -m -g 11 -p meta -f sco -q
[21:24:52] Found 226 CDS
[21:24:52] Connecting features back to sequences
[21:24:52] Not using genus-specific database. Try --usegenus to enable it.
[21:24:52] Annotating CDS, please be patient.
[21:24:52] Will use 2 CPUs for similarity searching.
[21:24:52] There are still 226 unannotated CDS left (started with 226)
[21:24:52] Will use blast to search against /usr/local/db/kingdom/Bacteria/IS with 2 CPUs
[21:24:52] Running: cat MEGAHIT-MetaBAT2-group-10.14\/MEGAHIT-MetaBAT2-group-10.14.IS.tmp.42.faa | parallel --gnu --plain -j 2 --block 14374 --recstart '>' --pipe blastp -query - -db /usr/local/db/kingdom/Bacteria/IS -evalue 1e-30 -qcov_hsp_perc 90 -num_threads 1 -num_descriptions 1 -num_alignments 1 -seg no > MEGAHIT-MetaBAT2-group-10.14\/MEGAHIT-MetaBAT2-group-10.14.IS.tmp.42.blast 2> /dev/null
[21:24:53] Could not run command: cat MEGAHIT-MetaBAT2-group-10.14\/MEGAHIT-MetaBAT2-group-10.14.IS.tmp.42.faa | parallel --gnu --plain -j 2 --block 14374 --recstart '>' --pipe blastp -query - -db /usr/local/db/kingdom/Bacteria/IS -evalue 1e-30 -qcov_hsp_perc 90 -num_threads 1 -num_descriptions 1 -num_alignments 1 -seg no > MEGAHIT-MetaBAT2-group-10.14\/MEGAHIT-MetaBAT2-group-10.14.IS.tmp.42.blast 2> /dev/null

Work dir: /data/user/anugos24/Black-Queen-analysis/Shotgun-Metagenome/redo_results_2024/work/d8/6fda25e960ff9bd0d71920b903df93

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

The workflow was completed at 2024-03-12T21:29:39.822735-05:00 (duration: 5m 5s)
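In this run the command output was empty and the failure text went to stderr, which Nextflow keeps in .command.err in the same work directory, next to .command.out and .exitcode. A minimal sketch for printing those files for one task; the function name show_task_files and the TAIL_LINES variable are hypothetical, not part of Nextflow or nf-core/mag:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the exit code, stderr and stdout files that
# Nextflow leaves in a task work directory (only the ones that exist).
show_task_files() {
  local dir="$1" f
  for f in .exitcode .command.err .command.out; do
    if [ -f "$dir/$f" ]; then
      printf '== %s ==\n' "$f"
      # Show the last TAIL_LINES lines (default 20) of each file.
      tail -n "${TAIL_LINES:-20}" "$dir/$f"
      # Add a newline if the file does not end with one (e.g. .exitcode).
      if [ -n "$(tail -c 1 "$dir/$f")" ]; then
        echo
      fi
    fi
  done
  return 0
}
```

For example, show_task_files /data/user/anugos24/Black-Queen-analysis/Shotgun-Metagenome/redo_results_2024/work/d8/6fda25e960ff9bd0d71920b903df93. Note that .command.err will only repeat the PROKKA log above; the blastp/parallel error itself is still hidden by the 2> /dev/null inside PROKKA's own command.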

The command used to launch the workflow was as follows:

nextflow run nf-core/mag -r 2.5.4 -name bq-mag19 -profile singularity -params-file nf-params.json -c custom.config -resume bq-mag18

Pipeline Configuration:

revision                    2.5.4
runName                     bq-mag19
containerEngine             singularity
container                   [PROKKA:https://depot.galaxyproject.org/singularity/prokka:1.14.6--pl5321hdfd78af_4]
profile                     singularity
configFiles
phix_reference              /home/anugos24/.nextflow/assets/nf-core/mag/assets/data/GCA_002596845.1_ASM259684v1_genomic.fna.gz
lambda_reference            /home/anugos24/.nextflow/assets/nf-core/mag/assets/data/GCA_000840245.1_ViralProj14204_genomic.fna.gz
kraken2_db                  /data/user/database/minikraken_8GB_202003.tgz
skip_krona                  true
gtdbtk_min_perc_aa          10
gtdbtk_pplacer_cpus         1
coassemble_group            true
megahit_options             --presets meta-large
skip_spades                 true
skip_spadeshybrid           true
skip_prodigal               true
skip_metaeuk                true
skip_maxbin2                true
skip_concoct                true
bowtie2_mode                --very-sensitive
save_assembly_mapped_reads  true
busco_db                    /data/user/bacteria_odb10.2020-03-06.tar.gz
busco_auto_lineage_prok     true
busco_clean                 true

Nextflow Version            23.10.1
Nextflow Build              5891
Nextflow Compile Timestamp  12-01-2024 22:01 UTC

jfy133 commented 3 months ago

Hi @anugos, this seems to be a common and 'unresolved' Prokka error. The recommendation is posted here: https://github.com/tseemann/prokka/issues/402#issuecomment-1547340365

Please install PROKKA manually (e.g. via conda), cd into the work directory reported in the error, then use the command in the .command.sh file to re-run prokka, but without redirecting the output (stdout/stderr).
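The point of re-running by hand is to see the stderr that PROKKA throws away with 2> /dev/null in its blastp step. A minimal sketch, assuming you copy the text after "Could not run command:" from the PROKKA log into a file (failed_cmd.txt below is a hypothetical name), that the temporary .faa file is still in the PROKKA output folder, and that you run it from the task's work directory with the PROKKA tools available, since the database path /usr/local/db/kingdom/Bacteria/IS only exists inside the PROKKA container. Both function names are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every "2> /dev/null" redirection from a
# command string so that stderr reaches the terminal again.
strip_devnull() {
  printf '%s\n' "$1" | sed -E 's#[[:space:]]*2>[[:space:]]*/dev/null##g'
}

# Hypothetical helper: read the failing command from a file, strip the
# redirection and run it. bash -c re-parses the string, so PROKKA's
# escaped paths (e.g. "\/", "\-", "\.") become plain "/", "-" and ".".
rerun_without_devnull() {
  local cmd
  cmd=$(strip_devnull "$(cat "$1")")
  echo "Running: $cmd" >&2
  bash -c "$cmd"
}
```

For this issue that would be cd into /data/user/anugos24/Black-Queen-analysis/Shotgun-Metagenome/redo_results_2024/work/d8/6fda25e960ff9bd0d71920b903df93 and then rerun_without_devnull failed_cmd.txt; whatever blastp or parallel prints is the error that was "piped to nothing".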

roberta-davidson commented 3 months ago

Hey @anugos @jfy133! Found a bit of a workaround. I downloaded this container for Prokka and then modified my config to use it. It also would not work via SLURM submission on our HPC, but did work on the head node (?!), and then I had to modify it to run one task at a time so Prokka's tmp directories didn't overwrite each other. On second thought, maybe using a different container was unnecessary, but anyway... Overall additions to the config file:

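process {
  executor = 'slurm'
  clusterOptions = '-N 1 -p skylake,icelake'

  withName: PROKKA {
    // Separately downloaded Prokka image (path elided)
    container = '/<path>/prokka_1.14.6--pl5321hdfd78af_5.sif'
    // Run PROKKA on the node that launched Nextflow instead of via SLURM
    executor = 'local'
    // Run one PROKKA task at a time so their tmp directories don't clash
    maxForks = 1
  }
}
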
jfy133 commented 3 months ago

@roberta-davidson huh, interesting... what was the actual error for you (i.e., what was otherwise piped to nothing)?

Is it a /tmp clash or something? If so, we could maybe set it to use the process's own work directory instead...
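The thread does not confirm that a /tmp clash is the cause, but it is cheap to test: PROKKA hands the blastp step to GNU parallel, which keeps its temporary files under $TMPDIR (default /tmp), so giving each run a private TMPDIR inside its own work directory would rule that out. A sketch under that assumption; run_with_private_tmp and PRIVATE_TMP_BASE are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with TMPDIR pointing at a fresh
# directory (under PRIVATE_TMP_BASE, default: the current directory),
# remove that directory afterwards and return the command's exit status.
run_with_private_tmp() {
  local base="${PRIVATE_TMP_BASE:-$PWD}" tmp rc
  tmp=$(mktemp -d "$base/tmp.XXXXXX") || return 1
  TMPDIR="$tmp" "$@"
  rc=$?
  rm -rf "$tmp"
  return "$rc"
}
```

From a task's work directory this would be run_with_private_tmp bash .command.sh. Inside the pipeline, the equivalent would presumably be exporting TMPDIR from the PROKKA process's beforeScript directive; that has not been tried in this thread.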

roberta-davidson commented 3 months ago

The original failing command reported when running mag was:

  [17:36:49] Could not run command: cat MEGAHIT\-MetaBAT2\-Bushfire_A24936\.15\/MEGAHIT\-MetaBAT2\-Bushfire_A24936\.15\.IS\.tmp\.44\.faa | parallel --gnu --plain -j 2 --block 43333 --recstart '>' --pipe blastp -query - -db /usr/local/db/kingdom/Bacteria/IS -evalue 1e-30 -qcov_hsp_perc 90 -num_threads 1 -num_descriptions 1 -num_alignments 1 -seg no > MEGAHIT\-MetaBAT2\-Bushfire_A24936\.15\/MEGAHIT\-MetaBAT2\-Bushfire_A24936\.15\.IS\.tmp\.44\.blast 2> /dev/null

I really don't understand why this workaround works... I set out to do as you suggested above, and wrote a script to run .command.sh in each work dir using my own prokka container, and then @shyama-mama figured out that we could just adjust the config and point to that container when the pipeline runs. Then I realised that .command.sh with my container worked on the head node but not within the pipeline (no idea why). So I adjusted it to execute locally, one task at a time.
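The script mentioned above is not shown in the thread; the following is only a sketch of what "run .command.sh in each work dir with my own container" might look like. It assumes Nextflow's default work/<xx>/<hash>/ layout and a singularity binary on the machine where it runs (e.g. the head node); the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each task directory under WORK_ROOT whose
# .command.sh has a line starting with "prokka".
find_prokka_workdirs() {
  local work_root="$1" script
  find "$work_root" -mindepth 3 -maxdepth 3 -name .command.sh | sort |
    while IFS= read -r script; do
      # grep reads the file argument, not the loop's stdin.
      if grep -q '^[[:space:]]*prokka' "$script"; then
        dirname "$script"
      fi
    done
}

# Hypothetical helper: re-run those tasks one at a time inside the given
# Singularity image, reporting failures instead of stopping.
rerun_prokka_tasks() {
  local work_root="$1" sif="$2" dir
  find_prokka_workdirs "$work_root" | while IFS= read -r dir; do
    echo "Re-running PROKKA in $dir" >&2
    # </dev/null stops the task from consuming the list of directories.
    if ! (cd "$dir" && singularity exec "$sif" bash .command.sh </dev/null); then
      echo "FAILED: $dir" >&2
    fi
  done
}
```

Usage would be something like rerun_prokka_tasks work /<path>/prokka_1.14.6--pl5321hdfd78af_5.sif. Running the loop sequentially mirrors the maxForks = 1 setting in the config above. Depending on where the staged inputs live, singularity exec may also need -B bind mounts so the symlinked input files resolve inside the container.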