nf-core / mag

Assembly and binning of metagenomes
https://nf-co.re/mag
MIT License

Error with classification #556

Closed feixiang1209 closed 6 months ago

feixiang1209 commented 6 months ago

Description of the bug

Could you please look at the error below? It occurs every time I re-run the pipeline. I was going to attach the log file, but there are too many because I have repeated this run many times. Could you please advise on the cause of this error?

Thanks

-[nf-core/mag] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_MAG:MAG:GTDBTK:GTDBTK_CLASSIFYWF (MEGAHIT-MaxBin2-unclassified-unrefined-M-23-6561_3-022umB_QIA-UDI041-QIA-UDI041_L002_R)'

Caused by:
  Process NFCORE_MAG:MAG:GTDBTK:GTDBTK_CLASSIFYWF (MEGAHIT-MaxBin2-unclassified-unrefined-M-23-6561_3-022umB_QIA-UDI041-QIA-UDI041_L002_R) terminated with an error exit status (1)

Command executed:

export GTDBTK_DATA_PATH="${PWD}/database"
if [ --scratch_dir pplacer_tmp != "" ] ; then
    mkdir pplacer_tmp
fi

gtdbtk classify_wf \
    --extension fa \
    --genome_dir bins \
    --prefix "gtdbtk.MEGAHIT-MaxBin2-unclassified-unrefined-M-23-6561_3-022umB_QIA-UDI041-QIA-UDI041_L002_R" \
    --out_dir "${PWD}" \
    --cpus 10 \
    --skip_ani_screen \
    --scratch_dir pplacer_tmp \
    --min_perc_aa 10 \
    --min_af 0.65

mv classify/* .

mv identify/* .

mv align/* .
mv gtdbtk.log "gtdbtk.MEGAHIT-MaxBin2-unclassified-unrefined-M-23-6561_3-022umB_QIA-UDI041-QIA-UDI041_L002_R.log"

mv gtdbtk.warnings.log "gtdbtk.MEGAHIT-MaxBin2-unclassified-unrefined-M-23-6561_3-022umB_QIA-UDI041-QIA-UDI041_L002_R.warnings.log"

find -name gtdbtk.MEGAHIT-MaxBin2-unclassified-unrefined-M-23-6561_3-022umB_QIA-UDI041-QIA-UDI041_L002_R.*.classify.tree | xargs -r gzip # do not fail if .tree is missing

cat <<-END_VERSIONS > versions.yml
"NFCORE_MAG:MAG:GTDBTK:GTDBTK_CLASSIFYWF":
    gtdbtk: $(echo $(gtdbtk --version -v 2>&1) | sed "s/gtdbtk: version //; s/ Copyright.*//")
END_VERSIONS

Command exit status: 1

Command output:

[2024-01-08 20:00:33] INFO: Creating concatenated alignment for 80,801 bacterial GTDB and user genomes.
[2024-01-08 20:00:58] INFO: Creating concatenated alignment for 12 bacterial user genomes.
[2024-01-08 20:00:58] INFO: Processing 2 genomes identified as archaeal.
[2024-01-08 20:01:00] INFO: Read concatenated alignment for 4,416 GTDB genomes.
[2024-01-08 20:01:01] TASK: Generating concatenated alignment for each marker.
[2024-01-08 20:01:02] INFO: Completed 2 genomes in 0.04 seconds (46.29 genomes/second).
[2024-01-08 20:01:03] TASK: Aligning 51 identified markers using hmmalign 3.3.2 (Nov 2020).
[2024-01-08 20:01:07] INFO: Completed 51 markers in 2.65 seconds (19.22 markers/second).
[2024-01-08 20:01:07] TASK: Masking columns of archaeal multiple sequence alignment using canonical mask.
[2024-01-08 20:01:12] INFO: Completed 4,418 sequences in 5.24 seconds (843.51 sequences/second).
[2024-01-08 20:01:12] INFO: Masked archaeal alignment from 13,540 to 10,135 AAs.
[2024-01-08 20:01:12] INFO: 0 archaeal user genomes have amino acids in <10.0% of columns in filtered MSA.
[2024-01-08 20:01:12] INFO: Creating concatenated alignment for 4,418 archaeal GTDB and user genomes.
[2024-01-08 20:01:15] INFO: Creating concatenated alignment for 2 archaeal user genomes.
[2024-01-08 20:01:15] INFO: Done.
[2024-01-08 20:01:16] INFO: Using a scratch file for pplacer allocations. This decreases memory usage and performance.
[2024-01-08 20:01:16] TASK: Placing 2 archaeal genomes into reference tree with pplacer using 10 CPUs (be patient).
[2024-01-08 20:01:16] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2024-01-08 20:07:11] INFO: Calculating RED values based on reference tree.
[2024-01-08 20:07:12] TASK: Traversing tree to determine classification method.
[2024-01-08 20:07:12] INFO: Completed 2 genomes in 0.00 seconds (7,884.03 genomes/second).
[2024-01-08 20:07:12] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2024-01-08 20:07:13] INFO: Completed 8 comparisons in 0.97 seconds (8.23 comparisons/second).
[2024-01-08 20:07:14] INFO: 0 genome(s) have been classified using FastANI and pplacer.
[2024-01-08 20:07:14] INFO: Using a scratch file for pplacer allocations. This decreases memory usage and performance.
[2024-01-08 20:07:14] TASK: Placing 12 bacterial genomes into backbone reference tree with pplacer using 10 CPUs (be patient).
[2024-01-08 20:07:14] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2024-01-08 20:10:08] INFO: Calculating RED values based on reference tree.
[2024-01-08 20:10:09] INFO: 12 out of 12 have an class assignments. Those genomes will be reclassified.
[2024-01-08 20:10:09] INFO: Using a scratch file for pplacer allocations. This decreases memory usage and performance.
[2024-01-08 20:10:09] TASK: Placing 10 bacterial genomes into class-level reference tree 7 (1/2) with pplacer using 10 CPUs (be patient).
[2024-01-08 20:15:17] INFO: Calculating RED values based on reference tree.
[2024-01-08 20:15:20] TASK: Traversing tree to determine classification method.
[2024-01-08 20:15:20] INFO: Completed 10 genomes in 0.00 seconds (5,375.93 genomes/second).
[2024-01-08 20:15:20] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2024-01-08 20:15:21] INFO: Completed 18 comparisons in 1.18 seconds (15.21 comparisons/second).
[2024-01-08 20:15:22] INFO: 2 genome(s) have been classified using FastANI and pplacer.
[2024-01-08 20:15:22] INFO: Using a scratch file for pplacer allocations. This decreases memory usage and performance.
[2024-01-08 20:15:22] TASK: Placing 2 bacterial genomes into class-level reference tree 6 (2/2) with pplacer using 10 CPUs (be patient).
[2024-01-08 20:21:21] INFO: Calculating RED values based on reference tree.
[2024-01-08 20:21:24] TASK: Traversing tree to determine classification method.
[2024-01-08 20:21:24] INFO: Completed 2 genomes in 0.00 seconds (5,171.77 genomes/second).
[2024-01-08 20:21:24] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2024-01-08 20:21:28] INFO: Completed 44 comparisons in 4.00 seconds (11.01 comparisons/second).
[2024-01-08 20:21:28] INFO: 0 genome(s) have been classified using FastANI and pplacer.
[2024-01-08 20:21:29] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode.
[2024-01-08 20:21:29] INFO: Done.
[2024-01-08 20:21:29] INFO: Removing intermediate files.
[2024-01-08 20:21:29] INFO: Intermediate files removed.
[2024-01-08 20:21:29] INFO: Done.

Command error:

Search for files and perform actions on them.
First failed action stops processing of current file.
Defaults: PATH is current directory, action is '-print'

    -L,-follow        Follow symlinks
    -H                ...on command line only
    -xdev             Don't descend directories on other filesystems
    -maxdepth N       Descend at most N levels. -maxdepth 0 applies
                      actions to command line arguments only
    -mindepth N       Don't act on first N levels
    -depth            Act on directory after traversing it

Actions:
    ( ACTIONS )       Group actions for -o / -a
    ! ACT             Invert ACT's success/failure
    ACT1 [-a] ACT2    If ACT1 fails, stop, else do ACT2
    ACT1 -o ACT2      If ACT1 succeeds, stop, else do ACT2
                      Note: -a has higher priority than -o
    -name PATTERN     Match file name (w/o directory name) to PATTERN
    -iname PATTERN    Case insensitive -name
    -path PATTERN     Match path to PATTERN
    -ipath PATTERN    Case insensitive -path
    -regex PATTERN    Match path to regex PATTERN
    -type X           File type is X (one of: f,d,l,b,c,s,p)
    -executable       File is executable
    -perm MASK        At least one mask bit (+MASK), all bits (-MASK),
                      or exactly MASK bits are set in file's mode
    -mtime DAYS       mtime is greater than (+N), less than (-N),
                      or exactly N days in the past
    -mmin MINS        mtime is greater than (+N), less than (-N),
                      or exactly N minutes in the past
    -newer FILE       mtime is more recent than FILE's
    -inum N           File has inode number N
    -user NAME/ID     File is owned by given user
    -group NAME/ID    File is owned by given group
    -size N[bck]      File size is N (c:bytes,k:kbytes,b:512 bytes(def.))
                      +/-N: file size is bigger/smaller than N
    -links N          Number of links is greater than (+N), less than (-N),
                      or exactly N
    -empty            Match empty file/directory
    -prune            If current file is directory, don't descend into it
If none of the following actions is specified, -print is assumed
    -print            Print file name
    -print0           Print file name, NUL terminated
    -exec CMD ARG ;   Run CMD with all instances of {} replaced by
                      file name. Fails if CMD exits with nonzero
    -exec CMD ARG +   Run CMD with {} replaced by list of file names
    -delete           Delete current file/directory. Turns on -depth option
    -quit             Exit

Work dir: /home/diazrur/Documents/MAG_test/work/19/a837a93fc87b6d7b434ba4fb75476e

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '.nextflow.log' file for details


From: Ruben Diaz Rua [ruben.diazrua@kaust.edu.sa](mailto:ruben.diazrua@kaust.edu.sa) Sent: Tuesday, January 9, 2024 9:43 AM To: Xiang Zhao [xiang.zhao@kaust.edu.sa](mailto:xiang.zhao@kaust.edu.sa) Subject: Re: error

nextflow run nf-core/mag --input "/home/diazrur/Documents/MAG_test/M-23-6561_3-022umB_QIA-UDI041-QIA-UDI041_L002_R{1,2}.fastq.gz" --outdir output -profile docker --skip_krona TRUE --cat_db /home/diazrur/Documents/metagenomics_DB/CAT_prepare_20210107.tar.gz --gtdb_db /home/diazrur/Documents//metagenomics_DB/gtdbtk_r214_data.tar.gz --binqc_tool checkm --skip_spades True --skip_spadeshybrid True --skip_concoct True --ancient_dna False --skip_metaeuk True --refine_bins_dastool True --run_gunc True --gtdbtk_pplacer_cpus 40 -c custom.txt -resume


From: Xiang Zhao [xiang.zhao@kaust.edu.sa](mailto:xiang.zhao@kaust.edu.sa) Sent: Tuesday, January 9, 2024 9:39 AM To: Ruben Diaz Rua [ruben.diazrua@kaust.edu.sa](mailto:ruben.diazrua@kaust.edu.sa) Subject: RE: error

From: Ruben Diaz Rua [ruben.diazrua@kaust.edu.sa](mailto:ruben.diazrua@kaust.edu.sa) Sent: Thursday, January 4, 2024 10:07 AM To: Xiang Zhao [xiang.zhao@kaust.edu.sa](mailto:xiang.zhao@kaust.edu.sa) Subject: error


nf-core/mag v2.5.1-ge728900

Core Nextflow options
  revision           : master
  runName            : curious_bell
  containerEngine    : docker
  launchDir          : /home/diazrur/Documents/Aramco_metagenomes
  workDir            : /home/diazrur/Documents/Aramco_metagenomes/work
  projectDir         : /home/diazrur/.nextflow/assets/nf-core/mag
  userName           : diazrur
  profile            : docker
  configFiles        :

Input/output options
  input              : metadata.csv
  outdir             : output

Quality control for short reads options
  phix_reference     : /home/diazrur/.nextflow/assets/nf-core/mag/assets/data/GCA_002596845.1_ASM259684v1_genomic.fna.gz

Quality control for long reads options
  lambda_reference   : /home/diazrur/.nextflow/assets/nf-core/mag/assets/data/GCA_000840245.1_ViralProj14204_genomic.fna.gz

Taxonomic profiling options
  gtdbtk_min_perc_aa : 10
  gtdbtk_pplacer_cpus: 40

Assembly options
  skip_spades        : true
  skip_spadeshybrid  : true

Gene prediction and annotation options
  skip_metaeuk       : true

Binning options
  skip_concoct       : true

Bin quality check options
  refine_bins_dastool: true
  run_gunc           : true

###################

-- Check '.nextflow.log' file for details
(env_nf) diazrur@KW60867:~/Documents/Aramco_metagenomes$ vi custom.conf
(env_nf) diazrur@KW60867:~/Documents/Aramco_metagenomes$ nextflow run nf-core/mag --input metadata.csv --outdir output -profile docker --skip_spades True --skip_spadeshybrid True --skip_concoct True --ancient_dna False --skip_metaeuk True --refine_bins_dastool True --run_gunc True --gtdbtk_pplacer_cpus 40 -c custom.conf -resume
N E X T F L O W ~ version 23.10.0
Launching https://github.com/nf-core/mag [curious_bell] DSL2 - revision: e72890089a [master]

Command used and terminal output

No response

Relevant files

No response

System information

No response

muniheart commented 6 months ago

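This is the same issue as https://github.com/nf-core/mag/issues/547. Note the unquoted '*' in the pattern passed to the find command: the shell expands it before find runs, so find receives several file names after -name and prints its usage text instead of searching.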
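A minimal sketch of the quoting problem, assuming a throwaway directory and shortened, hypothetical file names (gtdbtk.sample.*) in place of the real prefix. It compares the unquoted pattern with a quoted one when two .classify.tree files sit in the working directory. This is an illustration only, not necessarily the fix applied in the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction: directory and file names are illustrative only.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch gtdbtk.sample.ar53.classify.tree gtdbtk.sample.bac120.classify.tree

# Unquoted: the shell expands the pattern to two file names before find runs,
# so find sees `-name FILE1 FILE2` and rejects the extra argument.
if find . -name gtdbtk.sample.*.classify.tree >/dev/null 2>&1; then
    unquoted_status=0
else
    unquoted_status=$?
fi

# Quoted: find receives the pattern itself and does the matching.
quoted_matches=$(find . -name "gtdbtk.sample.*.classify.tree" | wc -l | tr -d ' ')

echo "unquoted exit status: $unquoted_status"
echo "quoted matches: $quoted_matches"
```

With a single matching file the unquoted form can appear to work, which may explain why the failure shows up only on some runs.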

jfy133 commented 6 months ago

Thanks @muniheart, I will close this in favour of your issue (as the older one). I will address it in the next couple of weeks, as I'm now back from parental leave.