nf-core / mag

Assembly and binning of metagenomes
https://nf-co.re/mag
MIT License

gtdbtk classify_wf did not produce *.classify.tree output #454

Closed FranciscoDA closed 7 months ago

FranciscoDA commented 1 year ago

Description of the bug

Hello,

I've encountered a problem in the process defined at NFCORE_MAG:MAG:GTDBTK:GTDBTK_CLASSIFY while processing some single-end ONT whole-genome shotgun long reads from gut microbiome.

I suspect something went wrong in the GTDB-Tk classification, because the expected output files were not written (there is no .classify.tree file in the process work dir). However, the GTDB-Tk log file did not show any errors, so I'm not sure how to move forward with this issue.

It's also worth noting that the BUSCO analysis failed on 225 out of the 330 clusters because no genes could be found.

When I run the workflow with the --skip_binqc argument, it omits those processes and completes successfully.

Any pointers on how to debug this issue will be greatly appreciated.

Thanks.

EDIT: The md5sum from the downloaded GTDB-Tk database matches the hash listed at the uq.edu.au site:

$ md5sum /data/databases/gtdbtk_r202_data.tar.gz
4986526c2b935fd4dcc2e604c0322517  /data/databases/gtdbtk_r202_data.tar.gz

Command used and terminal output

$ nextflow -c many-cpu.config run nf-core/mag -profile docker --input '/path/to/my.fastq.gz' --single_end --outdir 'magoutput2' --gtdb /data/databases/gtdbtk_r202_data.tar.gz

N E X T F L O W  ~  version 22.10.8
WARN: It appears you have never run this project before -- Option `-resume` is ignored
Launching `https://github.com/nf-core/mag` [zen_sanger] DSL2 - revision: c9468cb915 [master]

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/mag v2.3.0-gc9468cb
------------------------------------------------------

...

-[nf-core/mag] For 225 bin(s) the BUSCO analysis failed because no BUSCO genes could be found:
    MEGAHIT-CONCOCT-FutureBiome1_117.fa
    MEGAHIT-CONCOCT-FutureBiome1_104.fa
    MEGAHIT-CONCOCT-FutureBiome1_12.fa
    MEGAHIT-CONCOCT-FutureBiome1_1.fa
    MEGAHIT-CONCOCT-FutureBiome1_74.fa
    MEGAHIT-CONCOCT-FutureBiome1_320.fa
    MEGAHIT-CONCOCT-FutureBiome1_288.fa
    MEGAHIT-CONCOCT-FutureBiome1_246.fa
    MEGAHIT-CONCOCT-FutureBiome1_94.fa
    MEGAHIT-CONCOCT-FutureBiome1_34.fa
    MEGAHIT-CONCOCT-FutureBiome1_41.fa
    MEGAHIT-CONCOCT-FutureBiome1_111.fa
    MEGAHIT-CONCOCT-FutureBiome1_77.fa
    MEGAHIT-CONCOCT-FutureBiome1_101.fa
    MEGAHIT-CONCOCT-FutureBiome1_113.fa
    MEGAHIT-CONCOCT-FutureBiome1_109.fa
    MEGAHIT-CONCOCT-FutureBiome1_73.fa
    MEGAHIT-CONCOCT-FutureBiome1_124.fa
    MEGAHIT-CONCOCT-FutureBiome1_11.fa
    MEGAHIT-CONCOCT-FutureBiome1_106.fa
    MEGAHIT-CONCOCT-FutureBiome1_114.fa
    MEGAHIT-CONCOCT-FutureBiome1_120.fa
    MEGAHIT-CONCOCT-FutureBiome1_123.fa
    MEGAHIT-CONCOCT-FutureBiome1_121.fa
    MEGAHIT-CONCOCT-FutureBiome1_116.fa
    MEGAHIT-CONCOCT-FutureBiome1_110.fa
    MEGAHIT-CONCOCT-FutureBiome1_102.fa
    MEGAHIT-CONCOCT-FutureBiome1_115.fa
    MEGAHIT-CONCOCT-FutureBiome1_118.fa
    MEGAHIT-CONCOCT-FutureBiome1_161.fa
    MEGAHIT-CONCOCT-FutureBiome1_141.fa
    MEGAHIT-CONCOCT-FutureBiome1_128.fa
    MEGAHIT-CONCOCT-FutureBiome1_136.fa
    MEGAHIT-CONCOCT-FutureBiome1_0.fa
    MEGAHIT-CONCOCT-FutureBiome1_143.fa
    MEGAHIT-CONCOCT-FutureBiome1_129.fa
    MEGAHIT-CONCOCT-FutureBiome1_133.fa
    MEGAHIT-CONCOCT-FutureBiome1_151.fa
    MEGAHIT-CONCOCT-FutureBiome1_13.fa
    MEGAHIT-CONCOCT-FutureBiome1_155.fa
    MEGAHIT-CONCOCT-FutureBiome1_132.fa
    MEGAHIT-CONCOCT-FutureBiome1_148.fa
    MEGAHIT-CONCOCT-FutureBiome1_14.fa
    MEGAHIT-CONCOCT-FutureBiome1_145.fa
    MEGAHIT-CONCOCT-FutureBiome1_156.fa
    MEGAHIT-CONCOCT-FutureBiome1_140.fa
    MEGAHIT-CONCOCT-FutureBiome1_147.fa
    MEGAHIT-CONCOCT-FutureBiome1_171.fa
    MEGAHIT-CONCOCT-FutureBiome1_158.fa
    MEGAHIT-CONCOCT-FutureBiome1_164.fa
    MEGAHIT-CONCOCT-FutureBiome1_168.fa
    MEGAHIT-CONCOCT-FutureBiome1_16.fa
    MEGAHIT-CONCOCT-FutureBiome1_165.fa
    MEGAHIT-CONCOCT-FutureBiome1_186.fa
    MEGAHIT-CONCOCT-FutureBiome1_162.fa
    MEGAHIT-CONCOCT-FutureBiome1_47.fa
    MEGAHIT-CONCOCT-FutureBiome1_159.fa
    MEGAHIT-CONCOCT-FutureBiome1_166.fa
    MEGAHIT-CONCOCT-FutureBiome1_181.fa
    MEGAHIT-CONCOCT-FutureBiome1_137.fa
    MEGAHIT-CONCOCT-FutureBiome1_134.fa
    MEGAHIT-CONCOCT-FutureBiome1_139.fa
    MEGAHIT-CONCOCT-FutureBiome1_154.fa
    MEGAHIT-CONCOCT-FutureBiome1_160.fa
    MEGAHIT-CONCOCT-FutureBiome1_174.fa
    MEGAHIT-CONCOCT-FutureBiome1_191.fa
    MEGAHIT-CONCOCT-FutureBiome1_192.fa
    MEGAHIT-CONCOCT-FutureBiome1_205.fa
    MEGAHIT-CONCOCT-FutureBiome1_204.fa
    MEGAHIT-CONCOCT-FutureBiome1_20.fa
    MEGAHIT-CONCOCT-FutureBiome1_218.fa
    MEGAHIT-CONCOCT-FutureBiome1_212.fa
    MEGAHIT-CONCOCT-FutureBiome1_24.fa
    MEGAHIT-CONCOCT-FutureBiome1_193.fa
    MEGAHIT-CONCOCT-FutureBiome1_175.fa
    MEGAHIT-CONCOCT-FutureBiome1_183.fa
    MEGAHIT-CONCOCT-FutureBiome1_179.fa
    MEGAHIT-CONCOCT-FutureBiome1_201.fa
    MEGAHIT-CONCOCT-FutureBiome1_197.fa
    MEGAHIT-CONCOCT-FutureBiome1_198.fa
    MEGAHIT-CONCOCT-FutureBiome1_19.fa
    MEGAHIT-CONCOCT-FutureBiome1_217.fa
    MEGAHIT-CONCOCT-FutureBiome1_40.fa
    MEGAHIT-CONCOCT-FutureBiome1_227.fa
    MEGAHIT-CONCOCT-FutureBiome1_223.fa
    MEGAHIT-CONCOCT-FutureBiome1_208.fa
    MEGAHIT-CONCOCT-FutureBiome1_225.fa
    MEGAHIT-CONCOCT-FutureBiome1_185.fa
    MEGAHIT-CONCOCT-FutureBiome1_207.fa
    MEGAHIT-CONCOCT-FutureBiome1_239.fa
    MEGAHIT-CONCOCT-FutureBiome1_219.fa
    MEGAHIT-CONCOCT-FutureBiome1_21.fa
    MEGAHIT-CONCOCT-FutureBiome1_194.fa
    MEGAHIT-CONCOCT-FutureBiome1_163.fa
    MEGAHIT-CONCOCT-FutureBiome1_203.fa
    MEGAHIT-CONCOCT-FutureBiome1_177.fa
    MEGAHIT-CONCOCT-FutureBiome1_210.fa
    MEGAHIT-CONCOCT-FutureBiome1_230.fa
    MEGAHIT-CONCOCT-FutureBiome1_25.fa
    MEGAHIT-CONCOCT-FutureBiome1_23.fa
    MEGAHIT-CONCOCT-FutureBiome1_206.fa
    MEGAHIT-CONCOCT-FutureBiome1_215.fa
    MEGAHIT-CONCOCT-FutureBiome1_256.fa
    MEGAHIT-CONCOCT-FutureBiome1_190.fa
    MEGAHIT-CONCOCT-FutureBiome1_187.fa
    MEGAHIT-CONCOCT-FutureBiome1_220.fa
    MEGAHIT-CONCOCT-FutureBiome1_178.fa
    MEGAHIT-CONCOCT-FutureBiome1_249.fa
    MEGAHIT-CONCOCT-FutureBiome1_221.fa
    MEGAHIT-CONCOCT-FutureBiome1_213.fa
    MEGAHIT-CONCOCT-FutureBiome1_252.fa
    MEGAHIT-CONCOCT-FutureBiome1_250.fa
    MEGAHIT-CONCOCT-FutureBiome1_240.fa
    MEGAHIT-CONCOCT-FutureBiome1_258.fa
    MEGAHIT-CONCOCT-FutureBiome1_229.fa
    MEGAHIT-CONCOCT-FutureBiome1_236.fa
    MEGAHIT-CONCOCT-FutureBiome1_231.fa
    MEGAHIT-CONCOCT-FutureBiome1_228.fa
    MEGAHIT-CONCOCT-FutureBiome1_27.fa
    MEGAHIT-CONCOCT-FutureBiome1_235.fa
    MEGAHIT-CONCOCT-FutureBiome1_241.fa
    MEGAHIT-CONCOCT-FutureBiome1_244.fa
    MEGAHIT-CONCOCT-FutureBiome1_251.fa
    MEGAHIT-CONCOCT-FutureBiome1_253.fa
    MEGAHIT-CONCOCT-FutureBiome1_259.fa
    MEGAHIT-CONCOCT-FutureBiome1_254.fa
    MEGAHIT-CONCOCT-FutureBiome1_257.fa
    MEGAHIT-CONCOCT-FutureBiome1_49.fa
    MEGAHIT-CONCOCT-FutureBiome1_48.fa
    MEGAHIT-CONCOCT-FutureBiome1_26.fa
    MEGAHIT-CONCOCT-FutureBiome1_224.fa
    MEGAHIT-CONCOCT-FutureBiome1_72.fa
    MEGAHIT-CONCOCT-FutureBiome1_268.fa
    MEGAHIT-CONCOCT-FutureBiome1_262.fa
    MEGAHIT-CONCOCT-FutureBiome1_261.fa
    MEGAHIT-CONCOCT-FutureBiome1_266.fa
    MEGAHIT-CONCOCT-FutureBiome1_302.fa
    MEGAHIT-CONCOCT-FutureBiome1_294.fa
    MEGAHIT-CONCOCT-FutureBiome1_278.fa
    MEGAHIT-CONCOCT-FutureBiome1_311.fa
    MEGAHIT-CONCOCT-FutureBiome1_303.fa
    MEGAHIT-CONCOCT-FutureBiome1_310.fa
    MEGAHIT-CONCOCT-FutureBiome1_306.fa
    MEGAHIT-CONCOCT-FutureBiome1_281.fa
    MEGAHIT-CONCOCT-FutureBiome1_290.fa
    MEGAHIT-CONCOCT-FutureBiome1_287.fa
    MEGAHIT-CONCOCT-FutureBiome1_319.fa
    MEGAHIT-CONCOCT-FutureBiome1_318.fa
    MEGAHIT-CONCOCT-FutureBiome1_295.fa
    MEGAHIT-CONCOCT-FutureBiome1_31.fa
    MEGAHIT-CONCOCT-FutureBiome1_275.fa
    MEGAHIT-CONCOCT-FutureBiome1_279.fa
    MEGAHIT-CONCOCT-FutureBiome1_297.fa
    MEGAHIT-CONCOCT-FutureBiome1_270.fa
    MEGAHIT-CONCOCT-FutureBiome1_326.fa
    MEGAHIT-CONCOCT-FutureBiome1_323.fa
    MEGAHIT-CONCOCT-FutureBiome1_292.fa
    MEGAHIT-CONCOCT-FutureBiome1_314.fa
    MEGAHIT-CONCOCT-FutureBiome1_285.fa
    MEGAHIT-CONCOCT-FutureBiome1_325.fa
    MEGAHIT-CONCOCT-FutureBiome1_307.fa
    MEGAHIT-CONCOCT-FutureBiome1_30.fa
    MEGAHIT-CONCOCT-FutureBiome1_321.fa
    MEGAHIT-CONCOCT-FutureBiome1_293.fa
    MEGAHIT-CONCOCT-FutureBiome1_265.fa
    MEGAHIT-CONCOCT-FutureBiome1_327.fa
    MEGAHIT-CONCOCT-FutureBiome1_289.fa
    MEGAHIT-CONCOCT-FutureBiome1_328.fa
    MEGAHIT-CONCOCT-FutureBiome1_329.fa
    MEGAHIT-CONCOCT-FutureBiome1_305.fa
    MEGAHIT-CONCOCT-FutureBiome1_55.fa
    MEGAHIT-CONCOCT-FutureBiome1_44.fa
    MEGAHIT-CONCOCT-FutureBiome1_58.fa
    MEGAHIT-CONCOCT-FutureBiome1_59.fa
    MEGAHIT-CONCOCT-FutureBiome1_62.fa
    MEGAHIT-CONCOCT-FutureBiome1_284.fa
    MEGAHIT-CONCOCT-FutureBiome1_64.fa
    MEGAHIT-CONCOCT-FutureBiome1_309.fa
    MEGAHIT-CONCOCT-FutureBiome1_68.fa
    MEGAHIT-CONCOCT-FutureBiome1_76.fa
    MEGAHIT-CONCOCT-FutureBiome1_75.fa
    MEGAHIT-CONCOCT-FutureBiome1_69.fa
    MEGAHIT-CONCOCT-FutureBiome1_79.fa
    MEGAHIT-CONCOCT-FutureBiome1_6.fa
    MEGAHIT-CONCOCT-FutureBiome1_61.fa
    MEGAHIT-CONCOCT-FutureBiome1_39.fa
    MEGAHIT-CONCOCT-FutureBiome1_78.fa
    MEGAHIT-CONCOCT-FutureBiome1_286.fa
    MEGAHIT-CONCOCT-FutureBiome1_43.fa
    MEGAHIT-CONCOCT-FutureBiome1_67.fa
    MEGAHIT-CONCOCT-FutureBiome1_88.fa
    MEGAHIT-CONCOCT-FutureBiome1_54.fa
    MEGAHIT-CONCOCT-FutureBiome1_70.fa
    MEGAHIT-CONCOCT-FutureBiome1_80.fa
    MEGAHIT-CONCOCT-FutureBiome1_237.fa
    MEGAHIT-CONCOCT-FutureBiome1_85.fa
    MEGAHIT-CONCOCT-FutureBiome1_63.fa
    MEGAHIT-CONCOCT-FutureBiome1_82.fa
    MEGAHIT-CONCOCT-FutureBiome1_57.fa
    MEGAHIT-CONCOCT-FutureBiome1_29.fa
    MEGAHIT-CONCOCT-FutureBiome1_91.fa
    MEGAHIT-CONCOCT-FutureBiome1_71.fa
    MEGAHIT-CONCOCT-FutureBiome1_324.fa
    MEGAHIT-CONCOCT-FutureBiome1_274.fa
    MEGAHIT-CONCOCT-FutureBiome1_267.fa
    MEGAHIT-CONCOCT-FutureBiome1_322.fa
    MEGAHIT-CONCOCT-FutureBiome1_97.fa
    MEGAHIT-CONCOCT-FutureBiome1_300.fa
    MEGAHIT-CONCOCT-FutureBiome1_56.fa
    MEGAHIT-CONCOCT-FutureBiome1_66.fa
    MEGAHIT-CONCOCT-FutureBiome1_277.fa
    MEGAHIT-CONCOCT-FutureBiome1_36.fa
    MEGAHIT-CONCOCT-FutureBiome1_4.fa
    MEGAHIT-CONCOCT-FutureBiome1_83.fa
    MEGAHIT-CONCOCT-FutureBiome1_50.fa
    MEGAHIT-CONCOCT-FutureBiome1_273.fa
    MEGAHIT-CONCOCT-FutureBiome1_308.fa
    MEGAHIT-CONCOCT-FutureBiome1_46.fa
    MEGAHIT-CONCOCT-FutureBiome1_313.fa
    MEGAHIT-CONCOCT-FutureBiome1_42.fa
    MEGAHIT-CONCOCT-FutureBiome1_299.fa
    MEGAHIT-CONCOCT-FutureBiome1_99.fa
    MEGAHIT-CONCOCT-FutureBiome1_96.fa
    MEGAHIT-CONCOCT-FutureBiome1_98.fa
    MEGAHIT-CONCOCT-FutureBiome1_95.fa
See /home/franciscoda/magoutput2/GenomeBinning/QC/BUSCO/[bin]_busco.err and /home/franciscoda/magoutput2/GenomeBinning/QC/BUSCO/[bin]_busco.log for further information.-
-[nf-core/mag] Pipeline completed with errors-
Error executing process > 'NFCORE_MAG:MAG:GTDBTK:GTDBTK_CLASSIFY (MEGAHIT-MaxBin2-FutureBiome1)'

Caused by:
  Process `NFCORE_MAG:MAG:GTDBTK:GTDBTK_CLASSIFY (MEGAHIT-MaxBin2-FutureBiome1)` terminated with an error exit status (1)

Command executed:

  export GTDBTK_DATA_PATH="${PWD}/database"
  if [ --scratch_dir pplacer_tmp != "" ] ; then
      mkdir pplacer_tmp
  fi

  gtdbtk classify_wf --extension fa                     --genome_dir bins                     --prefix "gtdbtk.MEGAHIT-MaxBin2-FutureBiome1"                     --out_dir "${PWD}"                     --cpus 24                     --pplacer_cpus 1                     --scratch_dir pplacer_tmp                     --min_perc_aa 10                     --min_af 0.65

  gzip "gtdbtk.MEGAHIT-MaxBin2-FutureBiome1".*.classify.tree "gtdbtk.MEGAHIT-MaxBin2-FutureBiome1".*.msa.fasta
  mv gtdbtk.log "gtdbtk.MEGAHIT-MaxBin2-FutureBiome1.log"
  mv gtdbtk.warnings.log "gtdbtk.MEGAHIT-MaxBin2-FutureBiome1.warnings.log"

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_MAG:MAG:GTDBTK:GTDBTK_CLASSIFY":
      gtdbtk: $(gtdbtk --version | sed -n 1p | sed "s/gtdbtk: version //; s/ Copyright.*//")
  END_VERSIONS

Command exit status:
  1

Command output:
  [2023-06-12 12:04:49] INFO: GTDB-Tk v1.5.0
  [2023-06-12 12:04:49] INFO: gtdbtk classify_wf --extension fa --genome_dir bins --prefix gtdbtk.MEGAHIT-MaxBin2-FutureBiome1 --out_dir /data/gec/pufo/work/7d/ee6dc2b86de2ee1850af480373187f --cpus 24 --pplacer_cpus 1 --scratch_dir pplacer_tmp --min_perc_aa 10 --min_af 0.65
  [2023-06-12 12:04:49] INFO: Using GTDB-Tk reference data version r202: database
  [2023-06-12 12:04:49] INFO: Identifying markers in 1 genomes with 24 threads.
  [2023-06-12 12:04:49] TASK: Running Prodigal V2.6.3 to identify genes.
  [2023-06-12 12:05:44] INFO: Completed 1 genome in 54.51 seconds (54.51 seconds/genome).
  [2023-06-12 12:05:44] TASK: Identifying TIGRFAM protein families.
  [2023-06-12 12:05:52] INFO: Completed 1 genome in 7.93 seconds (7.93 seconds/genome).
  [2023-06-12 12:05:52] TASK: Identifying Pfam protein families.
  [2023-06-12 12:05:53] INFO: Completed 1 genome in 1.78 seconds (1.78 seconds/genome).
  [2023-06-12 12:05:53] INFO: Annotations done using HMMER 3.1b2 (February 2015).
  [2023-06-12 12:05:53] TASK: Summarising identified marker genes.
  [2023-06-12 12:05:54] INFO: Completed 1 genome in 0.28 seconds (3.60 genomes/second).
  [2023-06-12 12:05:54] INFO: Done.
  [2023-06-12 12:05:54] INFO: Aligning markers in 1 genomes with 24 CPUs.
  [2023-06-12 12:05:54] INFO: Processing 1 genomes identified as bacterial.
  [2023-06-12 12:06:03] INFO: Read concatenated alignment for 45,555 GTDB genomes.
  [2023-06-12 12:06:03] TASK: Generating concatenated alignment for each marker.
  [2023-06-12 12:06:06] INFO: Completed 1 genome in 0.16 seconds (6.16 genomes/second).
  [2023-06-12 12:06:07] TASK: Aligning 14 identified markers using hmmalign 3.1b2 (February 2015).
  [2023-06-12 12:06:10] INFO: Completed 14 markers in 0.07 seconds (201.19 markers/second).
  [2023-06-12 12:06:10] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
  [2023-06-12 12:08:41] INFO: Completed 45,556 sequences in 2.51 minutes (18,180.44 sequences/minute).
  [2023-06-12 12:08:41] INFO: Masked bacterial alignment from 41,084 to 5,037 AAs.
  [2023-06-12 12:08:41] INFO: 1 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
  [2023-06-12 12:08:41] INFO: Creating concatenated alignment for 45,555 bacterial GTDB and user genomes.
  [2023-06-12 12:08:41] INFO: All bacterial user genomes have been filtered out.
  [2023-06-12 12:08:42] INFO: Done.
  [2023-06-12 12:08:42] INFO: Done.

Command error:
  ==> Processed 44243/45556 sequences (97%) |██████████████▌| [321.03sequence/s, ETA 00:04]
  ==> Processed 44276/45556 sequences (97%) |██████████████▌| [321.07sequence/s, ETA 00:03]
  ==> Processed 44309/45556 sequences (97%) |██████████████▌| [321.08sequence/s, ETA 00:03]
  ==> Processed 44342/45556 sequences (97%) |██████████████▌| [320.95sequence/s, ETA 00:03]
  ==> Processed 44375/45556 sequences (97%) |██████████████▌| [318.77sequence/s, ETA 00:03]
  ==> Processed 44407/45556 sequences (97%) |██████████████▌| [318.70sequence/s, ETA 00:03]
  ==> Processed 44440/45556 sequences (98%) |██████████████▋| [318.96sequence/s, ETA 00:03]
  ==> Processed 44473/45556 sequences (98%) |██████████████▋| [319.24sequence/s, ETA 00:03]
  ==> Processed 44505/45556 sequences (98%) |██████████████▋| [319.08sequence/s, ETA 00:03]
  ==> Processed 44537/45556 sequences (98%) |██████████████▋| [319.00sequence/s, ETA 00:03]
  ==> Processed 44569/45556 sequences (98%) |██████████████▋| [318.94sequence/s, ETA 00:03]
  ==> Processed 44602/45556 sequences (98%) |██████████████▋| [319.17sequence/s, ETA 00:02]
  ==> Processed 44635/45556 sequences (98%) |██████████████▋| [319.38sequence/s, ETA 00:02]
  ==> Processed 44668/45556 sequences (98%) |██████████████▋| [319.60sequence/s, ETA 00:02]
  ==> Processed 44701/45556 sequences (98%) |██████████████▋| [319.66sequence/s, ETA 00:02]
  ==> Processed 44734/45556 sequences (98%) |██████████████▋| [319.81sequence/s, ETA 00:02]
  ==> Processed 44767/45556 sequences (98%) |██████████████▋| [319.94sequence/s, ETA 00:02]
  ==> Processed 44800/45556 sequences (98%) |██████████████▊| [320.12sequence/s, ETA 00:02]
  ==> Processed 44833/45556 sequences (98%) |██████████████▊| [320.20sequence/s, ETA 00:02]
  ==> Processed 44866/45556 sequences (98%) |██████████████▊| [320.34sequence/s, ETA 00:02]
  ==> Processed 44899/45556 sequences (99%) |██████████████▊| [320.50sequence/s, ETA 00:02]
  ==> Processed 44932/45556 sequences (99%) |██████████████▊| [320.53sequence/s, ETA 00:01]
  ==> Processed 44965/45556 sequences (99%) |██████████████▊| [320.59sequence/s, ETA 00:01]
  ==> Processed 44998/45556 sequences (99%) |██████████████▊| [320.64sequence/s, ETA 00:01]
  ==> Processed 45031/45556 sequences (99%) |██████████████▊| [320.70sequence/s, ETA 00:01]
  ==> Processed 45064/45556 sequences (99%) |██████████████▊| [320.78sequence/s, ETA 00:01]
  ==> Processed 45097/45556 sequences (99%) |██████████████▊| [320.83sequence/s, ETA 00:01]
  ==> Processed 45130/45556 sequences (99%) |██████████████▊| [320.88sequence/s, ETA 00:01]
  ==> Processed 45163/45556 sequences (99%) |██████████████▊| [320.90sequence/s, ETA 00:01]
  ==> Processed 45196/45556 sequences (99%) |██████████████▉| [320.89sequence/s, ETA 00:01]
  ==> Processed 45229/45556 sequences (99%) |██████████████▉| [320.91sequence/s, ETA 00:01]
  ==> Processed 45262/45556 sequences (99%) |██████████████▉| [320.79sequence/s, ETA 00:00]
  ==> Processed 45295/45556 sequences (99%) |██████████████▉| [320.84sequence/s, ETA 00:00]
  ==> Processed 45328/45556 sequences (99%) |██████████████▉| [320.94sequence/s, ETA 00:00]
  ==> Processed 45361/45556 sequences (100%) |██████████████▉| [320.99sequence/s, ETA 00:00]
  ==> Processed 45394/45556 sequences (100%) |██████████████▉| [320.94sequence/s, ETA 00:00]
  ==> Processed 45427/45556 sequences (100%) |██████████████▉| [320.91sequence/s, ETA 00:00]
  ==> Processed 45460/45556 sequences (100%) |██████████████▉| [320.93sequence/s, ETA 00:00]
  ==> Processed 45493/45556 sequences (100%) |██████████████▉| [320.91sequence/s, ETA 00:00]
  ==> Processed 45526/45556 sequences (100%) |██████████████▉| [320.93sequence/s, ETA 00:00]

  [2023-06-12 12:06:10] INFO: Completed 14 markers in 0.07 seconds (201.19 markers/second).
  [2023-06-12 12:06:10] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
  [2023-06-12 12:08:41] INFO: Completed 45,556 sequences in 2.51 minutes (18,180.44 sequences/minute).
  [2023-06-12 12:08:41] INFO: Masked bacterial alignment from 41,084 to 5,037 AAs.
  [2023-06-12 12:08:41] INFO: 1 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
  [2023-06-12 12:08:41] INFO: Creating concatenated alignment for 45,555 bacterial GTDB and user genomes.
  [2023-06-12 12:08:41] INFO: All bacterial user genomes have been filtered out.
  [2023-06-12 12:08:42] INFO: Done.
  [2023-06-12 12:08:42] INFO: Done.
  gzip: gtdbtk.MEGAHIT-MaxBin2-FutureBiome1.*.classify.tree: No such file or directory

Relevant files

BUSCO process log files from each of the 330 nextflow work dirs: BUSCO.zip

GTDBTK_CLASSIFY process log files from the nextflow work dir: GTDBTK_CLASSIFY.command.log gtdbtk.log gtdbtk.warnings.log

Custom config file used for this run: many-cpu.config.txt

System information

d4straub commented 1 year ago

Hi,

It seems to me that this is a problem with the assembly/binning.

All bacterial user genomes have been filtered out.

That seems to say that all of your genomes/bins were removed during QC filtering, so no results were written. This is backed up by the frequently failing BUSCO QC: GTDB-Tk requires relatively good bins, and such bins are selected based on BUSCO metrics. I'd guess that none of your bins qualify for GTDB-Tk (see also https://nf-co.re/mag/2.3.0/output#gtdb-tk, which says "nf-core/mag uses GTDB-Tk to classify binned genomes which satisfy certain quality criteria (i.e. completeness and contamination assessed with the BUSCO analysis)"). That should be obvious from the file GenomeBinning/QC/busco_summary.tsv in the results folder.
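For anyone who wants to check this quickly, here is a minimal Python sketch that lists the bins in busco_summary.tsv that would satisfy a completeness/contamination cutoff. The column names, the use of duplicated BUSCOs as a contamination proxy, and the 50/10 thresholds are all assumptions for illustration, so compare them against the header of your own file and the pipeline's parameter documentation before relying on the result.

```python
import csv
import io

# Hypothetical column names: check the header line of your own busco_summary.tsv.
BIN_COL = "GenomeBin"
COMPLETE_COL = "%Complete (specific)"
DUPLICATED_COL = "%Complete and duplicated (specific)"


def bins_passing_qc(tsv_text, min_completeness=50.0, max_contamination=10.0):
    """Return the names of bins whose BUSCO metrics satisfy both thresholds.

    The default thresholds are assumed values, not taken from this thread.
    """
    passing = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        try:
            completeness = float(row[COMPLETE_COL])
            contamination = float(row[DUPLICATED_COL])
        except (KeyError, TypeError, ValueError):
            # Bins where BUSCO found no genes have empty or missing values.
            continue
        if completeness >= min_completeness and contamination <= max_contamination:
            passing.append(row[BIN_COL])
    return passing


# Made-up example rows (not taken from the reported run).
sample = (
    "GenomeBin\t%Complete (specific)\t%Complete and duplicated (specific)\n"
    "bin_good.fa\t92.5\t1.2\n"
    "bin_partial.fa\t12.0\t0.0\n"
    "bin_nogenes.fa\t\t\n"
)
print(bins_passing_qc(sample))  # ['bin_good.fa']
```

To use it on a real run, read the file first, e.g. `bins_passing_qc(open("GenomeBinning/QC/busco_summary.tsv").read())`. An empty list would mean that no bin qualifies under the assumed criteria.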

jfy133 commented 7 months ago

I believe there is now a filter for this: only assemblies of sufficient quality will reach classify_wf, and we print a warning if none pass the completeness filters.
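On older versions without that filter, one way to spot this case is to scan the GTDB-Tk log for the "filtered out" message before expecting any .classify.tree file. The sketch below is only an illustration and not the pipeline's actual implementation. The function name is hypothetical, and the pattern is generalized from the single "bacterial" line in the log above, so the assumption that other domains are reported with the same wording is untested.

```python
import re

# Pattern generalized from the log line reported in this issue:
#   "INFO: All bacterial user genomes have been filtered out."
FILTERED_OUT = re.compile(r"All (\w+) user genomes have been filtered out")


def filtered_domains(log_text):
    """Return the domains for which GTDB-Tk reported that every user genome was filtered out."""
    return FILTERED_OUT.findall(log_text)


# Lines copied from the log in this issue.
sample_log = """\
[2023-06-12 12:08:41] INFO: 1 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2023-06-12 12:08:41] INFO: Creating concatenated alignment for 45,555 bacterial GTDB and user genomes.
[2023-06-12 12:08:41] INFO: All bacterial user genomes have been filtered out.
[2023-06-12 12:08:42] INFO: Done.
"""
print(filtered_domains(sample_log))  # ['bacterial']
```

A non-empty result explains why the later gzip step finds no tree file: GTDB-Tk finished without error but had nothing left to place in the tree.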