nf-core / mag

Assembly and binning of metagenomes
https://nf-co.re/mag
MIT License

CAT_DB process has hardcoded CAT subdirectory names #611

Open maxibor opened 7 months ago

maxibor commented 7 months ago

Description of the bug

In the CAT_DB process, the subdirectory names are hardcoded (to database and taxonomy). This is a problem because newer versions of the CAT database rename these directories to db and tax. Furthermore, symlinking these subdirectories in the process might cause issues when running with Singularity.

ERROR ~ Error executing process > 'NFCORE_MAG:MAG:CAT_DB (20231120_CAT_nr)'

Caused by:
  Missing output file(s) `database/*` expected by process `NFCORE_MAG:MAG:CAT_DB (20231120_CAT_nr)`

Command executed:

  if [[ 20231120_CAT_nr != *.tar.gz ]]; then
      ln -sr `find 20231120_CAT_nr/ -type d -name "*taxonomy*"` taxonomy
      ln -sr `find 20231120_CAT_nr/ -type d -name "*database*"` database
  else
      mkdir catDB
      tar -xf 20231120_CAT_nr -C catDB
      mv `find catDB/ -type d -name "*taxonomy*"` taxonomy/
      mv `find catDB/ -type d -name "*database*"` database/
  fi

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_MAG:MAG:CAT_DB":
      tar: $(tar --version 2>&1 | sed -n 1p | sed 's/tar (GNU tar) //')
  END_VERSIONS

Command exit status:
  0

Command output:
  (empty)

Work dir:
  /home/lucia_winkler/nf-temp/26/59a97e2b1eb9ced31d84c2abe2d7d9
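
The command in the log above searches only for the old names: neither `*taxonomy*` nor `*database*` matches a directory called `tax` or `db`, so nothing is linked and the `database/*` output is missing. Below is a minimal sketch of a lookup that accepts both layouts. It is not the pipeline's actual fix. The helper names `find_cat_subdir` and `link_cat_db` are hypothetical, the sketch assumes GNU `find` (for `-quit`) and GNU `ln` (for `-r`, which the process already uses), and it does not address the possible Singularity symlink problem.

```bash
# Hypothetical helper, not part of nf-core/mag: print the first directory
# under ROOT whose name matches one of the given patterns, tried in order.
find_cat_subdir() {
    local root="$1"
    shift
    local pattern hit
    for pattern in "$@"; do
        # -mindepth 1 skips ROOT itself; -quit (GNU find) stops at the first match
        hit="$(find "$root" -mindepth 1 -type d -name "$pattern" -print -quit)"
        if [ -n "$hit" ]; then
            printf '%s\n' "$hit"
            return 0
        fi
    done
    echo "No directory matching [$*] under $root" >&2
    return 1
}

# Hypothetical wrapper: expose both CAT subdirectories under the fixed names
# (taxonomy, database) that the process outputs expect.
link_cat_db() {
    local root="$1" tax db
    # Old-style glob first, then the exact new-style name
    tax="$(find_cat_subdir "$root" '*taxonomy*' 'tax')" || return 1
    db="$(find_cat_subdir "$root" '*database*' 'db')" || return 1
    ln -sr "$tax" taxonomy
    ln -sr "$db" database
}
```

With this in place, `link_cat_db 20231120_CAT_nr` would stand in for the two `ln -sr` lines of the non-tarball branch. Unlike the original command, it returns a non-zero status when a directory cannot be found, so the task would fail at the lookup instead of exiting 0 and failing later on the missing output.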

Command used and terminal output

nextflow run nf-core/mag -r 2.5.4 \
    -profile eva,archgen \
    --input /home/lucia_winkler/speleothem/pilot_sequences/2024-04-16_samplesheet.csv \
    --outdir results \
    --reads_minlength 30 \
    --bbnorm \
    --igenomes_base "/home/maxime_borry/SDAG_old/04_genomes/" \
    --host_genome GRCh38 \
    --skip_spades \
    --refine_bins_dastool \
    --ancient_dna \
    --skip_prokka \
    --binning_map_mode own \
    --busco_db "/r1/people/maxime_borry/02_db/busco_downloads" \
    --run_gunc \
    --gunc_db /r1/people/maxime_borry/02_db/gunc/gunc_db_progenomes2.1.dmnd \
    --postbinning_input both \
    --gtdb_db /home/maxime_borry/02_db/gtdb/r207/gtdbtk_r207_v2_data.tar.gz \
    --cat_db "/home/maxime_borry/02_db/cat/20231120_CAT_nr" \
    -resume \
    -with-tower

Relevant files

No response

System information

No response

### Tasks
- [ ] https://github.com/nf-core/modules/issues/5588
- [ ] https://github.com/nf-core/modules/issues/5586
- [ ] https://github.com/nf-core/modules/issues/5587

jfy133 commented 7 months ago

Agreed, that module is very old and rather fragile.

We should replace the CAT modules entirely with official ones, I think from https://github.com/MGXlab/CAT_pack

It looks MUCH better (although it is not yet on bioconda), as it also describes how to make custom databases etc.