nf-core / mag

Assembly and binning of metagenomes
https://nf-co.re/mag
MIT License

Error executing process > 'NFCORE_MAG:MAG:GTDBTK:GTDBTK_DB_PREPARATION (gtdbtk_r202_data.tar.gz)' #424

Closed johnne closed 1 year ago

johnne commented 1 year ago

Description of the bug

I'm running into an error when nf-core/mag prepares the GTDB-Tk database. My config file contains the following:

params {

    input = "/proj/snic2020-5-486/nobackup/nbis-proj-6668/data/sample_list.test.csv"
    host_genome = "GRCm38"
    save_hostremoved_reads = true
    bowtie2_mode = "--very-sensitive"
    // cat_db_generate = true
    // cat_official_taxonomy = true
    // save_cat_db = true
    binqc_tool = "checkm"
    save_checkm_data = true
    megahit_fix_cpu_1 = true
    spades_fix_cpus = 10
    outdir = "/proj/snic2020-5-486/nobackup/nbis-proj-6668/mag.test/"
}

Command used and terminal output

nextflow -c conf/mag.test.config run nf-core/mag -r 2.3.0 -profile uppmax --project snic2022-5-350 

Error executing process > 'NFCORE_MAG:MAG:GTDBTK:GTDBTK_DB_PREPARATION (gtdbtk_r202_data.tar.gz)'                                                                                                          

Caused by:                                                                                                                                                                                                 
  Process `NFCORE_MAG:MAG:GTDBTK:GTDBTK_DB_PREPARATION (gtdbtk_r202_data.tar.gz)` terminated with an error exit status (2)                                                                                 

Command executed:                                                                                                                                                                                          

  mkdir database                                                                                                                                                                                           
  tar -xzf gtdbtk_r202_data.tar.gz -C database --strip 1                                                                                                                                                   

  cat <<-END_VERSIONS > versions.yml                                                                                                                                                                       
  "NFCORE_MAG:MAG:GTDBTK:GTDBTK_DB_PREPARATION":                                                                                                                                                           
      tar: $(tar --version 2>&1 | sed -n 1p | sed 's/tar (GNU tar) //')                                                                                                                                    
  END_VERSIONS                                                                                                                                                                                             

Command exit status:                                                                                 
  2                                                                                                                                                                                                        

Command output:                                                                                                                                                                                            
  (empty)                                                                                                                                                                                                  

Command error:                                                                                                                                                                                             
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred                                                                                                         
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred                                                                                                   
  INFO:    Environment variable SINGULARITYENV_SNIC_TMP is set, but APPTAINERENV_SNIC_TMP is preferred                                                                                                     

  gzip: stdin: unexpected end of file                                                                                                                                                                      
  tar: Unexpected EOF in archive                                                                                                                                                                           
  tar: Unexpected EOF in archive                                                                                                                                                                           
  tar: Error is not recoverable: exiting now                                                                                                                                                               

Work dir:                                                                                                                                                                                                  
  /crex/proj/snic2020-5-486/nobackup/nbis-proj-6668/work/c8/03ed00f1022e767453779b6dd1da78                                                                                                                 

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Relevant files

nextflow.log


d4straub commented 1 year ago

You should never define params in a config file passed with -c (as you did here), because those params can be overwritten. Use -params-file instead; this is equivalent to setting params on the command line, i.e. it guarantees that the pipeline uses your settings. Here is how to do that: https://nf-co.re/ampliseq/2.5.0/usage#setting-parameters-in-a-file
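As a rough sketch of what that could look like for the config in this issue: the snippet below writes the same settings to a YAML file. The file name params.yaml is an arbitrary choice, the values are copied from the params block above, and the commented-out options are left out.

```shell
#!/usr/bin/env bash
# Write the pipeline parameters to a YAML file instead of a params {} config block.
# "params.yaml" is an assumed file name; any name works.
cat > params.yaml <<'EOF'
input: "/proj/snic2020-5-486/nobackup/nbis-proj-6668/data/sample_list.test.csv"
host_genome: "GRCm38"
save_hostremoved_reads: true
bowtie2_mode: "--very-sensitive"
binqc_tool: "checkm"
save_checkm_data: true
megahit_fix_cpu_1: true
spades_fix_cpus: 10
outdir: "/proj/snic2020-5-486/nobackup/nbis-proj-6668/mag.test/"
EOF
```

The run command would then presumably become `nextflow run nf-core/mag -r 2.3.0 -profile uppmax -params-file params.yaml --project snic2022-5-350`, with -c kept only for non-parameter settings such as resources.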

d4straub commented 1 year ago

Having said that, this isn't the issue. The issue is:

  gzip: stdin: unexpected end of file
  tar: Unexpected EOF in archive
  tar: Unexpected EOF in archive
  tar: Error is not recoverable: exiting now

which is not a bug but an incomplete download. Please download the database again; the automatic download is fine, but a manual download might be more reliable.
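One way to confirm a truncated download before starting the pipeline is to test the gzip stream. This is a minimal sketch; check_archive is a hypothetical helper name and not part of nf-core/mag.

```shell
#!/usr/bin/env bash
# check_archive: hypothetical helper, not part of the pipeline.
# "gzip -t" reads the whole compressed stream without extracting it,
# so a truncated or unreadable file makes it exit with a non-zero status.
check_archive() {
    local archive="$1"
    if gzip -t "$archive" 2>/dev/null; then
        echo "OK: $archive"
    else
        echo "CORRUPT: $archive"
        return 1
    fi
}
```

For example, `check_archive gtdbtk_r202_data.tar.gz` on the downloaded file. If the download source publishes a checksum, comparing against it would be a stronger check than the gzip test alone.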

johnne commented 1 year ago

OK, thanks. I don't run Nextflow often enough to remember these things, and I always struggle to find documentation on how to define parameters without passing them on the command line.

alneberg commented 1 year ago

I just ran into the same "issue". In my case, I specified an already extracted archive of the GTDB database which was present on our analysis cluster. It makes sense then that tar/gzip complains when attempting to extract it once again.

It doesn't seem to be what you're doing though since you're not specifying anything for the gtdb database parameter?

jfy133 commented 1 year ago

To address @alneberg's problem, I'll also add functionality to support directory input of the database :) (then we can close this issue)
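The idea, roughly: extract only when the input is an archive, and use a directory as-is. The sketch below illustrates that logic in plain shell; prepare_db is a hypothetical helper and this is not the pipeline's actual implementation.

```shell
#!/usr/bin/env bash
# prepare_db: hypothetical helper illustrating directory-vs-archive handling.
# Prints the path of a directory that holds the extracted database.
prepare_db() {
    local db="$1" target="${2:-database}"
    if [ -d "$db" ]; then
        # Already extracted: use the directory directly, no tar call.
        echo "$db"
    elif [ -f "$db" ]; then
        # Compressed archive: extract it, dropping the top-level folder,
        # as the failing process in this issue does.
        mkdir -p "$target"
        tar -xzf "$db" -C "$target" --strip-components=1 || return 1
        echo "$target"
    else
        echo "No such file or directory: $db" >&2
        return 1
    fi
}
```

With this shape, pointing at an already extracted database no longer triggers the tar/gzip error described above.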

jfy133 commented 1 year ago

Will be supported in https://github.com/nf-core/mag/pull/436

jfy133 commented 1 year ago

I guess this should also apply to the BUSCO database (i.e., make BUSCO_DB_PREPARATION optional).

jfy133 commented 1 year ago

I've added this for GTDB; it's not yet done for BUSCO, as that is more complicated.

I will close this and open a new issue for BUSCO.