nf-core / mag

Assembly and binning of metagenomes
https://nf-co.re/mag
MIT License

run won't resume since kraken2 database file path changed #41

Closed ivelsko closed 4 years ago

ivelsko commented 4 years ago

Hi, I ran into the same error as issue #32 (Metabat fails when running with multiple input files) and let it sit for a few days before trying to resume the run. During that time the Kraken2 developers released a new database, and moved the one they had into a new folder.

I tried to resume my run to see if it would move past the metabat error, but it gave an error because it couldn't find the database file. So I tried to resume the run using the new database path, but that also gave an error. I've pasted both errors below. Is there a way around this, or will I need to start a new run?

Thanks, Irina

This is the first error about the database (with the original database file path):

```
u.edu/pub/data/kraken2_dbs/minikraken2_v2_8GB_201904_UPDATE.tgz'  --outdir '/projects1/microbiome_calculus/Cameroon_plaque/04-analysis/assembly/output' -w '/projects1/microbiome_calculus/Cameroon_plaque/04-analysis/assembly/output/work' -resume cmc_assembly
N E X T F L O W  ~  version 19.04.0
Launching `nf-core/mag` [tiny_ekeblad] - revision: 4c2f61cbbb [master]
WARN: It appears you have never run this project before -- Option `-resume` is ignored
WARN: Access to undefined parameter `readPaths` -- Initialise it to a default value eg. `params.readPaths = some_value`
WARN: Access to undefined parameter `fasta` -- Initialise it to a default value eg. `params.fasta = some_value`
Pipeline Release  : master
Run Name          : tiny_ekeblad
Reads             : /projects1/microbiome_calculus/Cameroon_plaque/04-analysis/assembly/input/*.R{1,2}.fastq.gz
Fasta Ref         : null
Data Type         : Paired-End
Kraken2 Db        : ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/minikraken2_v2_8GB_201904_UPDATE.tgz
Busco Reference   : https://busco-archive.ezlab.org/v3/datasets/bacteria_odb9.tar.gz
Max Resources     : 256 GB memory, 32 cpus, 24d 20h 31m 24s time per job
Container         : singularity - nfcore/mag:1.0.0
Output dir        : /projects1/microbiome_calculus/Cameroon_plaque/04-analysis/assembly/output
Launch dir        : /projects1/microbiome_calculus/Cameroon_plaque/04-analysis/aadder/output/Nov2018acc
Working dir       : /projects1/microbiome_calculus/Cameroon_plaque/04-analysis/assembly/output/work
Script dir        : /projects1/clusterhomes/velsko/.nextflow/assets/nf-core/mag
User              : velsko
Config Profile    : shh
Config Description: Generic MPI-SHH cluster(s) profile provided by nf-core/configs.
Config Contact    : James Fellows Yates (@jfy133), Maxime Borry (@Maxibor)
Config URL        : https://shh.mpg.de
executor >  slurm (77)
[64/f2cbd8] process > phix_download_db       [100%] 1 of 1 ✔
[7a/3ac69f] process > fastp                  [100%] 39 of 39
[de/73cec0] process > fastqc_raw             [100%] 36 of 36
[f0/82d032] process > get_software_versions  [100%] 1 of 1 ✔
[8a/3b3835] process > kraken2_db_preparation [  0%] 1 of 0, failed: 1
ERROR ~ Error executing process > 'kraken2_db_preparation (1)'

Caused by:
  Can't stage file ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/minikraken2_v2_8GB_201904_UPDATE.tgz -- reason: pub/data/kraken2_dbs/minikraken2_v2_8GB_201904_UPDATE.tgz

Source block:
  """
  tar -xf "${db}"
  """

Work dir:
  /projects1/microbiome_calculus/Cameroon_plaque/04-analysis/assembly/output/work/8a/3b383522700f7823f76790e4b8bccd

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
```

and this is the second error about the database (with the current path to the database file):
```
$ nextflow run nf-core/mag --reads '/projects1/microbiome_calculus/Cameroon_plaque/04-analysis/assembly/input/*.R{1,2}.fastq.gz' -profile shh --kraken2_db 'ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904_UPDATE.tgz'  --outdir '/projects1/microbiome_calculus/Cameroon_plaque/04-analysis/assembly/output' -w '/projects1/microbiome_calculus/Cameroon_plaque/04-analysis/assembly/output/work' -resume cmc_assembly
N E X T F L O W  ~  version 19.04.0
Launching `nf-core/mag` [reverent_dubinsky] - revision: 4c2f61cbbb [master]
WARN: Access to undefined parameter `readPaths` -- Initialise it to a default value eg. `params.readPaths = some_value`
WARN: Access to undefined parameter `fasta` -- Initialise it to a default value eg. `params.fasta = some_value`
Pipeline Release  : master
Run Name          : reverent_dubinsky
Reads             : /projects1/microbiome_calculus/Cameroon_plaque/04-analysis/assembly/input/*.R{1,2}.fastq.gz
Fasta Ref         : null
Data Type         : Paired-End
Kraken2 Db        : ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904_UPDATE.tgz
Busco Reference   : https://busco-archive.ezlab.org/v3/datasets/bacteria_odb9.tar.gz
Max Resources     : 256 GB memory, 32 cpus, 24d 20h 31m 24s time per job
Container         : singularity - nfcore/mag:1.0.0
Output dir        : /projects1/microbiome_calculus/Cameroon_plaque/04-analysis/assembly/output
Launch dir        : /projects1/microbiome_calculus/Cameroon_plaque/04-analysis/aadder/output/Nov2018acc
Working dir       : /projects1/microbiome_calculus/Cameroon_plaque/04-analysis/assembly/output/work
Script dir        : /projects1/clusterhomes/velsko/.nextflow/assets/nf-core/mag
User              : velsko
Config Profile    : shh
Config Description: Generic MPI-SHH cluster(s) profile provided by nf-core/configs.
Config Contact    : James Fellows Yates (@jfy133), Maxime Borry (@Maxibor)
Config URL        : https://shh.mpg.de
executor >  slurm (2)
[c0/ad731a] process > fastqc_raw             [100%] 38 of 38, cached: 36, failed: 2
[64/465396] process > fastp                  [100%] 39 of 39, cached: 39
[f0/82d032] process > get_software_versions  [100%] 1 of 1, cached: 1 ✔
[64/f2cbd8] process > phix_download_db       [100%] 1 of 1, cached: 1 ✔
[a5/7d8194] process > kraken2_db_preparation [  0%] 1 of 0, failed: 1
Staging foreign file: ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904_UPDATE.tgz
Execution cancelled -- Finishing pending tasks before exit
WARN: Unable to stage foreign file: ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904_UPDATE.tgz (try 1) -- Cause: pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904_UPDATE.tgz
WARN: Unable to stage foreign file: ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904_UPDATE.tgz (try 2) -- Cause: pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904_UPDATE.tgz
WARN: Unable to stage foreign file: ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904_UPDATE.tgz (try 3) -- Cause: pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904_UPDATE.tgz
WARN: Killing pending tasks (2)
ERROR ~ Error executing process > 'kraken2_db_preparation (1)'

Caused by:
  Can't stage file ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904_UPDATE.tgz -- reason: pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904_UPDATE.tgz

Source block:
  """
  tar -xf "${db}"
  """

Work dir:
  /projects1/microbiome_calculus/Cameroon_plaque/04-analysis/assembly/output/work/a5/7d8194ef6a450faa240eac1779a5ca

Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option `-resume`

 -- Check '.nextflow.log' file for details
```
d4straub commented 4 years ago

Sorry for the extremely late reply. Yes, download the database and point the `--kraken2_db` parameter at the local copy.
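A minimal sketch of that workaround, assuming the database is fetched once to a shared location (the `/path/to/databases` directory and the `--reads` glob are placeholders; adjust them to your cluster layout):

```shell
# Download the archive once to a local directory
mkdir -p /path/to/databases
wget -P /path/to/databases \
    ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v2_8GB_201904_UPDATE.tgz

# Resume the run, pointing --kraken2_db at the local file
# instead of the FTP URL so Nextflow stages it from disk
nextflow run nf-core/mag \
    --reads '/path/to/input/*.R{1,2}.fastq.gz' \
    -profile shh \
    --kraken2_db '/path/to/databases/minikraken2_v2_8GB_201904_UPDATE.tgz' \
    --outdir '/path/to/output' \
    -w '/path/to/output/work' \
    -resume cmc_assembly
```

Because the reads, work directory, and other parameters are unchanged, `-resume` should still pick up the cached fastp/fastqc tasks; only the `kraken2_db_preparation` step and downstream processes re-run.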

d4straub commented 4 years ago

This was also addressed in #54. It is generally a good idea to download all external sources (e.g. the Kraken2, Centrifuge, and CAT databases) before starting the analysis, in case the connection has hiccups.