nf-core / fetchngs

Pipeline to fetch metadata and raw FastQ files from public databases
https://nf-co.re/fetchngs
MIT License

Pipeline crashes if some samples are not available #239

Closed alexblaessle closed 8 months ago

alexblaessle commented 10 months ago

Description of the bug

I am downloading GTEx from dbGaP (25k samples), and for some samples I get the error below, which causes the pipeline to crash.

Is there a way to make an exception for the few FastQ files that are not available, instead of failing the whole run?

Command used and terminal output

export NCBI_SETTINGS="$PWD/user-settings.mkfg"

  retry_with_backoff 5 1 100 \
      prefetch \
      --ngc prj_34697.ngc \
      SRR8218552

  [ -f SRR8218552.sralite ] && vdb-validate SRR8218552.sralite || vdb-validate SRR8218552

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOLS_PREFETCH":
      sratools: $(prefetch --version 2>&1 | grep -Eo '[0-9.]+')
  END_VERSIONS

Command exit status:
  0

Command output:
  (empty)

Command error:
  WARNING: While bind mounting '/scratch/nextflow/work/blaessle/c4/317275babaf730b0f4339cabaaab75:/scratch/nextflow/work/blaessle/c4/317275babaf730b0f4339cabaaab75': destination is already in the mount point list
  2023-11-26T16:06:32 prefetch.3.0.8 err: name not found while resolving query within virtual file system module - failed to resolve accession 'SRR8218552' - no data ( 404 )
  2023-11-26T16:06:32 prefetch.3.0.8: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
  Failed attempt 1 of 5. Retrying in 1 s.
  2023-11-26T16:06:35 prefetch.3.0.8 err: name not found while resolving query within virtual file system module - failed to resolve accession 'SRR8218552' - no data ( 404 )
  2023-11-26T16:06:35 prefetch.3.0.8: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
  Failed attempt 2 of 5. Retrying in 2 s.
  2023-11-26T16:06:39 prefetch.3.0.8 err: name not found while resolving query within virtual file system module - failed to resolve accession 'SRR8218552' - no data ( 404 )
  2023-11-26T16:06:39 prefetch.3.0.8: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
  Failed attempt 3 of 5. Retrying in 4 s.
  2023-11-26T16:06:46 prefetch.3.0.8 err: name not found while resolving query within virtual file system module - failed to resolve accession 'SRR8218552' - no data ( 404 )
  2023-11-26T16:06:46 prefetch.3.0.8: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
  Failed attempt 4 of 5. Retrying in 8 s.
  2023-11-26T16:06:56 prefetch.3.0.8 err: name not found while resolving query within virtual file system module - failed to resolve accession 'SRR8218552' - no data ( 404 )
  2023-11-26T16:06:56 prefetch.3.0.8: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
  Failed after 5 attempts.
  2023-11-26T16:06:56 vdb-validate.3.0.8 info: 'SRR8218552' could not be found
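The retry lines above follow an exponential backoff: the delay doubles (1, 2, 4, 8 s) and the task gives up after the fifth attempt, so an accession that returns a 404 is retried several times before the task fails. The helper below is a minimal sketch of that behaviour, not the pipeline's own script; it assumes the three numbers in `retry_with_backoff 5 1 100` mean maximum attempts, initial delay, and a cap on the delay, which is inferred from the log rather than taken from the pipeline source.

```shell
# Hypothetical re-implementation of a retry helper, inferred from the log above.
# Assumed arguments:
#   $1    = maximum number of attempts
#   $2    = initial delay in seconds (doubled after every failure)
#   $3    = upper bound on the delay in seconds
#   $4... = the command to run
retry_with_backoff() {
    local max_attempts="$1" delay="$2" max_delay="$3"
    local attempt=1 status=0
    shift 3

    while true; do
        # Run the command; stop retrying as soon as it succeeds
        "$@" && return 0
        status=$?
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Failed after ${attempt} attempts." >&2
            return "$status"
        fi
        echo "Failed attempt ${attempt} of ${max_attempts}. Retrying in ${delay} s." >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        # Double the delay, but never beyond the cap
        delay=$((delay * 2))
        if [ "$delay" -gt "$max_delay" ]; then
            delay="$max_delay"
        fi
    done
}
```

Because a 404 is permanent, retrying cannot help here; the backoff only delays the failure by roughly 15 seconds per missing accession.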

Relevant files

No response

System information

Nextflow version: 23.04.2
Hardware: HPC
Executor: slurm
Container engine: Singularity
OS: CentOS Linux
Version: nf-core/fetchngs

Midnighter commented 10 months ago

This is by design, as it alerts you to the fact that those samples don't exist. However, you can configure the error strategy to ignore these failures if you like.
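A minimal sketch of that suggestion, assuming the hypothetical file name `ignore_missing.config`: it writes a custom Nextflow config that sets `errorStrategy = 'ignore'` only for the prefetch step, which appears as `SRATOOLS_PREFETCH` in the process name in the log above. With this setting a failed prefetch task is reported but no longer stops the run; that sample simply yields no FastQ files.

```shell
# Hypothetical custom config (not shipped with the pipeline): ignore failures of
# the prefetch step only, so one unavailable accession does not stop the run.
# All other processes keep the pipeline's default error strategy.
cat > ignore_missing.config <<'EOF'
process {
    withName: 'SRATOOLS_PREFETCH' {
        errorStrategy = 'ignore'
    }
}
EOF
```

Pass the file to the run with `-c`, e.g. `nextflow run nf-core/fetchngs -profile singularity --input ids.csv --outdir results -c ignore_missing.config`, then check the Nextflow log or execution trace for ignored tasks to see which accessions were skipped.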

drpatelh commented 9 months ago

Yep, there isn't much we can do here other than what @Midnighter suggested. Do any of the sra-tools have an option to validate a set of IDs without downloading them? If so, we could add a pre-validation step that catches and reports missing IDs earlier, and maybe even removes them from the list before attempting the download.
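The pre-validation idea could be sketched as below, assuming a command exists that can resolve an accession without downloading it. The function `prevalidate_ids` and the variable `CHECK_CMD` are hypothetical. `CHECK_CMD` defaults to sra-tools' `srapath`, which resolves an accession to a location without fetching the data; whether its exit status reliably flags a missing or access-restricted run (dbGaP data presumably also needs the `.ngc` key) is an unverified assumption.

```shell
# Hypothetical pre-validation helper (not part of nf-core/fetchngs):
# sorts run accessions into those that resolve and those that do not.
#   $1 = file with one accession per line
#   $2 = output file for accessions that resolved
#   $3 = output file for accessions that did not
# CHECK_CMD (assumption) defaults to sra-tools' `srapath`; it is left unquoted
# on purpose so extra arguments can be included, e.g. CHECK_CMD="srapath --opt".
prevalidate_ids() {
    local ids_file="$1" valid_out="$2" missing_out="$3"
    local check_cmd="${CHECK_CMD:-srapath}"
    local id

    : > "$valid_out"
    : > "$missing_out"

    while IFS= read -r id || [ -n "$id" ]; do
        [ -z "$id" ] && continue    # skip blank lines
        # A non-zero exit status is treated as "could not be resolved"
        if $check_cmd "$id" > /dev/null 2>&1; then
            printf '%s\n' "$id" >> "$valid_out"
        else
            printf '%s\n' "$id" >> "$missing_out"
        fi
    done < "$ids_file"
}
```

For example, `prevalidate_ids ids.csv valid_ids.csv missing_ids.txt` would leave only resolvable accessions in `valid_ids.csv` to pass as `--input`, and list the rest in `missing_ids.txt` for reporting.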

drpatelh commented 8 months ago

Closing now, but please feel free to reopen if anyone finds this and has any bright ideas.