nf-core / fetchngs

Pipeline to fetch metadata and raw FastQ files from public databases
https://nf-co.re/fetchngs
MIT License

Add support for prefetch argument `--max-size` #66

Closed: jfy133 closed this issue 2 years ago

jfy133 commented 2 years ago

Description of feature

I was trying to download some data, and apparently one of the files was 'too big' for the sra-tools prefetch step.

Seems like the solution is given in the message. I will try specifying it with a custom modules.conf, but if it works I think it would be good to add built-in support :+1:

Error executing process > 'NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR059917)'

Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR059917)` terminated with an error exit status (3)

Command executed:

  eval "$(vdb-config -o n NCBI_SETTINGS | sed 's/[" ]//g')"
  if [[ ! -f "${NCBI_SETTINGS}" ]]; then
      mkdir -p "$(dirname "${NCBI_SETTINGS}")"
      printf '/LIBS/GUID = "44fc8155-3f0b-4ef8-a7c2-6d375100ae27"\n/libs/cloud/report_instance_identity = "true"\n' > "${NCBI_SETTINGS}"
  fi

  retry_with_backoff.sh prefetch \
       \
      --progress \
      SRR059917

  vdb-validate SRR059917

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH":
      sratools: $(prefetch --version 2>&1 | grep -Eo '[0-9.]+')
  END_VERSIONS

Command exit status:
  3

Command output:

  2021-12-13T11:41:44 prefetch.2.11.0: 1) 'SRR059917' (34GB) is larger than maximum allowed: skipped 

  Download of some files was skipped because they are too large
  You can change size download limit by setting
  --min-size and --max-size command line arguments

Command error:
  WARNING: While bind mounting '/mnt/archgen/microbiome_misc/denisova_sediment_blocks/03-data/public/work/79/4c5787dfdd11d80fd9f6e06dbf0a70:/mnt/archgen/microbiome_misc/denisova_sediment_blocks/03-data/public/work/79/4c5787dfdd11d80fd9f6e06dbf0a70': destination is already in the mount point list
  2021-12-13T11:41:44 prefetch.2.11.0 warn: Maximum file size download limit is 20GB 
  2021-12-13T11:41:44 vdb-validate.2.11.0 info: 'SRR059917' could not be found

Work dir:
  /mnt/archgen/microbiome_misc/denisova_sediment_blocks/03-data/public/work/79/4c5787dfdd11d80fd9f6e06dbf0a70

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
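
For reference, the fix suggested in the log maps directly onto the underlying prefetch call, something like this (a sketch; 60g is just an illustrative size, the default limit being the 20GB mentioned in the warning above):

prefetch --max-size 60g SRR059917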
Midnighter commented 2 years ago

I was wondering, while creating the module, whether there is ever a downside to not limiting the download size at all. I guess it could be somewhat unexpected to get a file that's close to 100 GB or so, but then again the user chose the respective IDs... What do you think?

jfy133 commented 2 years ago

Yeah, I would agree there... you should know what you're downloading. But on the other hand, maybe that's not something people check when fetchngs makes it 'so easy' to download stuff?

Midnighter commented 2 years ago

I would be okay with setting the default args to `--max-size u` (unlimited); then it can still be overwritten.
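
A minimal sketch of how that default could be wired into the pipeline's conf/modules.config (file path assumed from nf-core conventions; not necessarily how the fix ended up being implemented):

process {
    withName: SRATOOLS_PREFETCH {
        ext.args = '--max-size u'
    }
}

A user config supplied with -c takes precedence over the pipeline's own config, so a default set this way stays overridable.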

royfrancis commented 2 years ago

I get the same error. All of my FastQ files are above 40 GB. Is there a quick fix? I tried adding `--max-size` to the nextflow command, but I continue to get the same error.

nextflow run nf-core/fetchngs -c params.config --max-size 60G

Midnighter commented 2 years ago

In your local config, you can set

process {
    withName: SRATOOLS_PREFETCH {
        ext.args = '--max-size 60g'
    }
}
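
Saved as, say, custom.config (file name assumed for illustration), that snippet can then be passed at launch with Nextflow's -c option:

nextflow run nf-core/fetchngs -c custom.config <other options>

Note that --max-size is an argument of the prefetch tool itself, not a pipeline parameter, which is why adding it directly to the nextflow run command line has no effect.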
drpatelh commented 2 years ago

Looks like this is resolved, so closing.

azedinez commented 1 year ago

Hi @drpatelh. On NF Tower, since I'm a launch user, I don't have the permissions to modify this attribute, so it's not practical when I want to change it for a specific run. Would it be possible to expose this `--max-size` parameter in the GUI by default?
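
For what it's worth, exposing this could amount to a pipeline-level parameter interpolated into the module's ext.args, along these lines (a purely hypothetical sketch; the param name max_size and its default are assumptions, not part of the released pipeline):

params.max_size = '20g'

process {
    withName: SRATOOLS_PREFETCH {
        ext.args = "--max-size ${params.max_size}"
    }
}

A parameter declared like this, once added to nextflow_schema.json, should then show up as a field in the Tower launch form.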