nf-core / fetchngs

Pipeline to fetch metadata and raw FastQ files from public databases
https://nf-co.re/fetchngs
MIT License

wget host address error #291

Open mniederhuber opened 5 months ago

mniederhuber commented 5 months ago

Description of the bug

I tried out the dev branch and am encountering a wget error in the process NFCORE_FETCHNGS:SRA:SRA_FASTQ_FTP. The underlying error is:

wget: unable to resolve host address 'ftp.sra.ebi.ac.uk'

I get the error for a number of SRX experiment IDs that downloaded successfully with sra-tools in the past.

I'll try to see if I can figure out the issue, but figured I'd bring it up.

Command used and terminal output

#! /bin/bash
#SBATCH --mem=8G
#SBATCH -t 6:00:00
#SBATCH -p general
#SBATCH -o var/log/fetch-%j.out
#SBATCH -e var/log/fetch-%j.err

module load nextflow

nextflow -log var/log/.fetchngs run nf-core/fetchngs -r dev \
    -profile unc_longleaf \
    -params-file config/fetchngs_params.yaml

Relevant files

logfile.txt

System information

Nextflow 23.04.02, HPC, Slurm, Singularity, RHEL8, fetchngs dev

Midnighter commented 5 months ago

Could be intermittent network or server issues. ENA/SRA do see a lot of traffic.

CJPerkins1 commented 4 months ago

I'm experiencing the same issue with wget on the dev branch. Were you able to get this to work?

pvanheus commented 4 months ago

This is because of a problem with the Singularity container. A certain generation of containers was built with a Busybox that had a broken /etc/resolv.conf. I have reported this to the Galaxy folks who build the Singularity containers and will follow up once that is fixed.
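
One way to check a given image for this is to ask the container runtime whether /etc/resolv.conf exists inside it. The helper below is only a minimal sketch and not part of the pipeline: the function name has_resolv_conf and the RUNTIME variable are made up for illustration, and it assumes the image ships a `test` command (Busybox normally does).

```shell
#!/usr/bin/env bash
# Report whether a container image provides /etc/resolv.conf.
# Usage: has_resolv_conf <image>
# RUNTIME is a hypothetical override for this sketch; it defaults to "singularity".
# Note: a missing runtime or a missing `test` binary is also reported as "missing".
has_resolv_conf() {
    local image="$1"
    if "${RUNTIME:-singularity}" exec "$image" test -e /etc/resolv.conf; then
        echo "ok: /etc/resolv.conf present in $image"
    else
        echo "missing: /etc/resolv.conf absent in $image"
        return 1
    fi
}

# Example call (image name taken from this thread):
# has_resolv_conf depot.galaxyproject.org-singularity-wget-1.20.1.img
```

A "missing" result would point at the image rather than at the network or the ENA server.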

josemunozc commented 3 months ago

I think the problem is the container.

$ module load singularity-ce/4.1.0
$ singularity shell depot.galaxyproject.org-singularity-wget-1.20.1.img
WARNING: Skipping mount /var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
Singularity> wget -t 5 -nv -c -T 60 -O ERX2235404_ERR2179103_2.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/ERR217/003/ERR2179103/ERR2179103_2.fastq.gz
wget: unable to resolve host address 'ftp.sra.ebi.ac.uk'

However, if I try the latest version of the container (check https://depot.galaxyproject.org/singularity/):

$ singularity pull https://depot.galaxyproject.org/singularity/wget:1.21.4
$ singularity shell wget\:1.21.4
Singularity> wget -t 5 -nv -c -T 60 -O ERX2235404_ERR2179103_2.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/ERR217/003/ERR2179103/ERR2179103_2.fastq.gz
Singularity> ls ERX2235404_ERR2179103_2.fastq.gz
ERX2235404_ERR2179103_2.fastq.gz

So, I guess the solution is to instruct Nextflow to fetch the latest image in modules/local/sra_fastq_ftp/main.nf:

    conda "conda-forge::wget=1.20.1"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/wget:1.20.1' :
        'biocontainers/wget:1.20.1' }"

Change to (conda too, for consistency, though I haven't tested it):

    conda "conda-forge::wget=1.21.4"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/wget:1.21.4' :
        'biocontainers/wget:1.21.4' }"

josemunozc commented 3 months ago

Sorry, just realized that the suggested change has already made it into the dev branch :p
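
If the pipeline revision you run still points at wget 1.20.1, the same swap can probably be made without editing main.nf, by overriding the container for that one process in a custom config. The following is a sketch under assumptions: the file name wget_override.config is arbitrary, the URL is the 1.21.4 Singularity image quoted above, and it only makes sense when Singularity is the container engine.

```shell
#!/usr/bin/env bash
# Write a small Nextflow config that overrides the container for one process.
# The file name wget_override.config is an arbitrary choice for this sketch.
cat > wget_override.config <<'EOF'
process {
    withName: 'SRA_FASTQ_FTP' {
        container = 'https://depot.galaxyproject.org/singularity/wget:1.21.4'
    }
}
EOF

# Then pass the file to the run with -c, for example:
# nextflow run nf-core/fetchngs -profile unc_longleaf \
#     -params-file config/fetchngs_params.yaml -c wget_override.config
```

Settings in a config file take precedence over the directives written in the process definition, so the module itself stays untouched.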