nextflow-io / training

Nextflow training material
https://training.nextflow.io/

Section 5.2.6 fromSRA hangs #126

Open sktrinh12 opened 1 year ago

sktrinh12 commented 1 year ago

Hi, I tried running the .nf file as below, but it hangs and doesn't do anything:

params.ncbi_api_key = "<API_KEY>"
params.accession = ['ERR908507', 'ERR908506']

process fastqc { 
  container "biocontainers/fastqc:v0.11.5"
  input:
  tuple val(sample_id), path(reads_file)

  output:
  path("fastqc_${sample_id}_logs")

  script:
  """
  mkdir fastqc_${sample_id}_logs
  fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads_file}
  """
}

workflow {
    reads = Channel.fromSRA(params.accession, apiKey: params.ncbi_api_key)
    fastqc(reads)
}

It is stuck at:

N E X T F L O W  ~  version 22.12.0-edge
Launching `test12.nf` [crazy_archimedes] DSL2 - revision: 6a7f47a8af
[-        ] process > fastqc -
Staging foreign file: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908507/ERR908507_1.fastq.gz
Staging foreign file: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908506/ERR908506_1.fastq.gz
Staging foreign file: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908507/ERR908507_2.fastq.gz
Staging foreign file: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908506/ERR908506_2.fastq.gz

I left it for 45 minutes and it was still at the same position. Was this pipeline supposed to finish quickly? Thanks for the help.

mribeirodantas commented 1 year ago

I've been there 😅 Exactly there. I also thought it was hanging and something was wrong, but in the end, it just takes a very long time to download everything. If you think everything has been downloaded, you can check with:

find work/stage -name '*fastq.gz' | xargs gunzip --test

sktrinh12 commented 1 year ago

Thanks for the quick response, that makes sense. I guess I'll leave it overnight and see if it finishes.

mribeirodantas commented 1 year ago

You can also check the files within work/stage and see whether the file sizes are increasing over time.
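
One way to do that check without watching the directory by hand is sketched below. It is only an illustration, not part of the training material: the function names `stage_bytes` and `check_progress` are made up here, and the default path `work/stage` and the `*fastq.gz` pattern are taken from the command earlier in this thread.

```shell
# Print the combined size in bytes of all staged FASTQ files under a directory.
# Defaults to work/stage, the staging directory mentioned in this thread.
stage_bytes() {
  find "${1:-work/stage}" -type f -name '*fastq.gz' -exec wc -c {} \; 2>/dev/null |
    awk '{ sum += $1 } END { print sum + 0 }'
}

# Measure the total twice, a few seconds apart, and report whether it grew.
# Arguments: directory (default work/stage), interval in seconds (default 30).
check_progress() {
  dir="${1:-work/stage}"
  interval="${2:-30}"
  before=$(stage_bytes "$dir")
  sleep "$interval"
  after=$(stage_bytes "$dir")
  if [ "$after" -gt "$before" ]; then
    echo "growing: $before -> $after bytes"
  else
    echo "no change: $after bytes"
  fi
}
```

Running `check_progress work/stage 60` from the launch directory while the pipeline is staging files should print "growing" as long as the download is still making progress. A "no change" result over several minutes would point to a stalled connection rather than a slow one.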

chriswyatt1 commented 1 year ago

Did it run in the end? It shouldn't really take that long unless there was a problem with SRA or a local connectivity issue. Are you sure you entered the correct personal API token number from NCBI?

There should be a section in the training docs explaining how to get the key. At the moment it is not very clear how to obtain it (e.g. https://support.nlm.nih.gov/knowledgebase/article/KA-05317/en-us).
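
To rule out a bad key independently of Nextflow, a quick request to NCBI E-utilities can help. This is a rough sketch under assumptions: `NCBI_API_KEY` is a hypothetical environment variable holding your key, `esearch_url` is a helper name invented for this example, and nothing in this thread confirms what an error response looks like, so inspect the output yourself.

```shell
# Build an E-utilities esearch URL for one SRA accession.
# Arguments: accession, API key.
esearch_url() {
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=sra&term=%s&retmode=json&api_key=%s\n' "$1" "$2"
}

# Only contact NCBI when a key is actually set in the environment.
# NCBI_API_KEY is an assumed variable name, not something Nextflow reads.
if [ -n "${NCBI_API_KEY:-}" ]; then
  curl -s "$(esearch_url ERR908507 "$NCBI_API_KEY")"
fi
```

If the key is accepted, the response should describe search results for the accession. If it is rejected, expect an error message instead of results.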

mribeirodantas commented 1 year ago

In my case, it did. A good option is to check the file sizes in the work directory. If they keep increasing, it's working, though it can take a while depending on your internet connection. In the training material, there is a drop-down with instructions: https://training.nextflow.io/basic_training/channels/#fromsra

mribeirodantas commented 1 year ago

Did it work for you in the end, @sktrinh12?