nf-core / pathogensurveillance

Surveillance of pathogens using population genomics and sequencing
https://nf-co.re/pathogensurveillance
MIT License

Glitch at the download genome step #13

Closed cahuparo closed 8 months ago

cahuparo commented 1 year ago

Description of the bug

This may be nothing at all, but it may be worth mentioning in the documentation that this process may require a restart. At the download genome assemblies step, I got this error:

ERROR ~ Error executing process > 'NFCORE_PLANTPATHSURVEIL:PLANTPATHSURVEIL:DOWNLOAD_REFERENCES:DOWNLOAD_ASSEMBLIES (GCA_016864655.1)'

Caused by:
  Process `NFCORE_PLANTPATHSURVEIL:PLANTPATHSURVEIL:DOWNLOAD_REFERENCES:DOWNLOAD_ASSEMBLIES (GCA_016864655.1)` terminated with an error exit status (2)

Command executed:

  # Download assemblies as zip archives
  datasets download genome accession GCA_016864655.1 --include gff3,rna,cds,protein,genome,seq-report --filename GCA_016864655.1.zip

  # Unzip
  unzip GCA_016864655.1.zip

  # Rename files with assembly name
  if [ -f ncbi_dataset/data/GCA_016864655.1/genomic.gff ]; then
      mv ncbi_dataset/data/GCA_016864655.1/genomic.gff ncbi_dataset/data/GCA_016864655.1/GCA_016864655.1.gff
  fi
  if [ -f ncbi_dataset/data/GCA_016864655.1/cds_from_genomic.fna ]; then
      mv ncbi_dataset/data/GCA_016864655.1/cds_from_genomic.fna ncbi_dataset/data/GCA_016864655.1/GCA_016864655.1_cds.fna
  fi
  if [ -f ncbi_dataset/data/GCA_016864655.1/protein.faa ]; then
      mv ncbi_dataset/data/GCA_016864655.1/protein.faa ncbi_dataset/data/GCA_016864655.1/GCA_016864655.1.faa
  fi

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_PLANTPATHSURVEIL:PLANTPATHSURVEIL:DOWNLOAD_REFERENCES:DOWNLOAD_ASSEMBLIES":
      datasets: $(datasets --version | sed -e "s/datasets version: //")
  END_VERSIONS

Command exit status:
  2

Command output:
  Archive:  GCA_016864655.1.zip
    inflating: README.md
    inflating: ncbi_dataset/data/assembly_data_report.jsonl
    inflating: ncbi_dataset/data/GCA_016864655.1/GCA_016864655.1_ASM1686465v1_genomic.fna
    inflating: ncbi_dataset/data/GCA_016864655.1/genomic.gff    inflating: ncbi_dataset/data/GCA_016864655.1/cds_from_genomic.fna
    inflating: ncbi_dataset/data/GCA_016864655.1/protein.faa
    inflating: ncbi_dataset/data/GCA_016864655.1/sequence_report.jsonl
    inflating: ncbi_dataset/data/dataset_catalog.json
  Collecting 1  records [================================================] 100% 1/1
  Downloading: GCA_016864655.1.zip    41MB done
  Archive:  GCA_016864655.1.zip
    inflating: README.md
    inflating: ncbi_dataset/data/assembly_data_report.jsonl
    inflating: ncbi_dataset/data/GCA_016864655.1/GCA_016864655.1_ASM1686465v1_genomic.fna
    inflating: ncbi_dataset/data/GCA_016864655.1/genomic.gff
    error:  invalid compressed data to inflate
    inflating: ncbi_dataset/data/GCA_016864655.1/cds_from_genomic.fna
    inflating: ncbi_dataset/data/GCA_016864655.1/protein.faa
    inflating: ncbi_dataset/data/GCA_016864655.1/sequence_report.jsonl
    inflating: ncbi_dataset/data/dataset_catalog.json

Work dir:
  /nfs7/BPP/Chang_Lab/paradarc/nf_brady_N120/scripts/nf-core-plantpathsurveil/work/00/7f5d8df6693888570725145aa13835

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

It could be a glitch in the download or unzip step. I ran it again (with -resume) and the download worked just fine.
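One way to tell a corrupted download apart from an unzip problem is to test the archive before extracting it. The sketch below is only an illustration and not part of the pipeline: `archive_ok` and `safe_unzip` are hypothetical helper names, and it assumes the standard Info-ZIP `unzip` is on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the pipeline): succeeds only if the file
# exists, is non-empty, and passes unzip's built-in integrity test (-t).
archive_ok() {
    local archive="$1"
    [ -s "$archive" ] && unzip -tq "$archive" >/dev/null 2>&1
}

# Hypothetical wrapper: extract only after the integrity test passes, so a
# truncated or corrupted download fails before any files are written.
safe_unzip() {
    local archive="$1"
    if archive_ok "$archive"; then
        # -o overwrites existing files without prompting, -q keeps output quiet
        unzip -oq "$archive"
    else
        echo "Archive $archive is missing or corrupt; download it again." >&2
        return 1
    fi
}

# Example use (assumed, shown as a comment so nothing runs on load):
#   safe_unzip GCA_016864655.1.zip || exit 2
```

If `safe_unzip` fails on a freshly downloaded file, the download itself is the likely culprit and is worth repeating before anything else.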

zachary-foster commented 1 year ago

I added code so that this step is retried a limited number of times; if it still fails, that reference is left out and the pipeline continues. I think it helped with a lot of those random errors related to the internet connection.
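The actual change is not shown in this thread. As a rough sketch of the retry-then-skip idea at the shell level, something like the following could work. The `retry` and `download_assembly` functions, the attempt count, and the delay are all assumptions made for illustration; the `datasets` flags are the ones from the command in the error report above. In Nextflow itself, the same behaviour is usually expressed with the `errorStrategy` and `maxRetries` process directives rather than in the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to max_attempts times.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $attempt/$max_attempts failed: $*" >&2
        # Wait before the next attempt, but not after the last one
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Hypothetical per-accession step: download, then extract. A corrupted
# archive makes unzip fail, which counts as a failed attempt.
download_assembly() {
    local acc="$1"
    datasets download genome accession "$acc" --filename "$acc.zip" \
        && unzip -oq "$acc.zip"
}

# Example use (assumed values: 3 attempts, 10 s apart), as a comment so
# nothing is downloaded when this file is sourced:
#   for acc in GCA_016864655.1; do
#       retry 3 10 download_assembly "$acc" \
#           || echo "Skipping $acc after repeated failures" >&2
#   done
```

Because the loop only logs a message when `retry` gives up, one unreachable reference does not stop the remaining accessions from being processed.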

zachary-foster commented 1 year ago

Are you still seeing such errors stop the pipeline from running?

zachary-foster commented 8 months ago

This should be fixed.