nf-core / modules

Repository to host tool-specific module files for the Nextflow DSL2 community!
https://nf-co.re/modules
MIT License

MD5SUM hits docker pull limit #2547

Open berguner opened 2 years ago

berguner commented 2 years ago

Description of the bug

Hi,

I was running the pipeline on AWS Batch and the MD5SUM tasks failed because they hit the Docker Hub pull rate limit. I assume this happens when there are more than ~100 FastQ files. It seems the task was pulling the ubuntu:20.04 image from Docker Hub, so it could be fixed by pointing to an image on quay.io/biocontainers instead.

https://github.com/nf-core/demultiplex/blob/b0a004eb2e79f6fceb9b5d79b91b563b7724ed62/modules/nf-core/md5sum/main.nf#L8
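
To see how close a host is to the limit, Docker Hub reports the remaining pull count in `ratelimit-limit` and `ratelimit-remaining` response headers. The following is a minimal sketch, not a definitive tool: `parse_ratelimit_remaining` is a hypothetical helper name, and the commented-out request assumes network access plus `curl` and `jq` on the host.

```shell
#!/bin/sh
# Sketch: read HTTP response headers on stdin and print the remaining
# pull count from a "ratelimit-remaining: <n>;w=<window>" header.
# parse_ratelimit_remaining is a hypothetical helper, not part of any tool.
parse_ratelimit_remaining() {
    # Strip carriage returns from the HTTP line endings, then match the
    # header name case-insensitively and keep the part before ";".
    tr -d '\r' |
        awk -F': *' 'tolower($1) == "ratelimit-remaining" { split($2, a, ";"); print a[1] }'
}

# Example against Docker Hub (needs network access, curl and jq):
#   TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
#   curl -s --head -H "Authorization: Bearer $TOKEN" \
#       https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest |
#       parse_ratelimit_remaining
```

On AWS Batch the count applies to whichever address the compute nodes pull from, so the number seen from a laptop may not match what the tasks see.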

Command used and terminal output

Error executing process > 'NFCORE_DEMULTIPLEX:DEMULTIPLEX:MD5SUM (RNA2015_34_S34_L002)'
Caused by:
  Task failed to start - CannotPullContainerError: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

Command executed:

  md5sum \
       \
      RNA2015_34_S34_L002_R2_001.fastq.gz \
      > RNA2015_34_S34_L002_R2_001.fastq.gz.md5

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_DEMULTIPLEX:DEMULTIPLEX:MD5SUM":
      md5sum: $(echo $(md5sum --version 2>&1 | head -n 1| sed 's/^.*) //;' ))
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)

Relevant files

No response

System information

N E X T F L O W ~ version 22.04.5

ewels commented 2 years ago

Uff.. I wonder if we should go back to using an nfcore container for this. We're listed as OSS on Docker Hub, so our images shouldn't have any pull limits.

Alternatively, we could use a mirror on quay.io, as that's the same registry that hosts biocontainers and shouldn't have pull limits, e.g. https://quay.io/repository/bedrock/ubuntu

berguner commented 2 years ago

I guess it should be fine to use any Linux image that has md5sum in it. For example, I was able to run it with the tabix image, which also has a small footprint. Below is the configuration that I used.

process {
    withName: MD5SUM {
        container = "quay.io/biocontainers/tabix:0.2.6--ha92aebf_0"
    }
}

edmundmiller commented 2 years ago

@matthdsm Any objections to using https://quay.io/repository/bedrock/ubuntu?

matthdsm commented 2 years ago

I don't really care which image you use. Just make sure it's not some bloated mess, so we can keep the download times low.

edmundmiller commented 2 years ago

With that, we could probably get away with alpine for this task, but I'm wondering whether that would work across the board for these minimal containers.

matthdsm commented 2 years ago

I agree with alpine! We'll have to make sure it uses the same algorithms for everything though. I've been burned by that before.