nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0
2.73k stars 626 forks source link

Pipeline on Azure fails when trying to pull private container on docker hub #2354

Closed abhi18av closed 3 years ago

abhi18av commented 3 years ago

Bug report

Upon trying to pull a private container from docker-hub on Azure Batch, using the v21.05.0-edge release, the pipeline fails with the following error message

Caused by:
  One or more container images specified are invalid

Expected behavior and actual behavior

The Azure batch executor should be able to pull the containers from docker-hub using the advanced configs mentioned here https://www.nextflow.io/docs/edge/azure.html#advanced-settings

Steps to reproduce the problem

PFA the relevant files with redacted information

main.nf.txt nextflow.log.txt nextflow.config.txt

Environment

manuelesimi commented 3 years ago

Hello,

Thanks for reporting this issue. I think the cause is a missing information in the documentation.

When you use a private registry, you have to specify the repository also in the container setting as follows: repo/org/image:tag

Looking at your config file, you should set the container to: docker.io/FIXME/PRIVATE_CONTAINER:latest

(of course, you will replace the placeholders with your actual values). This is how AWS Batch matches your image with the repo settings.

If you confirm that this is correct, we will update the documentation accordingly.

Thanks, manuele

manuelesimi commented 3 years ago

I did some testing, and confirmed that the missing server in the container option generates the error reported in this issue.

See: https://github.com/nextflow-io/nextflow/pull/2355

abhi18av commented 3 years ago

Hi @manuelesimi ,

Thanks for the quick revert and the initiative on this 🙌

So basically, the relevant bits in the config should be

process {
  withName:TRIMMOMATIC {
    container = 'docker.io/FIXME/PRIVATE_CONTAINER:latest'  // <--- Added the docker.io at the start.
  }
}

azure {
  batch {
    registry {
      server =  'docker.io'
      userName =  'FIXME'
      password =  'FIXME'
    }

}

Apart from this, wouldn't it be possible to simplify this a bit and use the registry specified in azure.batch.registry.server to automatically augment the container URIs used in the pipeline?

Right now, the same information i.e. docker.io needs to be mentioned twice, unless I'm mistaken 🤔

manuelesimi commented 3 years ago

You are assuming that all the docker images used in the pipeline are pulled from the same private registry. Which is not the case (as the NF doc says).

We can have a mix of public and private images, some pulled from Docker Hub, others from an Azure Container Registry, others from Quai, and so on. Because of that, we can't automatically add the server to the container URI. Also, I think it would be misleading to default docker.io when the server is not present, because we would try to pull the image from the wrong repository.

IMHO, the note I added to the doc should be sufficient to clarify to the users what is expected and why.

abhi18av commented 3 years ago

Hmm, okay does makes sense to be explicit about these things.

Testing with the suggested changes now and will circle back with updates :)

pditommaso commented 3 years ago

Wondering if the container registry should not also be specified in task container settings

https://github.com/nextflow-io/nextflow/blob/secrets-store/plugins/nf-azure/src/main/nextflow/cloud/azure/batch/AzBatchService.groovy#L335-L339

manuelesimi commented 3 years ago

Technically, it can be specified in the TaskContainerSettings with withContainer(). But the javadoc says:

    /**
     * Set this setting can be omitted if was already provided at Pool creation.
     *
     * @param registry the registry value to set
     * @return the TaskContainerSettings object itself.
     */
    public TaskContainerSettings withRegistry(ContainerRegistry registry) {
        this.registry = registry;
        return this;
    }

Since we are providing the private registry with the pool, and we don't list other registries, I thought calling this method was redundant.

Also, what registry would we provide to this method? We go back to the point of one of my previous comments.. how do we know from which repo the user wants to pull the image? Not all the images are from the private registry, others can be from several public registries (I have several use cases myself)... we will always end up to have a 1:1 mapping between image and registry if we want to be as generic as possible.

Since the format repo/org/image:tag is the one formalized by Docker I don't see the need to invent here another mapping. At the end, we are just asking users to provide a plain Docker full path.

pditommaso commented 3 years ago

I think an inconsistency can arise when using the auto-pool mode, and the registry changes across different launches e.g. public -> private. The pool gets created with the public registry (no creds), then a user launch a pipeline in which there's a private registry with the same pool settings, therefore the same pool (id) assigned and pull for the private container would fail.

Does it make sense?

pditommaso commented 3 years ago

Indeed, the unique ID for the pool is given by this snippet, the registry settings does not affect it

https://github.com/nextflow-io/nextflow/blob/secrets-store/plugins/nf-azure/src/main/nextflow/cloud/azure/config/AzPoolOpts.groovy#L85-L100

pditommaso commented 3 years ago

Hold-on, it's there!

https://github.com/nextflow-io/nextflow/blob/secrets-store/plugins/nf-azure/src/main/nextflow/cloud/azure/config/AzPoolOpts.groovy#L92-L94

manuelesimi commented 3 years ago

Hold-on, it's there!

https://github.com/nextflow-io/nextflow/blob/secrets-store/plugins/nf-azure/src/main/nextflow/cloud/azure/config/AzPoolOpts.groovy#L92-L94

I tested it at the time I added the private registry support... and found out that without the registry in the pool ID I would have troubles...

:)

pditommaso commented 3 years ago

Damn found! The docs says

azure {
    batch {
        registry {
            server =  '<YOUR REGISTRY SERVER>' // e.g.: docker.io, quay.io, <ACCOUNT>.azurecr.io, etc.
            userName =  '<YOUR REGISTRY USER NAME>'
            password =  '<YOUR REGISTRY PASSWORD>'
        }
    }
}

but should be

azure {
        registry {
            server =  '<YOUR REGISTRY SERVER>' // e.g.: docker.io, quay.io, <ACCOUNT>.azurecr.io, etc.
            userName =  '<YOUR REGISTRY USER NAME>'
            password =  '<YOUR REGISTRY PASSWORD>'
        }
}

https://github.com/nextflow-io/nextflow/blob/dc9625eb938018441f6e1baf504fb2a67efb52a6/plugins/nf-azure/src/main/nextflow/cloud/azure/config/AzConfig.groovy#L31-L35

manuelesimi commented 3 years ago

whoops... no idea how that happened. A likely wrong copy & paste.

Fixed in https://github.com/nextflow-io/nextflow/pull/2355.

Sorry!

pditommaso commented 3 years ago

no pb, thanks for your feedback!

manuelesimi commented 3 years ago

no pb, thanks for your feedback!

Thanks to you for spotting the mistake in the doc!

pditommaso commented 3 years ago

Solved 040e190bd

abhi18av commented 3 years ago

Thanks @manuelesimi and @pditommaso for the quick work on this 🦅 (hawk-eyes) solution.