nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Support login to a docker registry #45

Closed epearson-nmdp closed 4 years ago

epearson-nmdp commented 9 years ago

It would be great to be able to log into a docker registry ("login" CLI command). This would permit access to private registries that require authorization, as well as private repos on Docker Hub.

kzuberi commented 7 years ago

Would it be possible, as a temporary alternative, to register a user script that runs after the worker nodes are created, where you could perform ad-hoc node setup tasks such as executing a docker login?

zichner commented 7 years ago

I am also highly interested in this feature. Is there currently any plan to implement access to private registries (e.g., AWS EC2 Container Registry)? This would be really great! Thank you very much!

pditommaso commented 7 years ago

The problem here is that NF cannot run the docker login command for each task it executes. Is there any docker expert here who knows how to bypass the login command by providing the auth token in some other way?

serverhorror commented 7 years ago

One could create the docker auth token manually. The docker CLI tool simply checks for a file and reads it: $HOME/.docker/config.json (Windows: %USERPROFILE%/.docker/config.json)
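As a rough illustration of creating that token by hand, the sketch below builds the kind of `auths` entry that `docker login` normally writes to config.json when no credential store is configured. The function name `docker_auth_entry`, the registry and the credentials are hypothetical placeholders, not anything Nextflow provides.

```python
import base64
import json


def docker_auth_entry(registry, username, password):
    """Build a config.json fragment holding a basic-auth token for one registry."""
    # The "auth" value is the base64 encoding of "username:password".
    raw = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"auths": {registry: {"auth": token}}}


if __name__ == "__main__":
    # Hypothetical registry and credentials, for illustration only.
    config = docker_auth_entry("quay.io", "my-user-name", "my-password")
    print(json.dumps(config, indent=2))
```

The resulting JSON would have to be merged into the config.json of the user that runs docker on each node. Note that the token is only encoded, not encrypted, so the file needs restrictive permissions.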

This might also be helpful:

In the case of AWS ECR that would not be sufficient, especially when running Nextflow outside of AWS. There would need to be some kind of auth step, since AWS requires running aws ecr get-login (from the CLI):

@zichner: please CC me on these kinds of tickets :)

pditommaso commented 7 years ago

I would be happy to implement this feature, but the main problem is that NF does not deploy a daemon on the remote computing nodes, so it is challenging to maintain the current user authentication state. This also means that the authentication mechanism would have to be implemented at the level of the job wrapper script that NF creates to run the user task.

If you are able to propose an implementation based on the following caveats I would be happy to integrate it:

As a proof of concept, this can be implemented at the level of an NF script process.

pditommaso commented 7 years ago

A question for the people interested in this issue: what is the deployment scenario in which you would like to use this feature? An HPC batch scheduler (such as SLURM, SGE, etc.) or a cloud deployment using the NF Ignite executor?

serverhorror commented 7 years ago

We need both, as there are often restrictions on data that may not legally be processed with any cloud provider.

I have to admit I have no idea what NF Ignite is.

pditommaso commented 7 years ago

When using an SGE-like batch scheduler, in the most common deployment scenario the user home is mounted on the shared file system, so you can simply authenticate with Docker Hub before launching the pipeline execution. Would this work in your case?

When using SGE deployed in the AWS cloud with cfnCluster, it's the same as before. AWS Batch is not supported by NF.

Regarding Google Cloud and Azure, are you deploying NF workloads in these clouds? How?

Apache Ignite is the clustering engine embedded in Nextflow that can be used to deploy a computation in the AWS cloud without the need to set up a third-party scheduler such as SGE. You can learn more here.

serverhorror commented 7 years ago

What I imagine is for NF to take care of authenticating to a docker registry, so that images are pulled if I'm successfully authenticated, or the whole thing bails out as early as possible if I'm not. Does that make sense?

Ideally I wouldn't have to take care of anything special outside of NF, but could provide some metadata to the actual workflow or NF config that would then "do the right thing".

Maybe something along the lines of https://www.nextflow.io/docs/latest/amazons3.html#security-credentials

// This should be unrelated to docker registry auth as it serves a different purpose
aws {
  accessKey = '<Your AWS access key>'
  secretKey = '<Your AWS secret key>'
  region = '<AWS region identifier>'
}

// not sure about syntax but this seems reasonable from a user perspective
docker {
  quay.io {
    username = 'my-user-name'
    password = 'my-password'
    email = 'my-email'
  }
  gcr.io {
    username = 'oauth2accesstoken'
    token = 'actual-access-token'
  }
  <aws-account-id>.dkr.ecr.eu-west-1.amazonaws.com/<repository-name> {
    // This will be interesting as AWS ECR tokens expire after 12h
    // I believe there's nothing one can do about that
    ...
  }
  <another-aws-account-id>.dkr.ecr.eu-west-1.amazonaws.com/<another-repository-name> {
    // This will be interesting as AWS ECR tokens expire after 12h
    // I believe there's nothing one can do about that
    ...
  }
  ...
}

Unfortunately, in the current state there is no single way to authenticate to docker registries across the board, which would be the best user experience. As it stands, each cloud provider requires its own kind of "pre-auth step" to acquire an actual token for the registry.

SGE: Yes, that is common. Still, it would make the world much easier to be able to securely pass credentials to NF so that it takes care of that. In the case of SGE, the "pre-login" approach would require users to log in to the cluster before starting a workflow, and it would get interesting when things are triggered by some event and fire up a cluster without human interaction.

Please ignore the rest of the providers I have been asking for. I'm new to NF so I might have mixed up topics here and I don't want to sidetrack the actual question of private registry authentication -- Sorry about that.

I think the best solution is to have something built-in that will authenticate to a private registry regardless of which "provider" is being used (SGE, AWS, Ignite, ....).

amacbride commented 7 years ago

I have an Ansible script that runs NF for me, so I just grab the ECR login info, and execute it right before I kick off the pipeline. The token is good for 12 hours, which has generally been sufficient for my needs.

   - name: get Docker login info
     shell: aws ecr get-login --region=us-west-2
     register: docker_login_info
     become: true
     become_user: "{{ run_user }}"

then,

   - name: login to AWS docker repo
     shell: "{{ docker_login_info.stdout }}"
     become: true
     become_user: "{{ run_user }}"

...and then I execute the Nextflow script.
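For readers who would rather not execute the CLI's output as a shell command, the same pre-login step can be sketched in Python. This is a minimal sketch under stated assumptions: the ECR authorization token and registry endpoint have already been obtained elsewhere, the token is the base64 encoding of `user:password`, and `ecr_login_argv` is a hypothetical helper name.

```python
import base64


def ecr_login_argv(authorization_token, proxy_endpoint):
    """Decode an ECR token and return (argv, password) for `docker login`."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    # Split on the first colon only, in case the password itself contains colons.
    username, _, password = decoded.partition(":")
    argv = ["docker", "login", "--username", username,
            "--password-stdin", proxy_endpoint]
    return argv, password


# Usage (requires docker on the PATH; not executed here):
#   import subprocess
#   argv, password = ecr_login_argv(token, endpoint)
#   subprocess.run(argv, input=password.encode("utf-8"), check=True)
```

Passing the password on stdin keeps it out of the process list. As with the Ansible tasks above, the login is only valid until the token expires, so it has to be repeated before long-running or later pipeline launches.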

pditommaso commented 7 years ago

@serverhorror yes, that makes perfect sense, and I would love to have it implemented the way NF already does for AWS or GitHub, for example. The main problem is the horrific authentication model used by Docker, which requires a separate login command to run/pull images stored in private Hub repositories.

I will try to investigate whether the credential helper protocol can provide an alternative solution. See also here and here.

In the meantime, the solution proposed by @amacbride is a good workaround.

serverhorror commented 7 years ago

Agreed, we'll probably try the workaround. I have been looking at the NF syntax to find something like a "process pre-execution" hook as a possible alternative.

I like the credential helper idea. I skimmed over it and was thinking about how to implement it, at least for AWS. (Un)surprisingly this came up: awslabs/amazon-ecr-credential-helper.

This very much sounds like it could be low-hanging fruit.

/CC @pelacables

aresDeathscythe commented 6 years ago

@serverhorror https://github.com/awslabs/amazon-ecr-credential-helper was the solution to my problem. I just needed to configure it on my EC2 machine and create an AMI from it, which is then referenced in my nextflow.config. Now everything works smoothly without having to log in at all :)

pprieto commented 5 years ago

@pditommaso have you given any thought to credential helpers? I would be interested in helping with this feature.

zichner commented 5 years ago

Just in case somebody stumbles over this rather old issue: we used https://github.com/awslabs/amazon-ecr-credential-helper for containers stored in AWS, and it works without any problems for Docker and Singularity. No modifications to Nextflow or to the way Nextflow is called are needed.
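For anyone setting this up, the helper is enabled through the `credHelpers` section of the Docker config.json, which maps a registry host to the helper name `ecr-login`. The sketch below shows that mapping being merged into an existing config; the function name `with_ecr_cred_helper` and the registry host are hypothetical, and the helper binary is assumed to be installed on the PATH of every node that pulls images.

```python
import json


def with_ecr_cred_helper(config, registry):
    """Return a copy of a Docker config dict with `registry` mapped to ecr-login."""
    updated = dict(config)
    # Copy the nested mapping too, so the caller's dict is left untouched.
    helpers = dict(updated.get("credHelpers", {}))
    helpers[registry] = "ecr-login"
    updated["credHelpers"] = helpers
    return updated


if __name__ == "__main__":
    # Hypothetical account id and region, for illustration only.
    existing = {"auths": {}}
    registry = "123456789012.dkr.ecr.eu-west-1.amazonaws.com"
    print(json.dumps(with_ecr_cred_helper(existing, registry), indent=2))
```

With this in place docker asks the helper for fresh credentials on each pull, which sidesteps the 12-hour token expiry discussed earlier in the thread.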

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.