iterative / cml

♾️ CML - Continuous Machine Learning | CI/CD for ML
http://cml.dev
Apache License 2.0

cml runner seems to try and pull images from a quay.io repo instead of dockerhub #1433

Open AlistairMaccallum opened 9 months ago

AlistairMaccallum commented 9 months ago

Bug Report

runner:

Warning  Failed            8s (x2 over 52s)      kubelet            Failed to pull image "dvcorg/cml:0-dvc2-base1-gpu": rpc error: code = Unknown desc = reading manifest 0-dvc2-base1-gpu in quay.io/dvcorg/cml: unauthorized: access to the requested resource is not authorized

Description

cml runner seems to try and pull images from a quay.io repo instead of https://hub.docker.com/r/iterativeai/cml/tags

Reproduce

cml runner launch \
    --cloud=kubernetes \
    --labels=cml-k8s-gpu

Expected

Kubernetes is able to pull the required image for the Job

Environment information

Attempting to run jobs via GitHub ARC (Actions Runner Controller) and Kubernetes in an on-premise cluster.

Additional Information (if any):

dacbd commented 9 months ago

@AlistairMaccallum I suspect this is coming from some OpenShift k8s configuration.

After some searching, it looks like the contents of /etc/containers/registries.conf (run cat /etc/containers/registries.conf on the node) may shed some light on this.


Regardless, you should be able to resolve this by explicitly setting the image like so:

cml runner launch \
    --cloud=kubernetes \
    --labels=cml-k8s-gpu \
    --cloud-image=ghcr.io/iterative/cml:0-dvc3-base1

or your choice of image.
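
To illustrate why the explicit, fully qualified name helps, here is a rough sketch of how a container runtime might expand an image reference when it has an ordered search-registry list. The `candidates` function is a hypothetical helper written for this example, not part of cml or any runtime, and the registry list passed to it is an assumption you would replace with whatever your node's registries.conf contains.

```shell
# Hypothetical helper: print the fully qualified names a runtime might try
# for an image reference, given an ordered list of search registries.
candidates() {
  image="$1"; shift
  first="${image%%/*}"
  # Treat the reference as already qualified when its first path component
  # looks like a registry host (contains "." or ":", or is "localhost").
  if [ "$first" != "$image" ]; then
    case "$first" in
      *.*|*:*|localhost) echo "$image"; return 0 ;;
    esac
  fi
  # Otherwise, one candidate per search registry, in the order given.
  for registry in "$@"; do
    echo "$registry/$image"
  done
}

# Unqualified name: one candidate per registry in the list.
candidates "dvcorg/cml:0-dvc2-base1-gpu" docker.io quay.io
# Fully qualified name: returned unchanged, so no registry search happens.
candidates "ghcr.io/iterative/cml:0-dvc3-base1" docker.io quay.io
```

With a fully qualified `--cloud-image`, the search list should not come into play at all, which is why setting it explicitly sidesteps the question of which registry gets tried.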

AlistairMaccallum commented 9 months ago

Thanks @dacbd, that's working now. However, is there a way to provide credentials, or to point at an existing k8s registry secret within the cluster, so that the image can be pulled from a private ECR repo?

For reference, my /etc/containers/registries.conf looks like this, so I'm not sure why it was trying quay.io first. Either way, it's probably better to be explicit about the image used.

# # An array of host[:port] registries to try when pulling an unqualified image, in order.
unqualified-search-registries = ["docker.io", "quay.io"]

dacbd commented 9 months ago

@AlistairMaccallum do you mean like this: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/

or are you trying to pull another image from your workflow that is running in the cml image?

AlistairMaccallum commented 9 months ago

@dacbd Yes, I have a k8s registry secret like the one described in the link. I'm trying to run cml like this:

cml runner launch \
    --cloud=kubernetes \
    --labels=cml-k8s-gpu \
    --cloud-image=aws-id.dkr.ecr.aws-region.amazonaws.com/my-image:my-tag

I tried this as part of the job, but I suspect it doesn't help: the k8s cluster itself needs the credential to pull the image, not the container the action is running in.

    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4 # More information on this action can be found below in the 'AWS Credentials' section
        with:
          aws-region: eu-west-2
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

dacbd commented 9 months ago

@AlistairMaccallum, that's correct: it would be k8s doing the pulling of the container, not the cml command.

I'm sure there are plenty of help articles out there for accessing ECR from your k8s cluster. If you get stuck feel free to reach out again but I'm not sure how much help we can be.
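
As a starting point, one commonly documented approach is to store an ECR login token in a docker-registry secret and attach that secret to a service account, so that pods running under that account can pull the image. The sketch below assumes the runner pod uses the `default` service account in the current namespace, which this thread does not confirm. The function names are hypothetical helpers, and `myregcred`, the registry host, and the region are placeholders taken from this thread. Note that ECR tokens are short-lived, so the secret would need to be refreshed periodically.

```shell
# Build the JSON patch that adds an image pull secret to a service account.
pull_secret_patch() {
  printf '{"imagePullSecrets":[{"name":"%s"}]}' "$1"
}

# Hypothetical helper: create or refresh a docker-registry secret from an
# ECR login token, then attach it to a service account.
# Assumes kubectl and the AWS CLI are installed and already authenticated.
attach_ecr_pull_secret() {
  secret="$1"; registry="$2"; region="$3"; account="${4:-default}"
  # Render the secret client-side and apply it, so re-running refreshes it.
  kubectl create secret docker-registry "$secret" \
    --docker-server="$registry" \
    --docker-username=AWS \
    --docker-password="$(aws ecr get-login-password --region "$region")" \
    --dry-run=client -o yaml | kubectl apply -f -
  # Pods created under this service account will use the secret for pulls.
  kubectl patch serviceaccount "$account" -p "$(pull_secret_patch "$secret")"
}

# Usage (placeholder values):
# attach_ecr_pull_secret myregcred aws-id.dkr.ecr.aws-region.amazonaws.com eu-west-2
```

Attaching the secret to the service account avoids having to change the pod spec that cml generates, but it only helps if the runner pod actually runs under that account.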

AlistairMaccallum commented 9 months ago

@dacbd This seems to answer my question: https://github.com/iterative/cml/issues/1342. However, I think a more intuitive way would be an additional flag for the runner that lets you specify a secret that already exists in Kubernetes. Something like this, maybe?

cml runner launch \
    --cloud=kubernetes \
    --labels=cml-k8s-gpu \
    --cloud-image=aws-id.dkr.ecr.aws-region.amazonaws.com/my-image:my-tag \
    --cloud-image-secret=myregcred