ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Copy Ray image to an alternative mirror #39009

Open tedhtchang opened 1 year ago

tedhtchang commented 1 year ago

What happened + What you expected to happen

I recently started getting this message, possibly due to the Docker Hub pull rate limit: https://docs.docker.com/docker-hub/download-rate-limit/

# kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-cluster.mini.yaml
# kubectl get po raycluster-mini-head-9q29l -oyaml
...
    state:
      waiting:
        message: 'rpc error: code = Unknown desc = failed to pull and unpack image
          "docker.io/rayproject/ray:2.6.3": failed to copy: httpReadSeeker: failed
          open: unexpected status code https://registry-1.docker.io/v2/rayproject/ray/manifests/sha256:9db6f33629b743cc0519f17ae9d1f2db986fb1fcc75a56c1b1740fa1fa3ac82c:
          429 Too Many Requests - Server message: toomanyrequests: You have reached
          your pull rate limit. You may increase the limit by authenticating and upgrading:
          https://www.docker.com/increase-rate-limit'
        reason: ErrImagePull
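
To confirm that the 429 really comes from the Docker Hub pull limit, the remaining quota for the current IP can be inspected. This is a minimal sketch that assumes `curl` and `jq` are installed and follows the check described in Docker's rate-limit documentation (the `ratelimitpreview/test` repository is the one that page uses); `parse_remaining` and `check_dockerhub_limit` are illustrative helper names, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: inspect the anonymous Docker Hub pull quota for the current IP.
# Assumes curl and jq are available.

# Read HTTP headers on stdin and print the count from a header such as
# "ratelimit-remaining: <count>;w=<window seconds>".
parse_remaining() {
  tr -d '\r' | awk -F': ' 'tolower($1) == "ratelimit-remaining" { split($2, a, ";"); print a[1] }'
}

# Fetch an anonymous token, then issue a HEAD request against the manifest
# URL so that only the response headers are returned.
check_dockerhub_limit() {
  local token
  token=$(curl -fsSL "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
  curl -fsSI -H "Authorization: Bearer ${token}" \
    "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest"
}

# Usage: check_dockerhub_limit | parse_remaining
```

Behind a shared NAT or company proxy, the number reflects everyone pulling through the same egress IP, not just one machine.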

We can probably create an alternative repo such as quay.io/rayproject/ray similar to this issue. /cc @kevin85421

Versions / Dependencies

All Ray images uploaded to docker.io

Reproduction script

While you are still within the pull limit, this may not reproduce the problem:
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-cluster.mini.yaml

Issue Severity

Low: It blocked me from completing my task sometimes

kevin85421 commented 1 year ago

cc @krfricke @aslonnie @can-anyscale

aslonnie commented 1 year ago

I am also not sure how this is CI related.

kevin85421 commented 1 year ago

me and @can-anyscale are currently not responsible for kuberay's CI.

This isn't related to KubeRay CI. @tedhtchang requests to push "Ray images" to both DockerHub and other image registries without rate limitations, such as Quay.

aslonnie commented 1 year ago

I think they are trying to pull, not push.

maybe login first with a docker hub token?

or mirror the image to somewhere yourself and modify the yaml file?

not much I can do here really.
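
The "mirror it yourself and modify the yaml" option could look roughly like the sketch below. The mirror location `registry.example.com/mirror` is a placeholder, and the helper names are made up for illustration; the sed pattern assumes the manifest references the image as `rayproject/ray:<tag>`, optionally prefixed with `docker.io/`.

```shell
#!/usr/bin/env bash
# Hypothetical mirror location; replace with a registry you control.
MIRROR="${MIRROR:-registry.example.com/mirror}"

# Rewrite "image: rayproject/ray:<tag>" (with or without a docker.io/ prefix)
# so that it points at the mirror. Reads YAML on stdin, writes YAML on stdout.
rewrite_ray_image() {
  sed -E "s#(image:[[:space:]]*)(docker\.io/)?rayproject/ray:#\1${MIRROR}/rayproject/ray:#"
}

# Copy one tag from Docker Hub to the mirror. Run `docker login` against both
# registries first so the pull from Docker Hub is authenticated.
mirror_ray_image() {
  local tag="$1"
  docker pull "rayproject/ray:${tag}"
  docker tag "rayproject/ray:${tag}" "${MIRROR}/rayproject/ray:${tag}"
  docker push "${MIRROR}/rayproject/ray:${tag}"
}

# Usage:
#   mirror_ray_image 2.6.3
#   curl -fsSL <manifest URL> | rewrite_ray_image | kubectl apply -f -
```

Piping the manifest through a filter keeps the upstream sample file untouched, so the same command keeps working when the sample's Ray tag changes (as long as that tag has been mirrored).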

kevin85421 commented 1 year ago

As I understand it, the issue asks Ray CI to push images not only to DockerHub but also to other image registries. This way, they can pull Ray's official images from a registry without rate limitations. Is my understanding correct, @tedhtchang?

aslonnie commented 1 year ago

almost all registries have rate limits. bandwidth is not free and DoS issues are real.

not sure about quay.io; if it does not have a rate limit, it should.

also docker hub is not for production use in general. ray-project/ray images on docker hub are also not for production use; they are for distribution only.

kuberay should not be encoding ray-project/ray in the k8s manifest yaml file. that is just asking for failures when users try to scale.

self-hosting a registry backed by s3 or filesystem is also pretty simple: https://docs.docker.com/registry/deploying/
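
A rough sketch of that self-hosting route, using the `registry:2` image from the linked Docker docs. The `run` wrapper and both function names are illustrative only; the second variant assumes the registry's `proxy.remoteurl` setting is supplied through the `REGISTRY_PROXY_REMOTEURL` environment variable, which turns the registry into a pull-through cache of Docker Hub.

```shell
#!/usr/bin/env bash
# Print commands instead of running them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Start a plain local registry on port 5000. Images are then tagged as
# localhost:5000/<name>:<tag> and pushed to it.
start_local_registry() {
  run docker run -d -p 5000:5000 --restart=always --name registry registry:2
}

# Alternative: run the same image as a pull-through cache of Docker Hub.
# Each image is fetched from Docker Hub once and served locally afterwards.
start_pull_through_cache() {
  run docker run -d -p 5000:5000 --restart=always --name registry-cache \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io registry:2
}

# Usage: DRY_RUN=1 start_local_registry   # shows the command without running it
```

The two modes are alternatives: a registry running as a pull-through cache does not accept pushes, and the cluster nodes' container runtime still has to be configured to use it as a mirror.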

aslonnie commented 1 year ago

and quay.io seems to have rate limit too, just not very explicit about it:

only rate limits in the most severe circumstances to maintain service levels (e.g. tens of requests per second from the same IP address).

https://access.redhat.com/articles/5531191

10/s can pretty easily get hit when a ray cluster is behind a NAT gateway.

tedhtchang commented 1 year ago

As I understand it, the issue asks Ray CI to push images not only to DockerHub but also to other image registries. This way, they can pull Ray's official images from a registry without rate limitations. Is my understanding correct, @tedhtchang?

Correct. I was reviewing a KubeRay PR, but I failed to pull the KubeRay operator image and failed to create a RayCluster on KinD on a VM. For the KubeRay image, I could specify the alternative quay.io repo. For the Ray image, there is no alternative I can specify. It would be great if you could publish or mirror the Ray image to an alternative repo as part of the build process. For prod/dev, we are required to use our custom Ray image on Quay.io, but I cannot use our custom image for open-source KubeRay development.
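
For the KinD case specifically, one possible workaround until a mirror exists is to pull the image once on the host and side-load it into the cluster nodes, so the nodes never contact Docker Hub. A minimal sketch; `preload_into_kind` and the `run` wrapper are illustrative names, and the default cluster name `kind` is an assumption.

```shell
#!/usr/bin/env bash
# Print commands instead of running them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Pull an image on the host (one pull, which can be authenticated with
# `docker login`) and copy it into every node of a KinD cluster.
preload_into_kind() {
  local image="$1" cluster="${2:-kind}"
  run docker pull "$image"
  run kind load docker-image "$image" --name "$cluster"
}

# Usage: preload_into_kind rayproject/ray:2.6.3
```

This only helps when the pod's imagePullPolicy is not Always; for a fixed tag such as 2.6.3, Kubernetes defaults to IfNotPresent, so the side-loaded image is used.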

aslonnie commented 1 year ago

we can copy the ray images to places in addition to docker hub on ray release. @kevin85421 I think you can create an account and do the mirroring yourself, or add steps into the ray release process, and let the release manager do it.

but I do not want to push to other places on pipelines (for each commit). we are even thinking about no longer publishing the per-commit builds.

also, I am just saying that using quay.io is unlikely to be a reliable solution to the pull rate limit. so maybe rephrase this issue or open a new one if the request is just to add a mirror.

tedhtchang commented 1 year ago

I do not want to push to other places on pipelines (for each commit).

Agreed. I mostly use the default Ray images that come with the RayCluster yaml. I updated the issue title and lowered the severity level because I am not hitting the rate limit right now.

nvtkaszpir commented 1 year ago

Well, currently dockerhub shows 404 to any rayproject repo...

aslonnie commented 1 year ago

Well, currently dockerhub shows 404 to any rayproject repo...

we are looking at it. seems like a UI only issue. images are still pull-able.

anyscalesam commented 1 year ago

@nvtkaszpir - are you able to pull? @tedhtchang - did you run into any rate limit issues in the last 2 months?

tedhtchang commented 1 year ago

Hey @anyscalesam, it was sporadic before. Since last month, I have been experiencing this problem every day whenever we are behind the company firewall.