kubeflow / fairing

Python SDK for building, training, and deploying ML models
Apache License 2.0

Fail to pull image from local registry #380

Open benoitdr opened 4 years ago

benoitdr commented 4 years ago

/kind bug

What steps did you take and what happened: I installed and configured kubeflow-fairing following the procedure on the Kubeflow website, then ran the example from examples/simple/main.py with DOCKER_REGISTRY set to localhost:32000.

A fairing-job image is correctly pushed to the local registry and the job is started, but the pod cannot pull the image from the local registry:

curl -X GET http://localhost:32000/v2/fairing-job/tags/list
{"name":"fairing-job","tags":["640E50A"]}
kubectl describe pod/fairing-job-p8cf2-dcp6g

...

  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  30m                    default-scheduler  Successfully assigned default/fairing-job-p8cf2-dcp6g to k8s
  Normal   Pulling    29m (x4 over 30m)      kubelet, k8s       Pulling image "localhost:32000/fairing-job:640E50A"
  Warning  Failed     29m (x4 over 30m)      kubelet, k8s       Failed to pull image "localhost:32000/fairing-job:640E50A": rpc error: code = Unknown desc = failed to pull and unpack image "localhost:32000/fairing-job:640E50A": failed to copy: httpReaderSeeker: failed open: unexpected status code http://localhost:32000/v2/fairing-job/manifests/sha256:b92a770802696b2303556ff0ff6fb23340d6eb506c8a1080a9f06b12ef28725e: 500 Internal Server Error
  Warning  Failed     29m (x4 over 30m)      kubelet, k8s       Error: ErrImagePull
  Warning  Failed     5m37s (x110 over 30m)  kubelet, k8s       Error: ImagePullBackOff
  Normal   BackOff    34s (x132 over 30m)    kubelet, k8s       Back-off pulling image "localhost:32000/fairing-job:640E50A"
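
The failing request in the events above is a manifest fetch from the registry. A minimal Python sketch for repeating that fetch outside the kubelet is shown below. The helper names `build_manifest_request` and `fetch_manifest` are hypothetical, the registry is assumed to be served over plain HTTP as in the curl call above, and the Accept value is the Docker image manifest v2, schema 2 media type. Only the standard library is used.

```python
import json
import urllib.request

# Docker image manifest v2, schema 2 media type.
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def build_manifest_request(registry, repository, reference):
    """Build a GET request for /v2/<repository>/manifests/<reference>.

    `reference` may be a tag or a digest such as "sha256:...".
    Assumes the registry is reachable over plain HTTP.
    """
    url = f"http://{registry}/v2/{repository}/manifests/{reference}"
    return urllib.request.Request(url, headers={"Accept": MANIFEST_V2})


def fetch_manifest(registry, repository, reference):
    """Return the parsed manifest; urllib raises HTTPError on a 4xx/5xx reply."""
    request = build_manifest_request(registry, repository, reference)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_manifest("localhost:32000", "fairing-job", "640E50A")` on the node would show whether the registry also answers a plain client with a 500 for this image, or only the kubelet's pull.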

What did you expect to happen: The image should be pulled from the local registry and the job should complete without issue.

Anything else you would like to add:

I'm working with the microk8s registry add-on. I have verified that, using the procedure described at https://microk8s.io/docs/working, I can deploy pods using images pulled from the local registry.

Environment:

issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the label kind/bug to this issue, with a confidence of 0.99. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!


jinchihe commented 4 years ago

@benoitdr Personally, I think the problem is not related to kubeflow-fairing but to Kubernetes. Could you please try creating the job manually with the image in localhost:32000 and see whether it can be started? Thanks.

benoitdr commented 4 years ago

@jinchihe, thanks for your suggestion. How can I create the job manually? Is there a way to use kubeflow-fairing to generate a YAML file for it?

jinchihe commented 4 years ago

I mean just create a sample job to test your local Docker registry :-)
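
One way to run such a test is to write a small Job manifest by hand. The sketch below builds a batch/v1 Job as a Python dict and prints it as JSON, which kubectl accepts as well as YAML. The function name `make_test_job`, the job name `pull-test`, and the container command are hypothetical; the command assumes the image contains a `python` executable.

```python
import json


def make_test_job(name, image, command):
    """Return a batch/v1 Job manifest (as a dict) that runs `command` in `image`."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not retry: a single failed pull attempt is what we want to see.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


# Hypothetical job name and command; the image reference is the one from the report.
job = make_test_job(
    "pull-test",
    "localhost:32000/fairing-job:640E50A",
    ["python", "-c", "print('pulled')"],
)
print(json.dumps(job, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it creates the job; `kubectl describe` on the resulting pod then shows whether the same pull error appears without fairing involved.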

benoitdr commented 4 years ago

Yes, that's working. Following the procedure from https://microk8s.io/docs/working, I can deploy an nginx image from the local registry.

jinchihe commented 4 years ago

@benoitdr that's strange... I think the fairing job should behave the same as the nginx deployment here, so this does not seem to be related to kubeflow-fairing.

benoitdr commented 4 years ago

@jinchihe I'm not sure. I can pull images from a local registry (and from hub.docker.com) but for some reason kubeflow cannot do it. It might be a common issue with https://github.com/kubeflow/fairing/issues/382

xauthulei commented 4 years ago

@benoitdr, my guess is that your k8s cluster uses the default Docker registry, https://index.docker.io/v1/. If you want to pull the image in your pod from your private registry, you need to log in to it first or change the default one. Thanks.

benoitdr commented 4 years ago

It's not a login issue. I think it's related to microk8s. See https://github.com/ubuntu/microk8s/issues/681

hamedhsn commented 4 years ago

I hit the same problem (images can be pulled from Docker Hub but not locally) with Kubernetes in Docker. As a workaround, if you set the pull policy to Never, it will be forced to use the local images. I'm not sure whether fairing has an option to pass the pull policy value. I will check that later.
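
A rough sketch of that workaround follows. It assumes that fairing lets callers adjust the pod spec through functions with the signature `(kube_manager, pod_spec, namespace)`, and that the pod spec exposes `.containers` whose items have an `image_pull_policy` attribute, as `kubernetes.client.V1Container` does. Both points should be checked against the installed fairing version; the function name is hypothetical. The demo uses stand-in objects so it runs without a cluster.

```python
from types import SimpleNamespace


def set_pull_policy_never(kube_manager, pod_spec, namespace):
    """Set the image pull policy to "Never" on every container in `pod_spec`.

    The (kube_manager, pod_spec, namespace) signature is an assumption about
    how fairing calls pod spec mutators; the first and last arguments are unused.
    """
    for container in pod_spec.containers or []:
        container.image_pull_policy = "Never"


# Stand-in objects so the sketch runs without the kubernetes package.
demo_spec = SimpleNamespace(
    containers=[
        SimpleNamespace(
            image="localhost:32000/fairing-job:640E50A",
            image_pull_policy=None,
        )
    ]
)
set_pull_policy_never(None, demo_spec, "default")
print(demo_spec.containers[0].image_pull_policy)  # prints "Never"
```

Note that with a pull policy of Never the image must already be present in the node's container runtime; having it in the registry alone is not enough.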

jtfogarty commented 4 years ago

/area example /priority p2

mochiliu3000 commented 4 years ago

Using microk8s > 1.13 will hit this error, since it uses microk8s.ctr and dockerd has been replaced with containerd.

The 'append builder' calls a Layer class method that originally comes from containerregistry, but fairing has an older version. See the difference in the append_.py code: https://github.com/google/containerregistry/blob/8a11dc8c53003ecf5b72ffaf035ba280109356ac/client/v2_2/append_.py#L68

I've tried changing 'mediaType' to 'docker_http.LAYER_MIME' in the fairing code, but it still does not work. The image manifest or digest seems to be incompatible. We need to check with containerregistry whether containerd-style images are supported and can be built with the Layer class method.
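
To check which media types the pushed image actually declares, a manifest fetched from the registry can be inspected with a few lines of Python. This is only a sketch: the function name and the sample manifest are hypothetical, and the constant is the Docker schema 2 layer media type, which is assumed here to be the value behind `docker_http.LAYER_MIME`.

```python
# Layer media type used by Docker image manifest v2, schema 2.
# Assumed to match docker_http.LAYER_MIME in containerregistry.
DOCKER_LAYER_MIME = "application/vnd.docker.image.rootfs.diff.tar.gzip"


def unexpected_layer_types(manifest, expected=DOCKER_LAYER_MIME):
    """Return (index, mediaType) for each layer whose mediaType is not `expected`."""
    return [
        (index, layer.get("mediaType"))
        for index, layer in enumerate(manifest.get("layers", []))
        if layer.get("mediaType") != expected
    ]


# Hypothetical manifest fragment for illustration; digests are elided.
sample_manifest = {
    "schemaVersion": 2,
    "layers": [
        {"mediaType": DOCKER_LAYER_MIME, "digest": "sha256:..."},
        {"mediaType": "application/octet-stream", "digest": "sha256:..."},
    ],
}
print(unexpected_layer_types(sample_manifest))
```

Any layer reported here would be a candidate for the incompatibility described above, although a matching media type does not by itself rule out a digest mismatch.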

What do you think, @jinchihe?