onepanelio / onepanel

The open source, end-to-end computer vision platform. Label, build, train, tune, deploy and automate in a unified platform that runs on any cloud and on-premises.
https://docs.onepanel.ai/
Apache License 2.0
708 stars, 69 forks

kfserving controller image pull unauthorized #981

Open DanielMemmelAa opened 1 year ago

DanielMemmelAa commented 1 year ago

Describe the bug

/kind bug

What steps did you take and what happened: I followed the tutorial in the docs using Multipass on a Windows machine.

After deploying Onepanel with `microk8s config > kubeconfig` followed by `KUBECONFIG=./kubeconfig opctl apply`, the kfserving-controller:v0.6.0 image fails to pull with a 401 Unauthorized error.

Creating a new model server in the Onepanel UI, as shown here, results in the following error:

[500] Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.v1beta1.defaulter": Post "https://kfserving-webhook-server-service.kfserving-system.svc:443/mutate-serving-kubeflow-org-v1beta1-inferenceservice?timeout=30s": dial tcp 10.152.183.188:443: connect: connection refused
http://serving.onepanel.pvaintern/api/namespaces/pvaonepanel/inferenceservices

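I suspect these two errors are connected. As far as I can tell, the kfserving webhook server runs inside the controller manager pod, so while its manager container cannot start, nothing answers behind kfserving-webhook-server-service, which would explain the "connection refused".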

What did you expect to happen: The image gcr.io/kfserving/kfserving-controller:v0.6.0 should be pullable, and the new model server should be created.

Anything else you would like to add:

Output of `microk8s.kubectl get pods/kfserving-controller-manager-0 -n kfserving-system`:

NAMESPACE          NAME                             READY   STATUS             RESTARTS   AGE
kfserving-system   kfserving-controller-manager-0   1/2     ImagePullBackOff   1          23h

Output of `kubectl describe pod` (Events):

Type     Reason   Age                   From     Message
Warning  Failed   26m (x267 over 23h)   kubelet  Failed to pull image "gcr.io/kfserving/kfserving-controller:v0.6.0": rpc error: code = Unknown desc = failed to pull and unpack image "gcr.io/kfserving/kfserving-controller:v0.6.0": failed to resolve reference "gcr.io/kfserving/kfserving-controller:v0.6.0": pulling from host gcr.io failed with status code [manifests v0.6.0]: 401 Unauthorized
Normal   Pulling  21m (x269 over 23h)   kubelet  Pulling image "gcr.io/kfserving/kfserving-controller:v0.6.0"
Normal   BackOff  55s (x6038 over 23h)  kubelet  Back-off pulling image "gcr.io/kfserving/kfserving-controller:v0.6.0"

Output of `microk8s.kubectl logs pod/kfserving-controller-manager-0 -n kfserving-system -c manager`:

Error from server (BadRequest): container "manager" in pod "kfserving-controller-manager-0" is waiting to start: trying and failing to pull image

Output of `microk8s.kubectl logs pod/kfserving-controller-manager-0 -n kfserving-system -c kube-rbac-proxy`:

I1004 07:52:03.495440 1 main.go:209] Generating self signed cert as no cert is provided
I1004 07:52:03.666661 1 main.go:242] Listening securely on 0.0.0.0:8443


Manually importing the Docker image with `microk8s ctr image import kfserving.kfserving-controller.tar` did not solve the problem.

According to the issues below, changing the pull location from gcr.io to docker.io should help (docker.io is also where I was able to pull the image manually):

https://github.com/kserve/kserve/issues/1781
https://github.com/kserve/kserve/issues/1976#issuecomment-1007453347
https://hub.docker.com/u/kfserving

I also tried changing the registry on line 32121 (below) of .onepanel/kubernetes.yaml from gcr.io to docker.io and applying the change with `KUBECONFIG=./kubeconfig opctl apply`, but the file was reset to its original state.

containers:

opctl version:

CLI version: v1.0.2
Manifest version: v1.0.2
API version: v1.0.2
Web UI version: v1.0.2

opctl init command:

opctl init --provider microk8s --enable-metallb --artifact-repository-provider s3

Kubernetes information

Machine information

Any help would be appreciated! Thanks :)

DanielMemmelAa commented 1 year ago

Update 1: I changed the image pull location from gcr.io to docker.io in the Kubernetes Dashboard and applied the change. The kfserving controller is now up and running. This should be changed in the default Onepanel configuration.
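For reference, the Update 1 edit amounts to changing a single field in the controller's pod template, sketched below. This is not a complete manifest to apply as-is. The `manager` container name comes from the logs above. The StatefulSet kind is an assumption inferred from the `-0` pod suffix. All unrelated fields, including the `kube-rbac-proxy` sidecar, are omitted.

```yaml
# Sketch only: the relevant fragment of the kfserving-controller-manager object.
# Assumption: it is a StatefulSet (inferred from the "-0" pod name suffix).
# Every other field, including the kube-rbac-proxy container, is omitted here.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kfserving-controller-manager
  namespace: kfserving-system
spec:
  template:
    spec:
      containers:
        - name: manager
          # Previously gcr.io/kfserving/kfserving-controller:v0.6.0, which returned 401 Unauthorized
          image: docker.io/kfserving/kfserving-controller:v0.6.0
```

Because `opctl apply` reset .onepanel/kubernetes.yaml in the attempt described above, a change made directly in the cluster may be reverted the next time `opctl apply` runs.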

Update 2: However, the flowers example still did not work, because KFServing changed the storageUri of their example. Change storageUri: "gs://kfserving-samples/models/tensorflow/flowers" to storageUri: "gs://kfserving-examples/models/tensorflow/flowers".
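Written as a plain KFServing resource, the corrected flowers example might look like the sketch below. The `serving.kubeflow.org/v1beta1` API group matches the webhook path in the error above, and the namespace is taken from the request URL. The resource name is hypothetical. The request the Onepanel UI sends may be structured differently.

```yaml
# Sketch of a KFServing v1beta1 InferenceService for the flowers example.
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: flowers-sample        # hypothetical name, not from the original report
  namespace: pvaonepanel      # namespace seen in the request URL above
spec:
  predictor:
    tensorflow:
      # Old location, which no longer works: gs://kfserving-samples/models/tensorflow/flowers
      storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
```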