kubeflow-kale / kale

Kubeflow’s superfood for Data Scientists
http://kubeflow-kale.github.io
Apache License 2.0

Pipeline identity issues running the dogbreed classification example #206

Open danishsamad opened 3 years ago

danishsamad commented 3 years ago

Hi,

I am facing pipeline API server auth issues when running the dogbreed classification example from a Kale notebook with "HP Tuning with Katib" enabled. When I press the "Compile and run Katib job" button in the notebook, I see "user identity missing" errors in the pipeline API server logs (excerpt below):

I0916 14:55:34.876951 7 error.go:218] Request header error: there is no user identity header.
github.com/kubeflow/pipelines/backend/src/apiserver/server.getUserIdentity backend/src/apiserver/server/util.go:304
github.com/kubeflow/pipelines/backend/src/apiserver/server.isAuthorized backend/src/apiserver/server/util.go:377
github.com/kubeflow/pipelines/backend/src/apiserver/server.CanAccessNamespaceInResourceReferences backend/src/apiserver/server/util.go:350
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).CreateExperiment backend/src/apiserver/server/experiment_server.go:26
github.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_CreateExperiment_Handler.func1 bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:569
main.apiServerInterceptor backend/src/apiserver/interceptor.go:30
github.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_CreateExperiment_Handler bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/experi
..
Bad request.
...
github.com/kubeflow/pipelines/backend/src/common/util.(*UserError).wrap backend/src/common/util/error.go:211
github.com/kubeflow/pipelines/backend/src/common/util.Wrap backend/src/common/util/error.go:244
github.com/kubeflow/pipelines/backend/src/apiserver/server.isAuthorized backend/src/apiserver/server/util.go:379
..
Failed to authorize with API resource references

All the trials fail, seemingly because the trial jobs fail to create experiments. It seems the API server requires a user identity to authorize the call, but shouldn't that be provided automatically somehow?
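To make the failing check concrete, here is a minimal sketch (standard library only) of an experiment-creation request built with and without an identity header, which is what the getUserIdentity frame in the trace appears to be looking for. The service URL, the header name `kubeflow-userid`, the request body shape, and the user value are all assumptions for illustration; the header the API server expects depends on how the deployment is configured. The request is only constructed, never sent, and this is not presented as a supported workaround.

```python
import json
import urllib.request

# Assumptions (not taken from the thread): the in-cluster service URL and the
# identity header name. Both depend on the deployment's configuration.
KFP_EXPERIMENTS_URL = (
    "http://ml-pipeline.kubeflow.svc.cluster.local:8888/apis/v1beta1/experiments"
)
USERID_HEADER = "kubeflow-userid"


def build_create_experiment_request(name, namespace, user=None):
    """Build (but do not send) a POST request that creates an experiment.

    If `user` is given, it is attached as the identity header; otherwise the
    request carries no identity, like the calls that failed in the log above.
    """
    body = {
        "name": name,
        # Assumed body shape: the experiment is tied to a namespace through a
        # resource reference, which is what the API server authorizes against.
        "resource_references": [
            {"key": {"type": "NAMESPACE", "id": namespace}, "relationship": "OWNER"}
        ],
    }
    headers = {"Content-Type": "application/json"}
    if user is not None:
        headers[USERID_HEADER] = user
    return urllib.request.Request(
        KFP_EXPERIMENTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def has_user_identity(request):
    """Return True if the request carries the assumed identity header."""
    # urllib normalizes header capitalization, so compare case-insensitively.
    return any(
        key.lower() == USERID_HEADER.lower() for key, _ in request.header_items()
    )
```

A request built without `user` is the case the log complains about: `has_user_identity` returns False for it, and True once a user is supplied.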

I also had to add a "pipeline-runner" service account to my namespace, which the trial jobs require, as it wasn't there by default, and assign it the proper ClusterRole / RoleBindings.
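The service account setup described above can be sketched as a pair of Kubernetes manifests generated from Python. The namespace `my-namespace` and the ClusterRole name passed as `cluster_role` are placeholders assumed for illustration; the thread does not say which role was actually bound, so substitute whatever role your trial jobs need.

```python
import json


def pipeline_runner_manifests(namespace, cluster_role="pipeline-runner"):
    """Return a v1 List holding a ServiceAccount and a RoleBinding.

    `cluster_role` is a hypothetical default; the thread does not name the
    role that was bound to the service account.
    """
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": "pipeline-runner", "namespace": namespace},
    }
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "pipeline-runner", "namespace": namespace},
        # Grants the ClusterRole's permissions only inside `namespace`.
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": cluster_role,
        },
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": "pipeline-runner",
                "namespace": namespace,
            }
        ],
    }
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [service_account, role_binding],
    }


if __name__ == "__main__":
    # Print the manifests as JSON so they can be reviewed before applying.
    print(json.dumps(pipeline_runner_manifests("my-namespace"), indent=2))
```

Using a RoleBinding that references a ClusterRole keeps the grant scoped to the one namespace, which is usually preferable to a cluster-wide ClusterRoleBinding.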

I am running kubeflow 1.1 using this manifest file on AKS 1.15.13

I am using the 'gcr.io/arrikto/jupyter-kale:v0.5.0-47-g2427cc9' image to create the notebook server, and I updated the following Kubeflow components after install:
Katib controller: gcr.io/arrikto/katib-controller:40b5b51a
Katib Chocolate service: gcr.io/arrikto/suggestion-chocolate:40b5b51a

After installing 1.1, I hit the issue described here and applied the workaround discussed in the same thread: updating the pipeline Istio destination rules to use HTTP instead of mTLS. I am not sure whether that issue is related to my current problem.

Looking for suggestions.

danishsamad commented 3 years ago

For anyone facing the same issue: I posted this question on the Kubeflow Kale Slack channel, and this is the response I got from @elikatsis:

"The trial uses the KFP python client to create experiments and runs. The problem here is that upstream KFP (I believe that's the one you have deployed) supports authentication/authorization for its client only from outside the cluster and only on GCP. The reason why the codelab works on MiniKF is that we (Arrikto) have extended the KFP apiserver and the python client accordingly enabling their secure communication. We will be pushing these designs and implementations upstream in the near future."

iptizer commented 3 years ago

@danishsamad Have you heard anything further on this topic?

We are facing the same issue with the 1.2.0 AWS deployment with Istio 1.1. The problem does not seem to be present when using the 1.2 deployment with Dex.

Maybe this is related to the old Istio version used in that manifest?