Closed maanur closed 3 years ago
Thank you for creating this, @maanur, and for testing Katib on OpenShift!
Could you please try specifying kubernetes.io/legacy-unknown
as the signerName here: https://github.com/kubeflow/katib/blob/master/hack/cert-generator.sh#L82.
Then, build and push your custom image for the cert generator:
docker build -t docker.io/<registry>/cert-generator -f cmd/cert-generator/v1beta1/Dockerfile .
docker push docker.io/<registry>/cert-generator
And use your custom image in the manifest: https://github.com/kubeflow/katib/blob/master/manifests/v1beta1/installs/katib-standalone/kustomization.yaml#L46.
My concern is that for OpenShift we need a different signerName. /cc @tenzen-y
Changed the cert-generator image, reran the job.
[maanur@toolbox katib]$ oc get csr/katib-controller.kubeflow -o jsonpath="{.spec.signerName}"
kubernetes.io/legacy-unknown
The issue reproduces:
2021/04/08 05:33:33 http: TLS handshake error from 10.254.0.1:32810: remote error: tls: bad certificate
2021/04/08 05:33:33 http: TLS handshake error from 10.254.0.1:32812: remote error: tls: bad certificate
2021/04/08 05:33:33 http: TLS handshake error from 10.254.0.1:32814: remote error: tls: bad certificate
2021/04/08 05:33:34 http: TLS handshake error from 10.254.0.1:32816: remote error: tls: bad certificate
As mentioned in the Kubernetes docs:
Distribution of trust happens out of band for these signers. Any trust outside of those described above are strictly coincidental.
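For anyone who wants to see this failure mode in isolation, here is a minimal local sketch, assuming openssl is installed. It does not touch the cluster: signing-ca and other-ca are hypothetical stand-ins for the signer that issued the webhook certificate and for the CA that ends up in caBundle, and all file names are illustrative.

```shell
#!/bin/sh
# Local demonstration only; nothing here talks to a cluster.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Two unrelated self-signed CAs (hypothetical stand-ins):
#   signing-ca: the signer that actually issued the serving certificate
#   other-ca:   the CA that was placed in the webhook caBundle
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=signing-ca" \
  -keyout signing-ca.key -out signing-ca.crt 2>/dev/null
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=other-ca" \
  -keyout other-ca.key -out other-ca.crt 2>/dev/null

# Serving certificate request, signed by signing-ca only.
openssl req -newkey rsa:2048 -nodes -subj "/CN=katib-controller.kubeflow.svc" \
  -keyout tls.key -out tls.csr 2>/dev/null
openssl x509 -req -in tls.csr -CA signing-ca.crt -CAkey signing-ca.key \
  -CAcreateserial -days 1 -out tls.crt 2>/dev/null

# Verify the same certificate against each CA and record the outcome.
if openssl verify -CAfile signing-ca.crt tls.crt >/dev/null 2>&1; then
  matching=valid
else
  matching=invalid
fi
if openssl verify -CAfile other-ca.crt tls.crt >/dev/null 2>&1; then
  mismatched=valid
else
  mismatched=invalid
fi

echo "verified against issuing CA:   $matching"
echo "verified against unrelated CA: $mismatched"
```

Against a real cluster, the same openssl verify call could be pointed at the tls.crt and ca.crt extracted from the secrets mentioned in this issue. A failed verification there would be consistent with the bad certificate errors above.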
I'll try to write some kustomization overlay for OpenShift to utilize the service serving certificate feature.
That would be great. Thank you @maanur! Also, check this PR: https://github.com/kubeflow/katib/pull/1498#issuecomment-815343266, please. We are refactoring Katib manifests.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
/kind bug
What steps did you take and what happened: I installed the latest version of Katib by cloning the repo's master tree and running make deploy against our OpenShift 4.6.21 cluster. Then I applied random-example.yaml. The created Experiment remains in the Running condition, the Trial's pods are not updated with sidecar containers, and deployment/katib-controller shows logs with the following lines:
What did you expect to happen: Webhook certificates are valid, Trial's pods are injected with metric-gathering sidecars, and the Experiment successfully gathers metrics and progresses as it should.
Anything else you would like to add: As a result of job/katib-cert-generator, the WebhookConfigurations' .webhooks[].clientConfig.caBundle fields are updated with ca.crt from the katib-cert-generator-token secret assigned to the SA katib-cert-generator. According to the documentation on CSR, a ServiceAccount's ca.crt is not guaranteed to verify arbitrary client certificates:
I fetched tls.crt from secret/katib-webhook-cert and ca.crt from secret/katib-cert-generator-token-***, attached to the corresponding SA. Indeed, the pair is not valid:
Environment: