kubeflow / training-operator

Distributed ML Training and Fine-Tuning on Kubernetes
https://www.kubeflow.org/docs/components/training
Apache License 2.0
1.51k stars 660 forks source link

TfJob creation failed due to webhook validation failure #2143

Open nagar-ajay opened 3 weeks ago

nagar-ajay commented 3 weeks ago

What happened?

❯ k get ValidatingWebhookConfiguration validator.tfjob.training-operator.kubeflow.org -o yaml
Error from server (NotFound): validatingwebhookconfigurations.admissionregistration.k8s.io "validator.tfjob.training-operator.kubeflow.org" not found

Not sure but there might be some flakiness in the training-operator kubeflow mode installation.

What did you expect to happen?

Environment