kubeflow / katib

Automated Machine Learning on Kubernetes
https://www.kubeflow.org/docs/components/katib
Apache License 2.0

User "system:serviceaccount:kubeflow:default" cannot create resource "experiments" in API group "kubeflow.org" in the namespace "kubeflow" #2447

Open Yumeka999 opened 1 month ago

Yumeka999 commented 1 month ago

What happened?

When I run PyTorch MNIST code with Katib from a pod in the kubeflow namespace, creating the Experiment fails with a 403 Forbidden error:

Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/katib/api/katib_client.py", line 111, in create_experiment
    outputs = self.custom_api.create_namespaced_custom_object(
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/api/custom_objects_api.py", line 231, in create_namespaced_custom_object
    return self.create_namespaced_custom_object_with_http_info(group, version, namespace, plural, body, **kwargs)  # noqa: E501
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/api/custom_objects_api.py", line 354, in create_namespaced_custom_object_with_http_info
    return self.api_client.call_api(
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 391, in request
    return self.rest_client.POST(url,
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 279, in POST
    return self.request("POST", url,
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 238, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'a59b1c51-9f7c-43ec-a427-c54a5f89a95e', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '4ba04eae-5588-448a-92bf-2ec54e588ed0', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'f1828aa4-7580-417f-a239-3e4a39493f4a', 'Date': 'Wed, 23 Oct 2024 05:29:31 GMT', 'Content-Length': '355'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"experiments.kubeflow.org is forbidden: User \"system:serviceaccount:kubeflow:default\" cannot create resource \"experiments\" in API group \"kubeflow.org\" in the namespace \"kubeflow\"","reason":"Forbidden","details":{"group":"kubeflow.org","kind":"experiments"},"code":403}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "_test_katib_office._test_.py", line 16, in <module>
    katib_client.tune(
  File "/root/.local/lib/python3.8/site-packages/kubeflow/katib/api/katib_client.py", line 424, in tune
    self.create_experiment(experiment, namespace)
  File "/root/.local/lib/python3.8/site-packages/kubeflow/katib/api/katib_client.py", line 130, in create_experiment
    raise RuntimeError(
RuntimeError: Failed to create Katib Experiment: kubeflow/tune-experiment

What did you expect to happen?

The code should run and finish with a normal status.

Environment

Kubernetes version:

$ kubectl version
Client Version: v1.29.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.2

Katib controller version:

$ kubectl get pods -n kubeflow -l katib.kubeflow.org/component=controller -o jsonpath="{.items[*].spec.containers[*].image}"
kubeflow/kubeflowkatib/katib-controller:v0.15.0

Katib Python SDK version:

$ pip show kubeflow-katib
Name: kubeflow-katib
Version: 0.17.0

tenzen-y commented 1 month ago

Could you check if you have appropriate permissions to operate Experiment?

kubectl auth can-i create experiments.kubeflow.org

Yumeka999 commented 4 weeks ago

The code runs in a Kubernetes pod, and kubectl is not installed in the container image. Is there another way to check the permissions?
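
One possible way to run the same check without kubectl is to send a SelfSubjectAccessReview through the Kubernetes Python client, which the traceback shows is already installed in the pod. The sketch below is only an illustration: the helper names `build_access_review` and `can_i` are made up for this example, and it assumes the pod uses its mounted service account token (here `system:serviceaccount:kubeflow:default`).

```python
from typing import Any, Dict


def build_access_review(verb: str, group: str, resource: str, namespace: str) -> Dict[str, Any]:
    """Build a SelfSubjectAccessReview manifest for one verb/resource pair.

    Hypothetical helper for this example, not part of Katib or the Kubernetes client.
    """
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SelfSubjectAccessReview",
        "spec": {
            "resourceAttributes": {
                "verb": verb,
                "group": group,
                "resource": resource,
                "namespace": namespace,
            }
        },
    }


def can_i(verb: str = "create", group: str = "kubeflow.org",
          resource: str = "experiments", namespace: str = "kubeflow") -> bool:
    """Ask the API server whether the pod's own service account may perform the action."""
    # Imported here so the manifest builder above works even without the client installed.
    from kubernetes import client, config

    # Assumption: the code runs inside a pod with a mounted service account token.
    config.load_incluster_config()
    review = client.AuthorizationV1Api().create_self_subject_access_review(
        body=build_access_review(verb, group, resource, namespace)
    )
    return bool(review.status.allowed)


# Example, to be run inside the pod:
#   print(can_i())
```

Given the 403 above, `can_i()` would be expected to return `False` for the default service account until a Role or ClusterRole that allows `create` on `experiments` in the `kubeflow.org` API group is bound to it.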