kubeflow / katib

Automated Machine Learning on Kubernetes
https://www.kubeflow.org/docs/components/katib
Apache License 2.0

Running the README.md example fails with FileNotFoundError: [Errno 2] No such file or directory: '/var/run/secrets/kubernetes.io/serviceaccount/namespace' #2436

Closed Yumeka999 closed 1 month ago

Yumeka999 commented 1 month ago

What happened?

When I use this code to test Katib (the README.md example):

import kubeflow.katib as katib

# Step 1. Create an objective function.
def objective(parameters):
    # Import required packages.
    import time
    time.sleep(5)
    # Calculate objective function.
    result = 4 * int(parameters["a"]) - float(parameters["b"]) ** 2
    # Katib parses metrics in this format: <metric-name>=<metric-value>.
    print(f"result={result}")

# Step 2. Create HyperParameter search space.
parameters = {
    "a": katib.search.int(min=10, max=20),
    "b": katib.search.double(min=0.1, max=0.2)
}

# Step 3. Create Katib Experiment.
katib_client = katib.KatibClient()
name = "tune-experiment"
katib_client.tune(
    name=name,
    objective=objective,
    parameters=parameters,
    objective_metric_name="result",
    max_trial_count=12
)

# Step 4. Get the best HyperParameters.
print(katib_client.get_optimal_hyperparameters(name))

What did you expect to happen?

python run_katib.py

I get this error:

Traceback (most recent call last):
  File "run_katib.py", line 1, in <module>
    import kubeflow.katib as katib
  File "/root/.local/lib/python3.8/site-packages/kubeflow/katib/__init__.py", line 73, in <module>
    from kubeflow.katib.api.katib_client import KatibClient
  File "/root/.local/lib/python3.8/site-packages/kubeflow/katib/api/katib_client.py", line 30, in <module>
    class KatibClient(object):
  File "/root/.local/lib/python3.8/site-packages/kubeflow/katib/api/katib_client.py", line 36, in KatibClient
    namespace: str = utils.get_default_target_namespace(),
  File "/root/.local/lib/python3.8/site-packages/kubeflow/katib/utils/utils.py", line 37, in get_default_target_namespace
    return get_current_k8s_namespace()
  File "/root/.local/lib/python3.8/site-packages/kubeflow/katib/utils/utils.py", line 30, in get_current_k8s_namespace
    with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace", "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/var/run/secrets/kubernetes.io/serviceaccount/namespace'

Environment

Kubernetes version:

WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.12", GitCommit:"ef70d260f3d036fc22b30538576bbf6b36329995", GitTreeState:"clean", BuildDate:"2023-03-15T13:37:18Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.12", GitCommit:"ef70d260f3d036fc22b30538576bbf6b36329995", GitTreeState:"clean", BuildDate:"2023-03-15T13:30:13Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}

Katib controller version:

harbor.xnunion.com/kubeflow/kubeflowkatib/katib-controller:v0.17.0

Katib Python SDK version:

Name: kubeflow-katib
Version: 0.17.0
Summary: Katib Python SDK for APIVersion v1beta1
Home-page: https://github.com/kubeflow/katib/tree/master/sdk/python/v1beta1
Author: Kubeflow Authors
Author-email: premnath.vel@gmail.com
License: Apache License Version 2.0
Location: /root/.local/lib/python3.8/site-packages
Requires: certifi, grpcio, kubernetes, protobuf, setuptools, six, urllib3
Required-by:

Impacted by this bug?

Give it a 👍 We prioritize the issues with the most 👍

andreyvelich commented 1 month ago

Thank you for creating this @Yumeka999! Where did you run the SDK from? Also, can you check this directory?

ls -la /var/run/secrets/kubernetes.io/

/area sdk /remove-label lifecycle/needs-triage

Yumeka999 commented 1 month ago

I ran the Python script on a physical machine that has k8s installed.

ls -la /var/run/secrets/kubernetes.io/

drwxr-xr-x 3 root root 60 Sep  4 14:36 .
drwxr-xr-x 3 root root 60 Sep  4 14:36 ..
drwxr-xr-x 2 root root 40 Oct  1 20:25 serviceaccount

and ls -la /var/run/secrets/kubernetes.io/serviceaccount

drwxr-xr-x 2 root root 40 Oct  1 20:25 .
drwxr-xr-x 3 root root 60 Sep  4 14:36 ..
Yumeka999 commented 1 month ago

Here is the result.

andreyvelich commented 1 month ago

Usually, this file indicates the namespace where your Pod's container runs: /var/run/secrets/kubernetes.io/serviceaccount/namespace. But since you ran this script from a local machine, this directory should not exist.

Do you know how the /var/run/secrets/kubernetes.io/ directory was created?

Yumeka999 commented 1 month ago

The directory /var/run/secrets/kubernetes.io/ already existed, and I don't know how it was created.

Can the Python script in the Quickstart be run on a local machine, or must it be run in a Pod?

tenzen-y commented 1 month ago

I guess that the root cause is https://github.com/kubeflow/katib/blob/867c40a1b0669446c774cd6e770a5b7bbf1eb2f1/sdk/python/v1beta1/kubeflow/katib/utils/utils.py#L29-L30.

The Katib SDK decides that it is running inside a Pod based on the presence of "/var/run/secrets/kubernetes.io/", but your local machine (not a Pod) also has that directory.
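
The mismatch described here can be sketched as follows. This is a minimal illustration and not the SDK's actual code: the helper name detect_namespace and its fallback parameter are hypothetical. It treats the process as running in a Pod only when the namespace file itself exists, rather than when a parent directory exists, so an empty leftover directory on a host falls back to a default value.

```python
import os
import tempfile

# Path that Kubernetes mounts into a Pod; the file holds the Pod's namespace.
NAMESPACE_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"


def detect_namespace(namespace_file: str = NAMESPACE_FILE, fallback: str = "default") -> str:
    """Hypothetical helper: return the Pod namespace, or `fallback` outside a Pod."""
    # Check the file itself, not just its parent directory.
    if not os.path.isfile(namespace_file):
        return fallback
    with open(namespace_file, "r") as f:
        return f.read().strip() or fallback


# Reproduce the situation from this issue: the directory exists but is empty.
with tempfile.TemporaryDirectory() as root:
    sa_dir = os.path.join(root, "serviceaccount")
    os.makedirs(sa_dir)
    ns_file = os.path.join(sa_dir, "namespace")
    empty_dir_result = detect_namespace(ns_file)

    # Simulate a real Pod mount by writing the namespace file.
    with open(ns_file, "w") as f:
        f.write("kubeflow\n")
    in_pod_result = detect_namespace(ns_file)

print(empty_dir_result, in_pod_result)
```

With a check like this, the empty serviceaccount directory shown earlier in the thread would not trigger the FileNotFoundError.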

andreyvelich commented 1 month ago

That's right, and you can execute the above code from a local machine or from a Pod. We just use a different mechanism to get the current namespace:

Yumeka999 commented 1 month ago

I will try deleting the directory "/var/run/secrets/kubernetes.io/" and running the Python code again.

Yumeka999 commented 1 month ago

After I deleted the directory "/var/run/secrets/kubernetes.io/" and ran the Python code again, I got a new exception:

Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/kubeflow/katib/api/katib_client.py", line 91, in create_experiment
    self.custom_api.create_namespaced_custom_object(
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/api/custom_objects_api.py", line 225, in create_namespaced_custom_object
    return self.create_namespaced_custom_object_with_http_info(group, version, namespace, plural, body, **kwargs)  # noqa: E501
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/api/custom_objects_api.py", line 344, in create_namespaced_custom_object_with_http_info
    return self.api_client.call_api(
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 391, in request
    return self.rest_client.POST(url,
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 275, in POST
    return self.request("POST", url,
  File "/root/.local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 234, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': '8f8ede55-980b-4ea8-9a17-4a7f1a1e377c', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '7e96f0b0-f9b8-4ece-89db-ae85dc2e5bb9', 'X-Kubernetes-Pf-Prioritylevel-Uid': '97abe02e-d257-421c-89b7-0fba6242fd4f', 'Date': 'Wed, 09 Oct 2024 03:29:23 GMT', 'Content-Length': '335'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"admission webhook \"validator.experiment.katib.kubeflow.org\" denied the request: Cannot create the Experiment \"tune-experiment\" in namespace \"default\": the namespace lacks label \"katib.kubeflow.org/metrics-collector-injection: enabled\"","code":400}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_katib.py", line 22, in <module>
    katib_client.tune(
  File "/root/.local/lib/python3.8/site-packages/kubeflow/katib/api/katib_client.py", line 314, in tune
    self.create_experiment(experiment, namespace)
  File "/root/.local/lib/python3.8/site-packages/kubeflow/katib/api/katib_client.py", line 103, in create_experiment
    raise RuntimeError(
RuntimeError: Failed to create Katib Experiment: default/tune-experiment
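
The final RuntimeError does not say why the request was denied; the reason is in the JSON body of the ApiException earlier in the traceback. A minimal sketch of extracting it, assuming the body is a Kubernetes Status object like the one logged above (the helper name status_message is hypothetical):

```python
import json

# Response body copied from the ApiException in the traceback above.
body = r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"admission webhook \"validator.experiment.katib.kubeflow.org\" denied the request: Cannot create the Experiment \"tune-experiment\" in namespace \"default\": the namespace lacks label \"katib.kubeflow.org/metrics-collector-injection: enabled\"","code":400}'


def status_message(response_body: str) -> str:
    """Hypothetical helper: pull the human-readable message out of a Status body."""
    status = json.loads(response_body)
    return status.get("message", "")


message = status_message(body)
print(message)
```

When catching kubernetes.client.exceptions.ApiException directly, the same string should be available from the exception's body attribute.
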
Yumeka999 commented 1 month ago

Now my env:

Kubernetes version:

WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.12", GitCommit:"ef70d260f3d036fc22b30538576bbf6b36329995", GitTreeState:"clean", BuildDate:"2023-03-15T13:37:18Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.12", GitCommit:"ef70d260f3d036fc22b30538576bbf6b36329995", GitTreeState:"clean", BuildDate:"2023-03-15T13:30:13Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}

Katib controller version:

harbor.xnunion.com/kubeflow/kubeflowkatib/katib-controller:v0.15.0

Katib Python SDK version:

Name: kubeflow-katib
Version: 0.15.0
Summary: Katib Python SDK for APIVersion v1beta1
Home-page: https://github.com/kubeflow/katib/tree/master/sdk/python/v1beta1
Author: Kubeflow Authors
Author-email: premnath.vel@gmail.com
License: Apache License Version 2.0
Location: /root/.local/lib/python3.8/site-packages
Requires: certifi, grpcio, kubernetes, protobuf, setuptools, six, urllib3
Required-by:

Kubernetes Python SDK version:

Name: kubernetes
Version: 23.6.0
Summary: Kubernetes python client
Home-page: https://github.com/kubernetes-client/python
Author: Kubernetes
Author-email:
License: Apache License Version 2.0
Location: /root/.local/lib/python3.8/site-packages
Requires: certifi, google-auth, python-dateutil, pyyaml, requests, requests-oauthlib, setuptools, six, urllib3, websocket-client
Required-by: kubeflow-katib
andreyvelich commented 1 month ago

RuntimeError: Failed to create Katib Experiment: default/tune-experiment

@Yumeka999 Please use the kubeflow namespace in your Katib Client, as described in this getting started example: https://www.kubeflow.org/docs/components/katib/getting-started/#getting-started-with-katib-python-sdk. The namespace where you create Katib Experiments must have this label: katib.kubeflow.org/metrics-collector-injection: enabled. I will update the README instructions.
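
The rule stated here can be expressed as a small check. This is only a sketch: the function name and both label sets are made up for illustration, and the real validation is performed by Katib's admission webhook, not by client code.

```python
# Label named in the admission webhook's error message.
REQUIRED_LABEL = ("katib.kubeflow.org/metrics-collector-injection", "enabled")


def namespace_accepts_experiments(labels: dict) -> bool:
    """Hypothetical check mirroring the rule stated above."""
    key, value = REQUIRED_LABEL
    return labels.get(key) == value


# Made-up label sets for illustration only.
default_labels = {"kubernetes.io/metadata.name": "default"}
kubeflow_labels = {
    "kubernetes.io/metadata.name": "kubeflow",
    "katib.kubeflow.org/metrics-collector-injection": "enabled",
}

default_ok = namespace_accepts_experiments(default_labels)
kubeflow_ok = namespace_accepts_experiments(kubeflow_labels)
print(default_ok, kubeflow_ok)
```

The labels on a real namespace can be listed with kubectl get namespace kubeflow --show-labels.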

Yumeka999 commented 1 month ago

That fails with kubeflow-katib==0.15.0: the __init__() method of class KatibClient has no namespace parameter.
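
One way to confirm whether an installed client class accepts a given constructor argument is to inspect its signature. The sketch below uses stand-in classes rather than the real SDK, and the helper name accepts_namespace is hypothetical:

```python
import inspect


def accepts_namespace(cls) -> bool:
    """Hypothetical helper: does cls.__init__ declare a `namespace` parameter?"""
    return "namespace" in inspect.signature(cls.__init__).parameters


# Stand-in classes for illustration; these are not the real SDK classes.
class OldStyleClient:
    def __init__(self, config_file=None):
        self.config_file = config_file


class NewStyleClient:
    def __init__(self, config_file=None, namespace="default"):
        self.config_file = config_file
        self.namespace = namespace


old_ok = accepts_namespace(OldStyleClient)
new_ok = accepts_namespace(NewStyleClient)
print(old_ok, new_ok)
```

Passing the imported KatibClient class to the same helper would show whether the installed SDK version supports the namespace argument.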

andreyvelich commented 1 month ago

@Yumeka999 Can you please try Katib 0.17 with this example: https://www.kubeflow.org/docs/components/katib/getting-started/#getting-started-with-katib-python-sdk

Yumeka999 commented 1 month ago

OK, thank you.