kubeflow / pytorch-operator

PyTorch on Kubernetes
Apache License 2.0

can I use PyTorchJobClient inside a pod of the cluster? #330

Open omlomloml opened 3 years ago

omlomloml commented 3 years ago

I get a 403 when I try. If the client can be used this way, how should I set up the config file?

Thanks

ptc.get(namespace='kubeflow')
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/kubeflow/pytorchjob/api/py_torch_job_client.py", line 134, in get
    pytorchjob = thread.get(constants.APISERVER_TIMEOUT)
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 176, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 366, in request
    headers=headers)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 241, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 231, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Fri, 09 Apr 2021 15:24:19 GMT', 'Content-Length': '350'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pytorchjobs.kubeflow.org is forbidden: User \"system:serviceaccount:metis:default\" cannot list resource \"pytorchjobs\" in API group \"kubeflow.org\" in the namespace \"kubeflow\"","reason":"Forbidden","details":{"group":"kubeflow.org","kind":"pytorchjobs"},"code":403}

Shuai-Xie commented 3 years ago

You can use PyTorchJobClient in a Pod.

But a suitable ClusterRoleBinding must first be configured for the ServiceAccount that the Pod runs as.
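The 403 body in the question already names the ServiceAccount that needs the binding. As a rough sketch, the snippet below pulls that identity out of the response body using only the standard library. The helper name parse_forbidden and the regular expression are illustrative assumptions: the pattern matches the message text exactly as it appears above, and that wording is not a stable API.

```python
import json
import re

# Response body copied from the 403 in the question above.
BODY = (
    r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    r'"message":"pytorchjobs.kubeflow.org is forbidden: User '
    r'\"system:serviceaccount:metis:default\" cannot list resource '
    r'\"pytorchjobs\" in API group \"kubeflow.org\" in the namespace '
    r'\"kubeflow\"","reason":"Forbidden",'
    r'"details":{"group":"kubeflow.org","kind":"pytorchjobs"},"code":403}'
)

# Assumption: the message follows the wording seen in the question.
PATTERN = re.compile(
    r'User "system:serviceaccount:(?P<namespace>[^:"]+):(?P<name>[^"]+)" '
    r'cannot (?P<verb>\S+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)"'
    r'(?: in the namespace "(?P<target>[^"]+)")?'
)


def parse_forbidden(body: str) -> dict:
    """Return who was denied which verb on which resource (hypothetical helper)."""
    message = json.loads(body)["message"]
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like a ServiceAccount denial")
    return match.groupdict()


if __name__ == "__main__":
    info = parse_forbidden(BODY)
    # 'namespace' and 'name' identify the subject the binding must list.
    print(info["namespace"], info["name"], info["verb"], info["resource"])
```

Here the denied subject is the default ServiceAccount in the metis namespace, so that is the subject a binding has to name.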

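For example, applying the pytorchjobs_access_rbac.yaml below grants full access to PyTorchJob resources to any pod running under the default ServiceAccount of the default namespace. The 403 in your traceback names system:serviceaccount:metis:default, so in your case the subject's namespace should be metis.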

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pytorchjobs-runner-role
rules:
- apiGroups: ["kubeflow.org"]
  resources: ["pytorchjobs"]
  verbs: ["*"]    # grant every verb on PyTorchJob resources

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pytorchjobs-runner-role-bind
subjects:
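- kind: ServiceAccount   # the ServiceAccount the pod runs as
  name: default
  namespace: default     # change to the pod's namespace if it is not default
roleRef:
  kind: ClusterRole
  name: pytorchjobs-runner-role   # must match the ClusterRole name above
  apiGroup: rbac.authorization.k8s.io

kubectl apply -f pytorchjobs_access_rbac.yaml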
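
If the same access is needed for a ServiceAccount in another namespace (metis in the question), the manifest can also be generated rather than edited by hand. This is a minimal sketch that assumes the same role and binding names as the YAML above. The helpers build_rbac and to_kube_list are hypothetical names. The output is JSON, which kubectl accepts in place of YAML.

```python
import json


def build_rbac(sa_namespace: str, sa_name: str = "default",
               role_name: str = "pytorchjobs-runner-role") -> list:
    """Build the ClusterRole and ClusterRoleBinding as plain dicts."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": role_name},
        "rules": [{
            "apiGroups": ["kubeflow.org"],
            "resources": ["pytorchjobs"],
            "verbs": ["*"],  # every verb on PyTorchJob resources
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": role_name + "-bind"},
        "subjects": [{
            "kind": "ServiceAccount",
            "name": sa_name,
            "namespace": sa_namespace,
        }],
        "roleRef": {
            "kind": "ClusterRole",
            # Reuse the same variable so the binding cannot point
            # at a role name that does not exist.
            "name": role_name,
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
    return [role, binding]


def to_kube_list(objects: list) -> dict:
    """Wrap several objects in a v1 List so they fit in one JSON document."""
    return {"apiVersion": "v1", "kind": "List", "items": objects}


if __name__ == "__main__":
    # "metis" is the namespace named in the 403 from the question.
    print(json.dumps(to_kube_list(build_rbac("metis")), indent=2))
```

Assuming the script is saved as gen_rbac.py (a hypothetical filename), its output can be piped straight to the cluster with: python gen_rbac.py | kubectl apply -f -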

Best Regards.