jupyterhub / binderhub

Run your code in the cloud, with technology so advanced, it feels like magic!
https://binderhub.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Compatibility with Kubernetes 1.18 on OVH #1116

Open jtpio opened 4 years ago

jtpio commented 4 years ago

Bug description

Hey folks,

Creating a new issue from several comments in https://github.com/jupyterhub/binderhub/issues/810#issuecomment-510020306 for better visibility.

This is a summary of the issues encountered with a fresh BinderHub install on a new Kubernetes 1.18 cluster on OVH.

Expected behaviour

BinderHub should be able to build binders out of the box after following the instructions from the Zero to BinderHub guide.

Actual behaviour

From this comment (July 2019): https://github.com/jupyterhub/binderhub/issues/810#issuecomment-510020306

Note: the Kubernetes version used back in July 2019 may have been 1.17 rather than 1.18 (can't remember).

The binder pod uses the wrong Kubernetes namespace. From the pod logs:

HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:binderhub:binderhub\" cannot list resource \"pods\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}

The release is deployed in the binderhub namespace, but binder wants to list the pods in the default namespace, even though BUILD_NAMESPACE is correctly set to binderhub (the namespace where the chart is released):

$ kubectl exec -it binder-d998c657c-zmdf8 -- env | grep BUILD_NAMESPACE
BUILD_NAMESPACE=binderhub
root@binder-d998c657c-zmdf8:/# tr '\0' '\n' < /proc/1/environ | grep BUILD_NAMESPACE
BUILD_NAMESPACE=binderhub

The logs for the binder pod:

[E 190710 10:48:12 app:638] Failed to cleanup build pods
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/binderhub/app.py", line 630, in watch_build_pods
        lambda: Build.cleanup_builds(
      File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/usr/local/lib/python3.6/site-packages/binderhub/app.py", line 633, in <lambda>
        self.build_max_age,
      File "/usr/local/lib/python3.6/site-packages/binderhub/build.py", line 91, in cleanup_builds
        label_selector='component=binderhub-build',
      File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 12310, in list_namespaced_pod
        (data) = self.list_namespaced_pod_with_http_info(namespace, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 12413, in list_namespaced_pod_with_http_info
        collection_formats=collection_formats)
      File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 321, in call_api
        _return_http_data_only, collection_formats, _preload_content, _request_timeout)
      File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
        _request_timeout=_request_timeout)
      File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 342, in request
        headers=headers)
      File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 231, in GET
        query_params=query_params)
      File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
        raise ApiException(http_resp=r)
    kubernetes.client.rest.ApiException: (403)
    Reason: Forbidden
    HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Wed, 10 Jul 2019 10:48:12 GMT', 'Content-Length': '286'})
    HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:binderhub:binderhub\" cannot list resource \"pods\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}
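
The status body at the end of this traceback already names both the service account and the namespace that was queried. As a minimal sketch (the helper name `parse_forbidden` and the regular expression are illustrative, not part of BinderHub or the Kubernetes client), it can be decoded to make the mismatch explicit:

```python
import json
import re

# Pattern for the "Forbidden" message shown in the logs above.
# Assumption: the message keeps this exact wording; other Kubernetes
# versions may phrase it differently.
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)" in the namespace "(?P<namespace>[^"]+)"'
)


def parse_forbidden(body: str) -> dict:
    """Extract user, verb, resource and target namespace from a 403 status body."""
    status = json.loads(body)
    match = FORBIDDEN_RE.search(status["message"])
    if match is None:
        raise ValueError("unrecognised Forbidden message format")
    info = match.groupdict()
    # Service account user names look like system:serviceaccount:<namespace>:<name>.
    parts = info["user"].split(":")
    is_service_account = len(parts) == 4 and parts[:2] == ["system", "serviceaccount"]
    info["account_namespace"] = parts[2] if is_service_account else None
    return info


# Response body copied from the log above.
BODY = r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:binderhub:binderhub\" cannot list resource \"pods\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}'

info = parse_forbidden(BODY)
print(info["account_namespace"], "->", info["namespace"])  # binderhub -> default
```

Here the account lives in binderhub while the request targets default, which is the mismatch described above.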

Looking at the code, it looks like build_namespace should be correctly passed:

https://github.com/jupyterhub/binderhub/blob/01b1c59b9e7dc81250c1ed579c492ec2fd6baaf6/binderhub/app.py#L630-L634

The issue mentioned above seems to correspond to the step that comes after the binderhub serviceaccount has the correct rights on the default namespace.


From this comment (June 2020): https://github.com/jupyterhub/binderhub/issues/810#issuecomment-651089134

Trying a fresh BinderHub deployment on a new Kubernetes cluster.

Setting:

config:
  BinderHub:
    auth_enabled: false

helps get past the following error in /binderhub_config.py:

Loading /etc/binderhub/config/values.yaml
Loading /etc/binderhub/secret/values.yaml
[BinderHub] ERROR | Exception while loading config file /binderhub_config.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/traitlets/config/application.py", line 563, in _load_config_files
    config = loader.load_config()
  File "/usr/local/lib/python3.7/site-packages/traitlets/config/loader.py", line 457, in load_config
    self._read_file_as_dict()
  File "/usr/local/lib/python3.7/site-packages/traitlets/config/loader.py", line 489, in _read_file_as_dict
    py3compat.execfile(conf_filename, namespace)
  File "/usr/local/lib/python3.7/site-packages/ipython_genutils/py3compat.py", line 198, in execfile
    exec(compiler(f.read(), fname, 'exec'), glob, loc)
  File "/binderhub_config.py", line 87, in <module>
    hub_url = urlparse(c.BinderHub.hub_url)
  File "/usr/local/lib/python3.7/urllib/parse.py", line 367, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "/usr/local/lib/python3.7/urllib/parse.py", line 123, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "/usr/local/lib/python3.7/urllib/parse.py", line 107, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/usr/local/lib/python3.7/urllib/parse.py", line 107, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
AttributeError: 'LazyConfigValue' object has no attribute 'decode'
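
The last frames show `urlparse` receiving a `LazyConfigValue` instead of a string, which suggests that `c.BinderHub.hub_url` had not been assigned when line 87 of the config file ran. The sketch below reproduces that kind of failure with a hypothetical stand-in class (`UnsetValue`) and shows one way a guard could turn it into a readable error. The helper `parse_hub_url` and the example URL are assumptions for illustration, not code from the chart.

```python
from urllib.parse import urlparse


class UnsetValue:
    """Hypothetical stand-in for a config placeholder that is neither str nor bytes."""


def parse_hub_url(value):
    """Parse hub_url, failing with a readable message if it was never configured."""
    if not isinstance(value, str):
        raise ValueError(
            "hub_url is not set; expected a string such as a URL under config.BinderHub"
        )
    return urlparse(value)


# Passing a non-string object straight to urlparse fails deep inside urllib,
# much like the traceback above.
try:
    urlparse(UnsetValue())
except (AttributeError, TypeError) as err:
    print(f"urlparse failed: {err}")

# With the guard, the same mistake is reported in terms of the setting itself.
try:
    parse_hub_url(UnsetValue())
except ValueError as err:
    print(f"guard failed: {err}")

# Example URL is a placeholder, not a value from this deployment.
parsed = parse_hub_url("http://hub.example.org:8000")
print(parsed.hostname, parsed.port)  # hub.example.org 8000
```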

Setting:

extraConfig:
  01-custom: |
      c.BinderHub.build_namespace = "binderhub"

prevents Binder from trying to list pods in the default namespace.

However, permissions still seem to be off, even though the namespace is correct this time:

HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:binderhub:binderhub\" cannot list resource \"pods\" in API group \"\" in the namespace \"binderhub\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}

The binderhub Role and RoleBinding also look good and correspond to the defaults from the chart:

https://github.com/jupyterhub/binderhub/blob/58a0b72021d17264519438f6e06f452021617a35/helm-chart/binderhub/templates/rbac.yaml#L2-L34

Creating a new Role and RoleBinding with all the permissions on the binderhub namespace for the binderhub ServiceAccount doesn't change anything.

How to reproduce

It is unclear whether this is an issue with Kubernetes 1.18, or with the cloud vendor. GKE doesn't seem to offer 1.18 yet and can't be used as a comparison.

Could it be related to a change to RBAC in recent Kubernetes versions?

jtpio commented 4 years ago

If some folks have experienced the same issue on other cloud vendors or on-premise clusters, please feel free to add more details.

Also maybe the folks managing the OVH cluster of the mybinder.org federation would have some input on this?

Thanks!

SylvainCorlay commented 4 years ago

@jagwar just pointed out that they found the same issue when deploying the main mybinder instance on OVH.

https://github.com/jupyterhub/mybinder.org-deploy/issues/1445

jtpio commented 4 years ago

Is it the same issue? It looks like https://github.com/jupyterhub/mybinder.org-deploy/issues/1445 is more about ingress issues.

However the following comment sounds relevant:

Problem -> I went a little bit too far and I realize that mybinder.org-deploy is only compatible with k8s <= 1.15

bitnik commented 4 years ago

The GESIS cluster was created on bare metal with kubeadm 1.18.3, and our Binder is deployed in a namespace other than default, but we don't experience this issue. Btw helm version is v3.2.2.


There is one thing in your post that confuses me:

$ kubectl exec -it binder-d998c657c-zmdf8 -- env | grep BUILD_NAMESPACE
BUILD_NAMESPACE=binderhub

This command should return a "binder-d998c657c-zmdf8 not found" error, shouldn't it? You don't pass the namespace to kubectl (-n binderhub). Maybe it was just a typo? Or something is going really weird with namespaces.

Have you tried to deploy BinderHub on the default namespace on OVH with kubernetes 1.18?

jtpio commented 4 years ago

Thanks @bitnik for the input.

Btw helm version is v3.2.2

Using 2.16.9 here.

$ helm version
Client: &version.Version{SemVer:"v2.16.9", GitCommit:"8ad7037828e5a0fca1009dabe290130da6368e39", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.16.9", GitCommit:"8ad7037828e5a0fca1009dabe290130da6368e39", GitTreeState:"clean"}

This command should return a "binder-d998c657c-zmdf8 not found" error, shouldn't it? You don't pass the namespace to kubectl (-n binderhub). Maybe it was just a typo? Or something is going really weird with namespaces.

This command was from the previous try a year ago, but it must have been run after switching to the binderhub namespace with:

kubectl config set-context --current --namespace=binderhub

Have you tried to deploy BinderHub on the default namespace on OVH with kubernetes 1.18?

Not for the last test, but for the first one I think so.

Also, everything works fine with Kubernetes 1.15 when following the steps mentioned in the docs.

apiloqbc commented 4 years ago

I get the same error:
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:binderhub:binderhub\" cannot list resource \"pods\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}

Helm version v3.2.1, Kubernetes v1.15.11

jtpio commented 4 years ago

@apiloqbc which cloud vendor are you using? or is it bare metal?

apiloqbc commented 4 years ago

@apiloqbc which cloud vendor are you using? or is it bare metal?

Amazon EKS

xubofei1983 commented 4 years ago

Same here on EKS 1.16. I can see that the BUILD_NAMESPACE environment variable is set correctly in the binder pod.

apiloqbc commented 3 years ago

I finally figured out what the problem is. The "hub_url" property is missing and needs to be set:

config:
  BinderHub:
    hub_url: "http://proxy-public:8000"
    ...

Without it, loading binderhub_config.py fails and the whole file is ignored, so BUILD_NAMESPACE is ignored too.

After all, the logs said as much:

Loading /etc/binderhub/config/values.yaml
Loading /etc/binderhub/secret/values.yaml
[BinderHub] ERROR | Exception while loading config file /binderhub_config.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/traitlets/config/application.py", line 563, in _load_config_files
    config = loader.load_config()
  File "/usr/local/lib/python3.7/site-packages/traitlets/config/loader.py", line 457, in load_config
    self._read_file_as_dict()
  File "/usr/local/lib/python3.7/site-packages/traitlets/config/loader.py", line 489, in _read_file_as_dict
    py3compat.execfile(conf_filename, namespace)
  File "/usr/local/lib/python3.7/site-packages/ipython_genutils/py3compat.py", line 198, in execfile
    exec(compiler(f.read(), fname, 'exec'), glob, loc)
  File "/binderhub_config.py", line 87, in <module>
    hub_url = urlparse(c.BinderHub.hub_url)
  File "/usr/local/lib/python3.7/urllib/parse.py", line 367, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "/usr/local/lib/python3.7/urllib/parse.py", line 123, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "/usr/local/lib/python3.7/urllib/parse.py", line 107, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/usr/local/lib/python3.7/urllib/parse.py", line 107, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
AttributeError: 'LazyConfigValue' object has no attribute 'decode'
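
Why one failing line would also drop the namespace setting can be sketched with a simplified model of all-or-nothing config loading. This is not traitlets' actual code: `load_settings`, the `settings` dict, and the assumed default of "default" (taken from the namespace seen in the 403 errors) are illustrative only.

```python
# Assumed default, matching the namespace that appears in the 403 errors.
DEFAULTS = {"build_namespace": "default"}


def load_settings(config_source: str) -> dict:
    """Run a config script; if it raises anywhere, discard everything it set."""
    settings = dict(DEFAULTS)
    collected = {}
    try:
        exec(config_source, {"settings": collected})
    except Exception as err:
        # The error is only logged, so the application starts with defaults.
        print(f"Exception while loading config: {err!r}")
        return settings
    settings.update(collected)
    return settings


# The namespace is assigned, but a later line fails, so nothing is applied.
BROKEN = '''
settings["build_namespace"] = "binderhub"
raise AttributeError("'LazyConfigValue' object has no attribute 'decode'")
'''

# With the failing line fixed, the whole script takes effect.
WORKING = '''
settings["build_namespace"] = "binderhub"
settings["hub_url"] = "http://proxy-public:8000"
'''

print(load_settings(BROKEN)["build_namespace"])   # default
print(load_settings(WORKING)["build_namespace"])  # binderhub
```

Under this model, the pod environment can show BUILD_NAMESPACE=binderhub while the running application still uses the default namespace.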

manics commented 3 years ago

What's the current status of this?