jupyterhub / kubespawner

Kubernetes spawner for JupyterHub
https://jupyterhub-kubespawner.readthedocs.io
BSD 3-Clause "New" or "Revised" License

kubespawner stopped working after kubernetes upgrade from 1.15 to 1.16 #354

Closed apoliakevitch closed 4 years ago

apoliakevitch commented 4 years ago
[E 2019-09-25 05:16:53.998 JupyterHub web:1788] Uncaught exception GET /hub/user/apoliakevitch/ (10.244.4.1)
    HTTPServerRequest(protocol='http', host='conda.corp-apps.com', method='GET', uri='/hub/user/apoliakevitch/', version='HTTP/1.1', remote_ip='10.244.4.1')
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tornado/web.py", line 1699, in _execute
        result = await result
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/handlers/base.py", line 1013, in get
        raise copy.copy(exc).with_traceback(exc.__traceback__)
      File "/usr/local/lib/python3.6/dist-packages/tornado/gen.py", line 589, in error_callback
        future.result()
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/handlers/base.py", line 636, in finish_user_spawn
        await spawn_future
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/user.py", line 489, in spawn
        raise e
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/user.py", line 409, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
      File "/usr/local/lib/python3.6/dist-packages/kubespawner/spawner.py", line 1636, in _start
        events = self.events
      File "/usr/local/lib/python3.6/dist-packages/kubespawner/spawner.py", line 1491, in events
        for event in self.event_reflector.events:
      File "/usr/local/lib/python3.6/dist-packages/kubespawner/spawner.py", line 72, in events
        key=lambda x: x.last_timestamp,
    TypeError: '<' not supported between instances of 'datetime.datetime' and 'NoneType'
dbricare commented 4 years ago

This problem may be because the Kubernetes Python client only supports Kubernetes versions up to 1.14: https://github.com/kubernetes-client/python

AnotherCodeArtist commented 4 years ago

So, closing this issue means closing Kubespawner and therefore Jupyterhub on K8s? Will there never be a solution?

apoliakevitch commented 4 years ago

Not unless they rewrite it in Go or something else that is supported. The workaround in my particular case was to disable events, but I'm sure something else will break in the future.

manics commented 4 years ago

If the problem is due to a breaking change in the K8s API (as opposed to a bug in kubespawner), the fix will either have to wait for the upstream Python client package (there's already an open issue: https://github.com/kubernetes-client/python/issues/973) or use methods that haven't been broken.

If anyone currently running k8s 1.16 has time to investigate that would be very helpful!

apoliakevitch commented 4 years ago

According to the error:

  File "/usr/local/lib/python3.6/dist-packages/kubespawner/spawner.py", line 72, in events
    key=lambda x: x.last_timestamp,
TypeError: '<' not supported between instances of 'datetime.datetime' and 'NoneType'

Could it be that a simple NULL check is missing at line 72?
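
A minimal sketch of how a sort key like the one in the traceback fails when one timestamp is missing. The `SimpleNamespace` objects are hypothetical stand-ins for the client's event objects, and `sort_events` is an illustrative name, not a kubespawner function:

```python
from datetime import datetime, timezone
from types import SimpleNamespace

# Hypothetical stand-ins for event objects; only last_timestamp matters here.
events = [
    SimpleNamespace(
        message="Started container notebook",
        last_timestamp=datetime(2019, 10, 4, 0, 4, 12, tzinfo=timezone.utc),
    ),
    SimpleNamespace(message="Successfully assigned pod", last_timestamp=None),
]


def sort_events(items):
    # Same key expression as the line shown in the traceback.
    return sorted(items, key=lambda x: x.last_timestamp)


try:
    sort_events(events)
    error = None
except TypeError as exc:
    # None and datetime cannot be ordered against each other.
    error = str(exc)

print(error)
```

Python 3 refuses to order `None` against a `datetime`, so a single event without a timestamp is enough to break the whole sort.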

michitaro commented 4 years ago

As @apoliakevitch says, event objects do sometimes have a null lastTimestamp.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-02T17:01:15Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-02T16:51:36Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

$ kubectl get events -ojsonpath="{range .items[*]}{.lastTimestamp}{'\t'}{.message}{'\n'}{end}" | expand -t 24
2019-10-04T00:03:52Z    waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator
2019-10-04T00:03:52Z    External provisioner is provisioning volume for claim "jhub/claim-test"
<nil>                   pod has unbound immediate PersistentVolumeClaims
<nil>                   pod has unbound immediate PersistentVolumeClaims
<nil>                   Successfully assigned jhub/jupyter-test to niu.mtk.nao.ac.jp
2019-10-04T00:03:57Z    AttachVolume.Attach succeeded for volume "pvc-06076af8-529e-4e0b-a9af-d366c9e99f61"
2019-10-04T00:04:10Z    Container image "jupyterhub/k8s-network-tools:0.8.2" already present on machine
2019-10-04T00:04:10Z    Created container block-cloud-metadata
2019-10-04T00:04:10Z    Started container block-cloud-metadata
2019-10-04T00:04:11Z    Container image "jupyterhub/k8s-singleuser-sample:0.8.2" already present on machine
2019-10-04T00:04:12Z    Created container notebook
2019-10-04T00:04:12Z    Started container notebook
2019-10-04T00:04:15Z    Stopping container notebook
2019-10-04T00:02:34Z    No matching pods found
2019-10-04T00:02:34Z    No matching pods found
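
The same check can be scripted. This is a sketch that assumes the input is the JSON list printed by `kubectl get events -o json`; the helper name `events_missing_last_timestamp` and the sample data are illustrative only:

```python
import json


def events_missing_last_timestamp(events_json: str) -> list:
    """Return the messages of events whose lastTimestamp is absent or null."""
    items = json.loads(events_json).get("items", [])
    return [e.get("message", "") for e in items if e.get("lastTimestamp") is None]


# Hand-written sample shaped like the listing above (not real cluster output).
sample = json.dumps({
    "items": [
        {"lastTimestamp": "2019-10-04T00:04:12Z", "message": "Started container notebook"},
        {"lastTimestamp": None, "message": "pod has unbound immediate PersistentVolumeClaims"},
        {"message": "Successfully assigned jhub/jupyter-test to niu.mtk.nao.ac.jp"},
    ]
})

missing = events_missing_last_timestamp(sample)
print(missing)
```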

So null checking can avoid this problem for the moment.

root@hub-d59bc9d4f-48c5l:/srv/jupyterhub# diff -c a b
*** a   2019-10-04 00:14:01.892597740 +0000
--- b   2019-10-04 00:13:46.252338190 +0000
***************
*** 69,75 ****
      def events(self):
          return sorted(
              self.resources.values(),
!             key=lambda x: x.last_timestamp,
          )

--- 69,75 ----
      def events(self):
          return sorted(
              self.resources.values(),
!             key=lambda x: x.last_timestamp and x.last_timestamp.timestamp() or 0.,
          )
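
For readability, the `and ... or` expression in the diff can be spelled out as a named function. This is only a sketch: `event_sort_key` and the sample events are hypothetical, and it treats a missing timestamp as epoch 0 so those events sort first:

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def event_sort_key(event) -> float:
    """Return a float key so events with and without a timestamp are comparable."""
    ts = event.last_timestamp
    # A missing timestamp falls back to 0.0, as in the diff above.
    return ts.timestamp() if ts is not None else 0.0


# Hypothetical events; real ones would come from the event reflector.
events = [
    SimpleNamespace(message="late", last_timestamp=datetime(2019, 10, 4, 0, 4, 12, tzinfo=timezone.utc)),
    SimpleNamespace(message="no timestamp", last_timestamp=None),
    SimpleNamespace(message="early", last_timestamp=datetime(2019, 10, 4, 0, 3, 52, tzinfo=timezone.utc)),
]

ordered = [e.message for e in sorted(events, key=event_sort_key)]
print(ordered)
```

Every key is a float, so the sort never mixes types.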
AnotherCodeArtist commented 4 years ago

> So null checking can avoid this problem for the moment.

Works for me!!!!

Many thanks

PuckCh commented 4 years ago

> So null checking can avoid this problem for the moment.

Same here. Null checking works for me too.

Thanks.

manics commented 4 years ago

Great! Would one of you like to open a pull request?

dmpe commented 4 years ago

/usr/local/lib/python3.6/dist-packages/kubespawner/

Unfortunately I'm also getting this problem. Because neither nano nor vim is installed in the hub container, changing the file is not simple either. Or am I missing something, @michitaro?

PS: I use JupyterHub on K8s with the official Helm chart.

michitaro commented 4 years ago

@dmpe Change the hub deployment's .spec.template.spec.containers[0].command to something like this:

kubectl edit deploy -n $NAMESPACE hub
...
      containers:
      - command:
        - bash
        - -c
        - |
          mkdir -p ~/hotfix
          cp -r /usr/local/lib/python3.6/dist-packages/kubespawner ~/hotfix
          ls -R ~/hotfix
          patch ~/hotfix/kubespawner/spawner.py << EOT
          72c72
          <             key=lambda x: x.last_timestamp,
          ---
          >             key=lambda x: x.last_timestamp and x.last_timestamp.timestamp() or 0.,
          EOT

          PYTHONPATH=$HOME/hotfix jupyterhub --config /srv/jupyterhub_config.py --upgrade-db
        env:
        - name: PYTHONUNBUFFERED
          value: "1"
        - name: HELM_RELEASE_NAME
...
michitaro commented 4 years ago

The patch can also be applied with the command below.

kubectl patch deploy -n $NAMESPACE hub --type json --patch '[{"op": "replace", "path": "/spec/template/spec/containers/0/command", "value": ["bash", "-c", "\nmkdir -p ~/hotfix\ncp -r /usr/local/lib/python3.6/dist-packages/kubespawner ~/hotfix\nls -R ~/hotfix\npatch ~/hotfix/kubespawner/spawner.py << EOT\n72c72\n<             key=lambda x: x.last_timestamp,\n---\n>             key=lambda x: x.last_timestamp and x.last_timestamp.timestamp() or 0.,\nEOT\n\nPYTHONPATH=$HOME/hotfix jupyterhub --config /srv/jupyterhub_config.py --upgrade-db\n"]}]'
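
Hand-escaping the script inside the JSON string is error-prone. As a sketch, the same JSON Patch payload could be generated with `json.dumps`; the names `HOTFIX_SCRIPT` and `build_command_patch` are illustrative and not part of any tool:

```python
import json

# The shell script from the command above, written as a plain multi-line string.
HOTFIX_SCRIPT = """
mkdir -p ~/hotfix
cp -r /usr/local/lib/python3.6/dist-packages/kubespawner ~/hotfix
ls -R ~/hotfix
patch ~/hotfix/kubespawner/spawner.py << EOT
72c72
<             key=lambda x: x.last_timestamp,
---
>             key=lambda x: x.last_timestamp and x.last_timestamp.timestamp() or 0.,
EOT

PYTHONPATH=$HOME/hotfix jupyterhub --config /srv/jupyterhub_config.py --upgrade-db
"""


def build_command_patch(script: str) -> str:
    """Build a JSON Patch document that replaces the first container's command."""
    operations = [
        {
            "op": "replace",
            "path": "/spec/template/spec/containers/0/command",
            "value": ["bash", "-c", script],
        }
    ]
    # json.dumps takes care of escaping newlines and quotes.
    return json.dumps(operations)


patch_json = build_command_patch(HOTFIX_SCRIPT)
print(patch_json)
```

The printed string could then be passed to `kubectl patch ... --type json --patch`, for example by saving it to a file first.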
AnotherCodeArtist commented 4 years ago

You could also use this docker image. It comes with the patch included. Since there's no pull request yet, I'll try to open one.

clkao commented 4 years ago

Do we know whether the null timestamp in events is intended in 1.16 or a regression?

GrahamDumpleton commented 4 years ago

For me the proposed change just changes the error:

  File "/opt/app-root/lib/python3.6/site-packages/kubespawner/spawner.py", line 73, in events
    key=lambda x: x.last_timestamp if x.last_timestamp is not None else 0.,
TypeError: '<' not supported between instances of 'float' and 'datetime.datetime'

It now complains about comparing a float rather than NoneType.

GrahamDumpleton commented 4 years ago

If this is just to avoid the error on sort, maybe it should use datetime.datetime.fromtimestamp(0) rather than just 0.
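
A sketch of that idea, assuming the event timestamps are timezone-aware `datetime` objects (if they were naive, the fallback would need to be naive too, since Python refuses to compare aware and naive datetimes). `EPOCH` and `event_sort_key_datetime` are hypothetical names:

```python
from datetime import datetime, timezone
from types import SimpleNamespace

# Timezone-aware epoch, used as the fallback for events without a timestamp.
EPOCH = datetime.fromtimestamp(0, tz=timezone.utc)


def event_sort_key_datetime(event) -> datetime:
    """Always return a datetime so the sort never mixes datetime with float or None."""
    ts = event.last_timestamp
    return ts if ts is not None else EPOCH


# Hypothetical events standing in for the client's event objects.
events = [
    SimpleNamespace(message="late", last_timestamp=datetime(2019, 10, 4, 0, 4, 12, tzinfo=timezone.utc)),
    SimpleNamespace(message="no timestamp", last_timestamp=None),
    SimpleNamespace(message="early", last_timestamp=datetime(2019, 10, 4, 0, 3, 52, tzinfo=timezone.utc)),
]

ordered_by_datetime = [e.message for e in sorted(events, key=event_sort_key_datetime)]
print(ordered_by_datetime)
```

The point is that the key has one type for every event; whether that type is float or datetime matters less than keeping it consistent.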

mdjaere commented 4 years ago

I can confirm this applies to Kubernetes 1.17 as well, using the Helm chart from https://z2jh.jupyter.org/en/latest/setup-jupyterhub/setup-jupyterhub.html (v0.8.2).

consideRatio commented 4 years ago

It is resolved in 0.9.0-beta.3 though, right?

monsieurborges commented 4 years ago

In my case (bare metal installation), I fixed the null problem using @michitaro's patch and the following Helm values:

...
singleuser:
  # Mandatory for Bare Metal installation
  cloudMetadata:
    enabled: true
...

mdjaere commented 4 years ago

@consideRatio I've now tried 0.9.0-beta.3 and it works fine! Thanks

fhaase2 commented 4 years ago

I can confirm that this works in version 0.9.0-beta.3