jupyterhub / binderhub

jupyterhub pod fails to init after upgrade #1436

Closed ltetrel closed 2 years ago

ltetrel commented 2 years ago

Bug description

hub pod fails to initialize after upgrade

How to reproduce

sudo helm upgrade binderhub jupyterhub/binderhub --version=v0.2.0-n852.h7c39292 -f config.yaml -f secrets.yaml -n binderhub

Your personal set up

NAME                                             READY   STATUS             RESTARTS   AGE     NODE                     START
binder-68ff98c78d-svr4z                          1/1     Running            0          2m28s   neurolibre-test-node1    2021-11-22T17:24:23Z
binderhub-image-cleaner-5srbp                    1/1     Running            0          26d     neurolibre-test-node1    2021-10-27T15:14:53Z
binderhub-image-cleaner-t9h56                    1/1     Running            0          26d     neurolibre-test-master   2021-10-27T15:14:53Z
binderhub-proxy-ingress-nginx-controller-8k6jf   1/1     Running            0          26d     neurolibre-test-master   2021-10-26T21:25:30Z
binderhub-proxy-ingress-nginx-controller-q5cjr   1/1     Running            0          26d     neurolibre-test-node1    2021-10-26T21:25:30Z
continuous-image-puller-tgvvt                    1/1     Running            0          12d     neurolibre-test-master   2021-11-10T16:47:25Z
continuous-image-puller-w5hjp                    1/1     Running            0          12d     neurolibre-test-node1    2021-11-10T16:47:24Z
hub-5d77dd5cf7-qjbpr                             0/1     CrashLoopBackOff   4          2m17s   neurolibre-test-master   2021-11-22T17:24:34Z

Loading /usr/local/etc/jupyterhub/secret/values.yaml
No config at /usr/local/etc/jupyterhub/existing-secret/values.yaml
Loading extra config: 0-binderspawnermixin
Loading extra config: 00-binder
[I 2021-11-22 17:28:08.272 JupyterHub app:2459] Running JupyterHub version 1.4.2
[I 2021-11-22 17:28:08.272 JupyterHub app:2489] Using Authenticator: nullauthenticator.NullAuthenticator-1.0.0
[I 2021-11-22 17:28:08.272 JupyterHub app:2489] Using Spawner: builtins.BinderSpawner
[I 2021-11-22 17:28:08.272 JupyterHub app:2489] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-1.4.2
[I 2021-11-22 17:28:08.286 JupyterHub dbutil:130] Upgrading sqlite:///jupyterhub.sqlite
[I 2021-11-22 17:28:08.286 JupyterHub dbutil:99] Backing up jupyterhub.sqlite => jupyterhub.sqlite.2021-11-22-172808
[I 2021-11-22 17:28:08.680 alembic.runtime.migration migration:164] Context impl SQLiteImpl.
[I 2021-11-22 17:28:08.680 alembic.runtime.migration migration:167] Will assume non-transactional DDL.
[E 2021-11-22 17:28:08.684 alembic.util.messaging messaging:60] Can't locate revision identified by '833da8570507'
FAILED: Can't locate revision identified by '833da8570507'
[E 2021-11-22 17:28:08.738 JupyterHub app:2969]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 2966, in launch_instance_async
        await self.initialize(argv)
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 2501, in initialize
        self.init_db()
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 1703, in init_db
        dbutil.upgrade_if_needed(self.db_url, log=self.log)
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/dbutil.py", line 135, in upgrade_if_needed
        upgrade(db_url)
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/dbutil.py", line 84, in upgrade
        check_call(['alembic', '-c', alembic_ini, 'upgrade', revision])
      File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['alembic', '-c', '/tmp/tmpyfkmw4bu/alembic.ini', 'upgrade', 'head']' returned non-zero exit status 255.

ltetrel commented 2 years ago

It works if I pin an older version of the jupyterhub image (I am not sure what the version is actually called here, I am lost):

image:
  name: jupyterhub/k8s-hub
  tag: 1.1.3-n141.h28efde1b

If I remove this and let binderhub choose the jupyterhub version, it fails as mentioned in the issue.

consideRatio commented 2 years ago

Yeah, if you go too far in the development versions beyond 1.1.3, you will start using JupyterHub 2.0.0 and such, which is not yet fully released, nor tested for use with BinderHub.

I'm not sure what this is about yet, but I'm not so happy to dig into this at this point in time. Hmmm, thinking about it, when you change the hub pod's image without aligning it with the JupyterHub Helm chart - all bets are off.

If you can reproduce the failure by downloading the BinderHub Helm chart locally to your computer and deploying it with a modern version of the JupyterHub Helm chart, then I'd be willing to look into this a bit. But if you are just switching the hub image without also switching the JupyterHub Helm chart that the BinderHub Helm chart depends on, I think you are likely to have issues solely because of that, and I don't want to chase such issues.
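On the point about development versions beyond 1.1.3: the tags in this thread (1.1.3-n141.h28efde1b, 1.1.3-n195.h8ec28343, v0.2.0-n852.h7c39292) appear to follow a "base version, number of commits since that version, short commit hash" layout. The sketch below splits such a tag under that assumption; the regular expression and the `parse_dev_version` helper are illustrative, not part of any JupyterHub or Helm tooling.

```python
import re
from typing import NamedTuple, Optional


class DevVersion(NamedTuple):
    base: str                # last tagged version the build is based on
    commits_since_base: int  # assumed meaning of the "n<number>" part
    commit_hash: str         # assumed meaning of the "h<hex>" part


# Assumed layout: [v]<major.minor.patch>-n<commits>.h<short hash>
_DEV_TAG = re.compile(
    r"^v?(?P<base>\d+\.\d+\.\d+)-n(?P<n>\d+)\.h(?P<hash>[0-9a-f]+)$"
)


def parse_dev_version(tag: str) -> Optional[DevVersion]:
    """Split a development tag into its parts, or return None for a plain release."""
    match = _DEV_TAG.match(tag)
    if match is None:
        return None
    return DevVersion(match["base"], int(match["n"]), match["hash"])


if __name__ == "__main__":
    for tag in ("1.1.3-n141.h28efde1b", "1.1.3-n195.h8ec28343", "1.4.2"):
        print(tag, "->", parse_dev_version(tag))
```

Read this way, a tag such as 1.1.3-n195.h8ec28343 is a build made after 1.1.3, so it can already contain changes that belong to the next release.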

ltetrel commented 2 years ago

@consideRatio thanks for the feedback,

I think this is an important issue since the latest binderhub release cannot be installed as is (without pinning the jupyterhub version as I did). Unfortunately I cannot reproduce that locally; I have this kubernetes cluster and I don't want to mess up my configuration there.

ltetrel commented 2 years ago

I was also able to upgrade with jupyterhub 1.1.3-n195.h8ec28343 (2.0.0rc4)

What binderhub version do you recommend, then, for a stable (tested) release that works with jupyterhub without specific configuration?

Thanks,

consideRatio commented 2 years ago

Are you saying that you are using BinderHub the Helm chart, using the JupyterHub Helm chart of version 1.1.3-n195.h8ec28343, or that you are using the jupyterhub.hub.image versioned 1.1.3-n195.h8ec28343?

I don't recognize any version of BinderHub to be stable/tested more than the one used by jupyterhub/mybinder.org-deploy, as I don't know of any other reference to indicate stability and since we don't have a proper versioning system of the BinderHub software and Helm chart yet.

If you have resolved this issue now but have an issue about another version compatibility, that could be relevant to have documented so that developers can be made aware of it. You are pushing the boundaries by using a more modern version of the JupyterHub image and/or JupyterHub helm chart than is shipped with the BinderHub Helm chart, and you can run into issues developers don't yet know about.

Please open a new issue if this issue no longer represents what you are currently discussing, and please be very clear about whether you are using a modified binderhub helm chart that runs another JupyterHub Helm chart, or just modifying the hub image. If it is the latter, I'd ask that you don't burden us with a question/issue, as the problem is likely related to that, which is a wontfix kind of issue.

ltetrel commented 2 years ago

Are you saying that you are using BinderHub the Helm chart, using the JupyterHub Helm chart of version 1.1.3-n195.h8ec28343, or that you are using the jupyterhub.hub.image versioned 1.1.3-n195.h8ec28343?

The jupyterhub.hub.image. Also, here is the binderhub helm release I am using: v0.2.0-n845.hcc57b24

The jupyterhub pod initializes correctly with the following configuration:

jupyterhub:
  hub:
    image:
      name: jupyterhub/k8s-hub
      tag: 1.1.3-n195.h8ec28343

The jupyterhub pod does not initialize correctly if the image field is not defined when installing with helm. I imagine this is the default behaviour, as this field is not mentioned in the docs, so users may hit this error.

You are pushing the boundaries by using a more modern version of the JupyterHub image and/or JupyterHub helm chart than is shipped with the BinderHub Helm chart

I totally understand; that is why I asked which "stable" version of binderhub you recommend. Keeping up to date with jupyterhub/mybinder.org-deploy is a good idea, so I will do that.

minrk commented 2 years ago

I think this is an important issue since the latest binderhub release cannot be installed as is

I don't think this is true, because the issue requires a past upgrade to jupyterhub 2.0 and then downgrade, which no version of the binderhub chart should do without explicit user configuration to request a non-default version and then roll it back.

Can I ask how/why you upgraded the jupyterhub chart and/or image to 2.0?

The short description of what I believe happened is: jupyterhub was upgraded to 2.0, which upgraded the database schema, then downgraded back to the default 1.4.2 (chart 1.1.2) which doesn't know how to read from the 2.0 schema. Either scrapping the database file (usually fine for a binderhub deployment, but not always) or restoring from the backup created during the upgrade should get things back running. I don't think this is a bug in binderhub and/or jupyterhub because no version of the binderhub chart should have caused the upgrade to 2.0.

Some details:

I don't believe this is an issue with the current version of the binderhub and/or jupyterhub charts. Purely a lack of support for downgrading database schemas from future versions of jupyterhub.

There's potentially a feature request here for z2jh (and/or jupyterhub itself) to handle rolling back database upgrades. It's not something we handle currently. We produce the backup files during upgrade, but we don't provide any mechanism to restore one of the backups, which is what's needed after downgrading.
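To check which schema revision a database file carries before deciding whether to restore or discard it, one can read Alembic's version table. This is a minimal sketch in Python, assuming the default Alembic table name `alembic_version` with its `version_num` column; `stored_revision` is a hypothetical helper, not a JupyterHub API. The file is opened read-only so the check does not modify it.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def stored_revision(db_path: str) -> Optional[str]:
    """Return the Alembic revision recorded in a SQLite file, or None if absent."""
    # Open read-only so that inspecting the file cannot change or create it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        row = con.execute("SELECT version_num FROM alembic_version").fetchone()
    except sqlite3.OperationalError:
        # Raised when the alembic_version table does not exist.
        return None
    finally:
        con.close()
    return row[0] if row else None


if __name__ == "__main__":
    # Assumed file name, taken from the log earlier in this issue.
    print(stored_revision("jupyterhub.sqlite"))
```

In the log at the top of this issue the hub could not locate revision 833da8570507, so a file that still reports that value has presumably not been rolled back to the older schema.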

ltetrel commented 2 years ago

Ok @minrk thanks for the explanation.

The short description of what I believe happened is: jupyterhub was upgraded to 2.0, which upgraded the database schema, then downgraded back to the default 1.4.2 (chart 1.1.2) which doesn't know how to read from the 2.0 schema

Given what you are saying, this is indeed more of a backward-compatibility issue with the jupyterhub helm chart, and it seems to be specific to my cluster.

For context: what happened is that for a while I was using a custom image with the jupyterhub helm chart (1.13~) without actually knowing that it was jupyterhub >= 2.0 (I was confused by the helm vs. app version naming). Then I stopped using that image (I found parameters that allowed me to remove this custom image), and this is when I started having issues (what I posted at the beginning of this thread).

I will close this since the title does not apply (it should be "jupyterhub pod fails to init after downgrade") and the issue is not related to binderhub. Still if you have some instructions on how I can revert to "old" (<2.0) jupyter config let me know.

minrk commented 2 years ago

if you have some instructions on how I can revert to "old" (<2.0) jupyter config let me know.

That would be copying the backup file created when you first upgraded to 2.0, as I tried to describe:

"restoring" means copying a file from jupyterhub.sqlite.YYYY-MM-DD-TTTTTT to jupyterhub.sqlite in the hub-db volume.

If you are using this for binderhub, however, nothing in the database is usually valid for more than a day, so just deleting or renaming jupyterhub.sqlite and then restarting the hub pod should probably do it. Doing that would orphan any running pods, but so would restoring the database from an older state.
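A minimal sketch of the restore step described above, in Python. It assumes the hub pod is stopped and that the hub-db volume is reachable as a local directory. The timestamp pattern is inferred from the backup name in the log earlier in this thread (jupyterhub.sqlite.2021-11-22-172808), and `list_backups` and `restore_backup` are hypothetical helper names, not part of JupyterHub.

```python
import re
import shutil
from pathlib import Path
from typing import List

# Suffix inferred from the log line "jupyterhub.sqlite.2021-11-22-172808".
BACKUP_SUFFIX = re.compile(r"\.\d{4}-\d{2}-\d{2}-\d{6}$")


def list_backups(db_dir: str, db_name: str = "jupyterhub.sqlite") -> List[Path]:
    """Return the timestamped backups of db_name in db_dir, oldest first."""
    candidates = Path(db_dir).glob(db_name + ".*")
    # The timestamp suffix sorts lexicographically in chronological order.
    return sorted(p for p in candidates if BACKUP_SUFFIX.search(p.name))


def restore_backup(backup: Path, db_name: str = "jupyterhub.sqlite") -> Path:
    """Copy a backup over db_name, keeping the current file under a new name."""
    backup = Path(backup)
    target = backup.with_name(db_name)
    if target.exists():
        # Keep the database the hub cannot read instead of deleting it.
        target.replace(target.with_name(db_name + ".unreadable"))
    shutil.copy2(backup, target)
    return target
```

Which backup predates the 2.0 upgrade has to be checked by hand from the list returned by `list_backups`; the sketch only performs the copy. Skipping the `shutil.copy2` call and keeping only the rename corresponds to the "delete or rename" option, where the hub starts from a fresh database.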

ltetrel commented 2 years ago

I will try this, thanks!

minrk commented 2 years ago

I perhaps conspicuously omitted exactly how to run a simple command on the files in a volume, which might be tricky when the hub pod won't start, depending on your volume provider. I'm sure there are tricks somewhere.

ltetrel commented 2 years ago

In my case I have a Kubernetes persistent volume for the jupyterhub database, so I can modify the files directly on my host:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: hub-db-dir
  labels:
    type: local
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/shared"