When updating one of the JupyterHub demos, it is not easy to find a combination that works. Everything has to match, down to the patch version: the Spark version, the Python version, and the requirements, which must also be compatible with the libraries in the host image (e.g. pandas, scikit-learn). Otherwise de/serialization fails with many serial-class-ID errors, and a lot of trial and error is involved.
The older jupyter/pyspark-notebook:python-3.11 images have been discontinued and replaced with images from quay.io. Testing these illustrates some typical incompatibilities.
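As a rough illustration of the "everything has to match" requirement, the sketch below compares two version maps exactly, patch level included. The helper names `collect_versions` and `find_mismatches` and the version numbers in the sample maps are hypothetical, chosen only for illustration; they are not results from testing the images.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Return {name: version} for this interpreter and the named packages."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return versions


def find_mismatches(left, right):
    """Return {name: (left_version, right_version)} for every exact mismatch."""
    mismatches = {}
    for name in sorted(set(left) | set(right)):
        if left.get(name) != right.get(name):
            mismatches[name] = (left.get(name), right.get(name))
    return mismatches


# Hypothetical version maps (illustrative values, not measured ones).
notebook = {"python": "3.11.9", "pyspark": "3.5.1", "pandas": "2.2.2"}
executor = {"python": "3.11.6", "pyspark": "3.5.1", "pandas": "2.0.3"}

for name, (nb, ex) in find_mismatches(notebook, executor).items():
    print(f"{name}: notebook={nb} executor={ex}")
```

In practice, one could run `collect_versions(["pyspark", "pandas", "scikit-learn"])` inside both the notebook image and the Spark executor image and compare the two results before starting a job.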
Proposal
Build our own JupyterHub images, so that we can coordinate these versions ourselves.
Links
https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-pyspark-notebook