stackabletech / issues

This repository is only for issues that concern multiple repositories or don't fit into any specific repository

Build jupyterhub images for use with e.g. spark, to co-ordinate library & package versions #653

Open adwk67 opened 1 month ago

adwk67 commented 1 month ago

While updating one of the jupyterhub demos, it proved difficult to find a combination of versions that works. Everything has to match, down to the patch version: the spark version, the python version, and the requirements, which must be compatible with the host image libraries (e.g. pandas, scikit-learn etc.). Otherwise de/serialization produces a lot of serial-class-ID errors, and finding a working set involves a lot of trial and error.
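To make the trial and error a little more systematic, it can help to print the versions that actually matter from inside the notebook and compare them with the same output from the spark image. This is only a minimal sketch: the helper name `collect_versions` and the package list are assumptions for illustration, not part of any existing demo.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Return {name: version or None} for the interpreter and the given packages."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return versions


if __name__ == "__main__":
    # Hypothetical package list; adjust to whatever the demo's requirements contain.
    report = collect_versions(["pyspark", "pandas", "scikit-learn"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running the same snippet in both the notebook image and the spark image gives two reports that can be compared line by line.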

The older jupyter/pyspark-notebook:python-3.11 images have been discontinued and replaced by the images on quay.io. Testing these illustrates some typical incompatibilities:

quay.io/jupyter/pyspark-notebook:python-3.11.10 (notebook: spark 3.5.3)
quay.io/jupyter/pyspark-notebook:python-3.11.9 (notebook: spark 3.5.2/python 3.11.9, SDP 3.5.2: python 3.11.7)
quay.io/jupyter/pyspark-notebook:python-3.11.8 (notebook: spark 3.5.1/python 3.11.8, SDP 3.5.1: python 3.11.7)
quay.io/jupyter/pyspark-notebook:python-3.11.7 (notebook: spark 3.5.0, but.... etc.etc.)
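The mismatches in the list above can be sketched as a small check. It uses the two tags for which the list gives complete version pairs. The `Versions` class, the `find_mismatches` function and the strict "must be identical down to the patch version" rule are assumptions that follow the description in this issue; they are not a documented spark requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versions:
    spark: str
    python: str


def find_mismatches(notebook, cluster):
    """List every component whose version differs between notebook and cluster image."""
    problems = []
    if notebook.spark != cluster.spark:
        problems.append(f"spark: notebook {notebook.spark} != cluster {cluster.spark}")
    if notebook.python != cluster.python:
        problems.append(f"python: notebook {notebook.python} != cluster {cluster.python}")
    return problems


# (notebook image, SDP image) pairs taken from the list above.
tested = {
    "python-3.11.9": (Versions("3.5.2", "3.11.9"), Versions("3.5.2", "3.11.7")),
    "python-3.11.8": (Versions("3.5.1", "3.11.8"), Versions("3.5.1", "3.11.7")),
}

if __name__ == "__main__":
    for tag, (notebook, cluster) in tested.items():
        print(tag, find_mismatches(notebook, cluster) or "ok")
```

In both cases the spark versions line up but the python patch versions do not, which is the kind of gap a self-built image could close.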

Proposal

Build our own jupyterhub images, so we can co-ordinate these versions.

Links

https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-pyspark-notebook