open-services-group / byon

Bring Your Own Notebook (BYON) project repository.
GNU General Public License v3.0

JupyterHub notebook requirements #26

Closed tumido closed 2 years ago

tumido commented 2 years ago

Supporting JupyterHub Spawner UI scenario and the current RHODS/ODH architecture:

Based on s2i-minimal-notebook, each image should include the following prerequisites:

Python packages

Based on requirements.in and Pipfile at:
https://github.com/thoth-station/s2i-minimal-notebook/tree/master/overlays/f34-python39
https://github.com/thoth-station/s2i-minimal-notebook/tree/master/overlays/python36
https://github.com/thoth-station/s2i-minimal-notebook/tree/master/overlays/python38

| Package name | Note |
|---|---|
| notebook | Provides the classic notebook interface. Either this or jupyterlab has to be present. |
| jupyterhub | Provides Spawner. Also a transitive dependency of jupyterhub-singleuser-profiles, which is referenced later but not installed as a package (weird). |
| jupyterlab | Provides the JupyterLab UI interface. Either this or notebook has to be present. |
| jupyterlab-requirements | Requires jupyterlab. It is NOT present in the Python 3.6 overlay. |
| jupyter_kernel_gateway | This was probably never used. |
| jupyter-nbrequirements | ~Extension to JupyterLab. Requires jupyterlab.~ Requirements plugin for Jupyterhub classic. Requires notebook. Not maintained. |
| supervisor | This was probably used only when WEBDAV is enabled, which I never saw... |
| jupyterlab-git | Extension enabling git. Requires the git distribution package. |
| jupyterlab-spellchecker | Extension. This is NOT present in the Python 3.6 overlay. |
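As a rough illustration of how the packages listed above could be checked from inside a candidate image, here is a minimal sketch using the standard-library `importlib.metadata` module (available from Python 3.8). The names `CANDIDATE_PACKAGES`, `installed_versions` and `has_notebook_ui` are hypothetical helpers made up for this example; they are not part of BYON or s2i-minimal-notebook.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Hypothetical selection of distribution names from the list above.
# Which of them are truly required is what this issue is trying to settle.
CANDIDATE_PACKAGES = ("notebook", "jupyterhub", "jupyterlab", "jupyterlab-git")


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def has_notebook_ui(versions: Dict[str, Optional[str]]) -> bool:
    """True if at least one UI package (notebook or jupyterlab) is installed."""
    return any(versions.get(name) is not None for name in ("notebook", "jupyterlab"))
```

Calling `installed_versions(CANDIDATE_PACKAGES)` inside the image and passing the result to `has_notebook_ui` would cover the "either notebook or jupyterlab" rule; the remaining entries can be reported as warnings rather than hard failures.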

System packages

Configuration

Omitted scenarios

I'm ignoring the following scenarios and use cases, because they are either not used at all or they don't cover the base functionality we'd like to ensure in our "pre-flight check":

Originally posted by @tumido in https://github.com/open-services-group/byon/issues/1#issuecomment-1016306013

tumido commented 2 years ago

This list is not to be trusted; it's probably not complete. Can you help, @lavlas?

LaVLaS commented 2 years ago

Are py3.6 and Jupyter classic still under discussion for BYON? I think Python 3.8 + JupyterLab should be the starting point for notebook packages. All of the jupyterlab* packages look good to me. I don't have any experience with jupyterlab-spellchecker, but I assume it is a great feature when writing documentation in a notebook.

I agree with the Omitted scenarios.

tumido commented 2 years ago

Are py3.6 and Jupyter classic still under discussion for BYON? I think Python 3.8 + JupyterLab should be the starting point for notebook packages.

@harshad16 I would welcome your input on this, I have no idea about the desired support matrix here.

harshad16 commented 2 years ago

The minimum requirements for an image in RHODS are as follows:

Minimal notebook: Python 3.8, JupyterLab 3.0.x, supervisor stable, jupyterhub -

plugin: jupyterlab-git stable

Python notebooks:

Boto3 x.y.z, Kafka-python x.y.z, Pandas x.y.z, Matplotlib x.y.z, Numpy x.y.z, Scipy x.y.z

plugin: elyra-python-editor, jupyterlab-git, jupyterlab-s3

TensorFlow and PyTorch requirements were determined based on GPU CUDA availability at the time. CUDA preference: v11.0.3. This has changed over time, as GPU is not available in RHODS.

LaVLaS commented 2 years ago

The package versions available in RHODS notebooks for CUDA v11.4.2, PyTorch and TensorFlow v2.7.0 have been updated downstream. We will be updating those in ODH shortly.

harshad16 commented 2 years ago

@LaVLaS we would have to update them in the thoth-station images to get them correctly updated in ODH. Should we update them and open a PR to ODH?

tumido commented 2 years ago

@harshad16, why is supervisor part of the minimal requirements? Is it used for anything? I haven't found it to be actually used in any way...

harshad16 commented 2 years ago

@tumido, supervisor is the component that runs the multiple processes required for the JupyterHub action.

I think I don't fully understand the purpose of this issue:

  1. Are we trying to find the minimum requirements for RHODS images, i.e. the minimum set of packages required for a JupyterHub-suitable image to be approved as a RHODS image? Then the answer is my previous comment.

  2. Or are we trying to find the minimum requirements for an arbitrary image to function in JupyterHub? Then I feel it takes more than just packages; certain scripts need to be in place. This is the ultimate script that runs after the package installation and plugin setup steps are carried out: https://github.com/thoth-station/s2i-minimal-notebook/blob/master/builder/run

In my opinion, any image that is to be run in ODH JupyterHub should have that script and suitable packages to enable that script (like supervisor and jupyterlab).

Even for Meteor, we have already predetermined a few base images, which are built on top of s2i-minimal and thus contain these prerequisites.

Please let me know if my questions are sidetracked and not hitting the right points.

tumido commented 2 years ago
  1. Are we trying to find the minimum requirements for RHODS images, i.e. the minimum set of packages required for a JupyterHub-suitable image to be approved as a RHODS image?
  2. Or are we trying to find the minimum requirements for an arbitrary image to function in JupyterHub?

I think no. 2 is correct. BYON is about RHODS instance admins (a customer) bringing their own container images so their users can use them. This issue is to determine the minimal set of strictly required packages that need to be present in any container image that a user wants to bring in.

Using s2i-minimal-notebook is not a requirement. Using s2i at all is not a requirement. The image doesn't have to be RHEL/Fedora/CentOS based.

Then I feel it takes more than just packages; certain scripts need to be in place. This is the ultimate script that runs after the package installation and plugin setup steps are carried out: https://github.com/thoth-station/s2i-minimal-notebook/blob/master/builder/run

That's exactly what we're trying to determine: what needs to be in place? And that's exactly why I'm asking why supervisor is among those requirements. I don't see it being used at all. All I see is this branch being used, and no supervisor ever instantiated by the JupyterHub spawner. So why is it a minimal requirement?

https://github.com/thoth-station/s2i-minimal-notebook/blob/f8ec542abcc98208e45eb03da8b9a7c75cbe04f7/builder/run#L69-L71


In the end I did the exercise myself. It seems that JH Spawner requires:

  1. The HOME env variable needs to be set to a writable directory.
  2. The Spawner requires a start-singleuser.sh script to be present in PATH. It is used as the entrypoint.
  3. start-singleuser.sh needs to execute jupyter labhub $@. The Spawner passes port and host arguments, so the script either has to accept additional parameters via $@ or set the port to 8080 internally (the default is 8888, so it needs to be changed/enforced or consumed from the arguments).

That makes the minimal successfully spawnable image:

FROM python:3.8

RUN pip install jupyterhub jupyterlab

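# printf writes the newline regardless of which shell backs /bin/sh; quoting "$@" preserves the Spawner's arguments as passed.
RUN printf '#!/bin/bash\njupyter labhub "$@"\n' > /usr/local/bin/start-singleuser.sh && chmod +x /usr/local/bin/start-singleuser.sh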

ENV HOME=/tmp

https://quay.io/repository/tcoufal/jh-minimal-test
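The first two Spawner requirements listed above lend themselves to a static check that could run inside a candidate image. The following is only a sketch under stated assumptions: `spawner_preflight` is a hypothetical helper name, not the content of the actual BYON validation task, and it does not verify the third requirement (that the script really forwards its arguments to `jupyter labhub`), which can only be confirmed by spawning the image.

```python
import os
import shutil
from typing import List, Mapping, Optional


def spawner_preflight(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper: checks only that HOME is a writable directory and
    that an executable start-singleuser.sh can be found on PATH.
    """
    env = os.environ if env is None else env
    problems: List[str] = []

    # Requirement 1: HOME must point at a writable directory.
    home = env.get("HOME")
    if not home or not os.path.isdir(home) or not os.access(home, os.W_OK):
        problems.append("HOME is unset or not a writable directory")

    # Requirement 2: start-singleuser.sh must be an executable on PATH.
    if shutil.which("start-singleuser.sh", path=env.get("PATH", "")) is None:
        problems.append("start-singleuser.sh not found on PATH")

    return problems
```

Called without arguments, the function inspects the current process environment; passing an explicit mapping makes it easy to exercise against a scratch directory.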

kind: ImageStream
apiVersion: image.openshift.io/v1
metadata:
  name: jh-minimal-test
  labels:
    opendatahub.io/notebook-image: "true"
spec:
  lookupPolicy:
    local: true
  tags:
    - from:
        kind: DockerImage
        name: quay.io/tcoufal/jh-minimal-test:latest
      name: latest

@LaVLaS can you confirm? What additional requirements do we want to enforce?

harshad16 commented 2 years ago

@tumido, that seems like a very promising result. As far as I know, supervisor was in there so that the user could enable it via an env var: https://github.com/thoth-station/s2i-minimal-notebook/blob/f8ec542abcc98208e45eb03da8b9a7c75cbe04f7/builder/run#L63 ack, on that it might be a minimum requirement.

Btw, if you tested this out, please feel free to open a PR in the minimal repository, and also to become a maintainer if you would like.

tumido commented 2 years ago

Btw if you tested this out, please feel free to open a pr in the minimal repository and also if you would like to become a maintainer.

Can you elaborate please? PR for what? Maintainer of what? s2i-minimal-notebook? Why? I don't understand.

We don't care about s2i-minimal-notebook here. We don't care about AICoE-CI or any thoth related things. The sole focus of this issue is to determine what to require from any image being imported through BYON, aka what content to put in the https://github.com/thoth-station/helm-charts/blob/main/charts/meteor-pipelines/templates/byon-validate-jupyterhub-image.yaml task so it serves its purpose.

Till I know the supervisor was in there to help the user to enable it via a enable env var. https://github.com/thoth-station/s2i-minimal-notebook/blob/f8ec542abcc98208e45eb03da8b9a7c75cbe04f7/builder/run#L63

However this never happens without user actually enforcing it via custom env variable, correct? It's never used as a default option.

ack, on that it might be a minimum requirement.

Might? How? If it was there to "help" but not necessary in most cases, it's not a minimal requirement, correct?

harshad16 commented 2 years ago

Btw if you tested this out, please feel free to open a pr in the minimal repository and also if you would like to become a maintainer.

Can you elaborate please? PR for what? Maintainer of what? s2i-minimal-notebook? Why? I don't understand.

What I meant was: if we have figured out that there could be less stuff in the minimal-notebook, it would be great to update it; that's why I suggested the PR to s2i-minimal. As for being a maintainer: since you are working on these bits, I was just offering you to be a maintainer on s2i-minimal as well :smile:

We don't care for s2i-minimal-notebook here. We don't care about AICoE-CI or any thoth related things. The sole focus of this issue is to determine what to require from any image being imported through BYON, aka what content to put in https://github.com/thoth-station/helm-charts/blob/main/charts/meteor-pipelines/templates/byon-validate-jupyterhub-image.yaml task so it serves it purpose.

Till I know the supervisor was in there to help the user to enable it via a enable env var. https://github.com/thoth-station/s2i-minimal-notebook/blob/f8ec542abcc98208e45eb03da8b9a7c75cbe04f7/builder/run#L63

However this never happens without user actually enforcing it via custom env variable, correct? It's never used as a default option.

Yes, I think so too; it is not a default action.

ack, on that it might be a minimum requirement.

Might? How? If it was there to "help" but not necessary in most cases, it's not a minimal requirement, correct?

Yes, that is correct. "might" was a typo; I was going for "might NOT be". :sweat:

tumido commented 2 years ago

Gotcha! :slightly_smiling_face:

I think we should clean up the s2i minimal then, but probably not now. Let's track and discuss that in a separate issue in s2i-minimal: https://github.com/thoth-station/s2i-minimal-notebook/issues/514

LaVLaS commented 2 years ago

I agree that s2i-minimal is only a recommended requirement, since a user can manually recreate the JH entrypoint script.

I think at a minimum, we would validate:

  1. I would like to say that UBI base images are a requirement if we want to standardize on a baseline that allows for easy security scans
  2. The image is actually a valid Jupyter notebook image that is compatible with JupyterHub (exposes the Jupyter notebook REST API and can be stopped/started) and runs JupyterLab
  3. Minimum Python version, or lack of Python, as a possible warning.

There may be additional validation steps, but those may restrict what an admin can provide in a custom notebook image.

tumido commented 2 years ago

I agree that s2i-minimal is only a recommended requirement since a user can manually recreate the JH entrypoint script

I think at a minimum, we would validate:

  1. I would like to say that UBI base images are a requirement if we want to standardize on a baseline that allows for easy security scans

@LaVLaS gotcha. How would you check for that? Is a check that /etc/redhat-release file exists enough?
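One caveat with a file-existence check is that /etc/redhat-release is also shipped by other distributions in the RHEL family (Fedora, CentOS), so it cannot single out UBI by itself. A slightly more informative sketch would read the `ID` and `ID_LIKE` fields from /etc/os-release. The helper names below are hypothetical, and the set of accepted IDs is an assumption made for illustration, not an agreed BYON rule.

```python
from typing import Dict

# Assumed set of IDs treated as "RHEL family" for this illustration only.
RHEL_FAMILY_IDS = {"rhel", "fedora", "centos"}


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key.strip()] = value.strip().strip('"').strip("'")
    return fields


def is_rhel_family(fields: Dict[str, str]) -> bool:
    """True if ID or any ID_LIKE entry belongs to the assumed RHEL family set."""
    ids = {fields.get("ID", "")} | set(fields.get("ID_LIKE", "").split())
    return bool(ids & RHEL_FAMILY_IDS)
```

In an image, the input would come from reading /etc/os-release; distinguishing UBI from a full RHEL install would still need an additional signal, such as image labels, which this sketch does not attempt.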

  2. Is actually a valid Jupyter notebook image that is compatible with JupyterHub (exposes the Jupyter notebook rest api and can be stopped/started) and runs JupyterLab

Can you confirm the image above (https://github.com/open-services-group/byon/issues/26#issuecomment-1058413862) satisfies this for you? It starts/stops through the spawner, accepts its API key and starts JupyterLab.

  3. Minimum python version, or lack of python, as a possible warning.

What is the minimal requirement for the Python version? 3+, or do you want to be more specific and say something like 3.6 and above? Is the simple presence of Python enough for you?

tumido commented 2 years ago

Decision: Let's start with our minimal reproducer above as the upstream ODH requirements. Let's support Python 3.8+ only, using any image base, enforcing JupyterLab UI.
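The Python 3.8+ part of this decision could be checked from outside the image by parsing the output of `python --version`. The sketch below assumes that output has the usual `Python X.Y.Z` shape; `parse_python_version` and `is_supported_python` are hypothetical helper names, not part of any existing BYON task.

```python
import re
from typing import Tuple

# Minimum version taken from the decision above.
MIN_PYTHON: Tuple[int, int] = (3, 8)


def parse_python_version(output: str) -> Tuple[int, int]:
    """Extract (major, minor) from text such as 'Python 3.8.12'."""
    match = re.search(r"Python (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized version output: {output!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported_python(output: str, minimum: Tuple[int, int] = MIN_PYTHON) -> bool:
    """True if the reported version is at least the minimum (compared as integers)."""
    return parse_python_version(output) >= minimum
```

Comparing integer tuples rather than strings matters here: as strings, "3.10" would sort before "3.8" and a newer interpreter would be rejected.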