radanalyticsio / base-notebook

An image for running Jupyter notebooks and Apache Spark in the cloud on OpenShift
https://radanalytics.io

Support use as an s2i builder #12

Open willb opened 7 years ago

sherl0cks commented 6 years ago

@willb are there any short-term plans to do this? If not, I'd be interested in making a pull request.

willb commented 6 years ago

@sherl0cks that would be awesome! I'm pinging @sub-mod, who has done some work on refactoring our notebook images lately, but if it makes sense to him, I'd say go for it.

sub-mod commented 6 years ago

@sherl0cks the https://github.com/radanalyticsio/base-notebook/tree/experimental branch has the new changes. But since base-notebook is used in other repos, I had to create two new repos, jupyter-notebook-py2.7 and jupyter-notebook-py3.5 (the latter contains a flag to enable JupyterLab), with the changes from the experimental branch. Unlike base-notebook:latest, these images let you install new Python libs via conda install or pip install. We don't have short-term plans yet on how to take this forward or on s2i, although we will decide on that soon. But you are welcome to send pull requests to the jupyter-notebook-py2.7 and jupyter-notebook-py3.5 repos. Please also take a look at the repos below for some work done by GrahamDumpleton:
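For context on the difference from base-notebook:latest, here is a minimal sketch of installing an additional Python library from inside a running notebook pod; the package name and the `--user` install location are assumptions for illustration, not something documented in these repos.

```python
# Minimal sketch: install an extra Python library at runtime from inside the
# notebook. "seaborn" is only an example package, and "--user" assumes the
# pod's home directory is writable.
import subprocess
import sys

# Use the interpreter backing the kernel so the package lands in the right environment.
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--user", "seaborn"]
)
```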

sub-mod commented 6 years ago

We have a TODO item to replace base-notebook with jupyter-notebook-py3.5 or jupyter-notebook-py2.7 in https://radanalytics.io/tutorials. For all new work, please use jupyter-notebook-py3.5 (for JupyterLab) or jupyter-notebook-py2.7.

sherl0cks commented 6 years ago

:+1: thanks for the info @sub-mod

sub-mod commented 6 years ago

You are welcome. Hope to see your PR soon.

sherl0cks commented 6 years ago

@sub-mod why were these repos created instead of adding new directories in https://github.com/radanalyticsio/oshinko-s2i? I also don't see the oshinko magic in the notebook containers. Is there a reason it's not there? I think it would be a really awesome feature. With that in mind, I would propose the following approach:

  1. Make a PR to https://github.com/radanalyticsio/oshinko-s2i that:
    1. adds a new Jupyter 2.7 s2i builder which extends the existing PySpark s2i builder. It is unclear to me yet whether the s2i run oshinko magic can be inherited or needs to be copied.
    2. adds an assemble script to the Jupyter py2.7 builder that installs arbitrary pip modules via requirements.txt, a la GrahamDumpleton's work.
    3. adds functionality to that script to install arbitrary notebooks with associated data files. It may also be useful to support cloning notebooks/data files from existing git repos (a rough sketch follows this list).
  2. If we like the approach, add a Python 3.x PySpark s2i builder based on the 2.x builder.
  3. Extend the Python 3.x s2i builder with a Jupyter 3.x builder.
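A rough sketch of the logic behind items 1.2 and 1.3, expressed in Python for readability (real s2i assemble scripts are shell). The source directory, environment variable, and target paths below are placeholders, not the actual oshinko-s2i layout.

```python
# Hypothetical assemble-time logic: install pip modules from the user's
# requirements.txt and optionally clone notebooks plus data files from an
# existing git repo. Paths and NOTEBOOK_REPO are placeholders, not real config.
import os
import subprocess
import sys

SRC_DIR = "/tmp/src"                              # assumed s2i source location
NOTEBOOK_DIR = os.path.expanduser("~/notebooks")  # assumed target directory


def install_requirements(src_dir):
    """Install arbitrary pip modules declared in the source's requirements.txt."""
    req = os.path.join(src_dir, "requirements.txt")
    if os.path.exists(req):
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", req])


def fetch_notebooks(repo_url, dest):
    """Clone notebooks and associated data files from an existing git repo."""
    subprocess.check_call(["git", "clone", "--depth", "1", repo_url, dest])


if __name__ == "__main__":
    install_requirements(SRC_DIR)
    repo = os.environ.get("NOTEBOOK_REPO")        # hypothetical env var
    if repo:
        fetch_notebooks(repo, NOTEBOOK_DIR)
```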

Thoughts?

sub-mod commented 6 years ago

I like your idea of reusing the s2i Python image. I should have done that instead of extending from openshift-spark, so I think the notebook images need a rewrite to remove FROM openshift-spark.

But there was some previous work done for (1) and (2), plus a reduction in image size, which didn't go into the base-notebook repo, so I created a separate repo just for notebook-related work. What you are mentioning seems to be related to 2(b).

I will combine all the notebook repos into a single "Jupyter-Notebook" repo. IMO, having a single repo for all Jupyter-related work is better. What do you think?

(1) The new Jupyter notebook images allow users to git clone repos into the pods. They also allow installation of any Python libs: you can wget libraries and install Python libs which may not be available in conda or PyPI.

(2) We had a few use cases in mind:

  a) Notebook (with Spark) as a Service, where we have Notebook/JupyterLab templates in the service catalog. We can provision a single notebook instance with a new Spark cluster, or use an existing Spark cluster. The user is free to git clone any repos after the notebook pod has started. This would need the oshinko client and scripts, but NOT the s2i builds.
  b) s2i-build-driven Notebook (with Spark) as a Service. This would need the oshinko client and scripts and the s2i builds.
  c) JupyterHub with Spark.
  d) TensorFlow development and model serving with notebooks.

sherl0cks commented 6 years ago

Your use cases make sense to me, and converging all this work in one place is a good idea. My gut reaction is that since you are building this stuff with s2i, it should go in the existing s2i repo, especially since you are going to extend the Python base layer in that repo. But a repo for all Jupyter resources works too, and I'm not a maintainer around here, so I'll happily respect your ultimate decision on the location.

sherl0cks commented 6 years ago

Any interest in supporting other kernels beyond Python?

mattf commented 6 years ago

Yes. Java, R, and Scala are important.

sherl0cks commented 6 years ago

Great. Once the dust settles on getting the notebooks to a resting place @sub-mod likes, I'll open issues in that repo. I'd be interested in contributing there as well.

sub-mod commented 6 years ago

@sherl0cks Let's go with a single Jupyter repo for now. We can always move it into the s2i repo later on. I am waiting for all the code to show up in https://github.com/jupyter-on-openshift. Graham has created an s2i pipeline which lets you have multiple kernels in a single final Jupyter application image.

sherl0cks commented 6 years ago

Works for me. Let me know when the dust settles.

willb commented 6 years ago

cc @mmgaggle