ploomber / soopervisor

☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.
https://soopervisor.readthedocs.io
Apache License 2.0

migrating docs #109

Closed edublancas closed 2 years ago

edublancas commented 2 years ago

We haven't updated soopervisor's documentation in a long time. The main problem is the examples: we're not testing them (so breakages go unnoticed), and they don't show the output of each command.

Another complication is a change we made to the Kubernetes-based examples: since we wanted to "simplify" configuration, we added a Dockerfile that comes with everything pre-installed; hence, starting the examples involves doing something like this:

docker run -i -t \
    --privileged=true -v /var/run/docker.sock:/var/run/docker.sock \
    --volume $SHARED_DIR:/mnt/shared-folder \
    --env SHARED_DIR \
    --env PLOOMBER_STATS_ENABLED=false \
    -p 2746:2746 \
    ploomber-k8s /bin/bash


However, this proved even more problematic since it's confusing, and we had a few users getting errors when executing the command.

Changes

migrate to jupyter-book

jupyter-book offers excellent support for Jupyter notebooks, a better way to showcase tutorials. This library supports .rst files, so there won't be many changes.

But we must ensure that admonitions and tabs are properly updated. I think jupyter-book has support for both, but some changes might be required.

pre-processing notebooks / using a bash kernel

Our tutorials will show Soopervisor's command-line interface, so if we write them in a regular Jupyter notebook, we need a preprocessor to remove the %%bash magics and ! prefixes from all cells.

I found that there is a bash kernel, which is pretty easy to install. It has a few benefits: live-updating output (when using %%bash, the output of the command is not displayed until it finishes), and the ability to run cd some-dir and have the next cell keep the state (as opposed to %%bash, which executes each cell in a new subprocess). Overall, I think it is the best option as long as it renders well on jupyter-book, and I don't see any major issues.

The only problem I encountered with the bash kernel is that it doesn't pick up the right Python environment: upon creating a bash notebook, running which python did not print the Python interpreter of the environment that was active when calling jupyter lab. So we have to figure this out.
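To narrow this down, a cell like the following could be run at the top of a bash-kernel notebook. It is only a diagnostic sketch: the function names are hypothetical, and it assumes a Unix layout where conda sets CONDA_PREFIX and venv/virtualenv set VIRTUAL_ENV on activation.

```shell
# Print the interpreter path we would expect for the active environment.
# Assumption: conda exports CONDA_PREFIX and venv exports VIRTUAL_ENV.
expected_python() {
    if [ -n "${CONDA_PREFIX:-}" ]; then
        echo "$CONDA_PREFIX/bin/python"
    elif [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "$VIRTUAL_ENV/bin/python"
    fi
}

# Compare the expected interpreter with the one the kernel resolves on PATH.
check_python() {
    local expected actual
    expected="$(expected_python)"
    actual="$(command -v python || true)"
    if [ -z "$expected" ]; then
        echo "no active environment detected"
        return 0
    fi
    if [ "$expected" = "$actual" ]; then
        echo "ok: $actual"
    else
        echo "mismatch: expected $expected, got ${actual:-nothing}"
        return 1
    fi
}
```

Running check_python in the first cell would show whether the kernel inherited the environment variables but not the PATH, or neither.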

switch to kind

Our Kubernetes examples use k3d. I've recently been using kind, and it's a lot simpler and faster, so we should switch.
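For reference, a kind-based setup for the tutorials might look roughly like this. It is a sketch, not the final tutorial code: the function names and the default cluster name soopervisor-demo are hypothetical, and it assumes kind and kubectl are already installed.

```shell
# Create a local cluster and confirm that kubectl can reach it.
# kind registers the kubectl context as "kind-<cluster name>".
kind_up() {
    local name="${1:-soopervisor-demo}"  # hypothetical default name
    if ! command -v kind >/dev/null 2>&1; then
        echo "kind is not installed" >&2
        return 127
    fi
    kind create cluster --name "$name"
    kubectl cluster-info --context "kind-$name"
}

# Remove the cluster once the tutorial is done.
kind_down() {
    local name="${1:-soopervisor-demo}"
    kind delete cluster --name "$name"
}
```

A tutorial notebook would call kind_up in its first cell and kind_down in its last one, so each run starts from a clean cluster.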

Some of our new features are difficult to explain and showcase; a notebook-like example will make this easier.

However, before switching to kind (or any other alternative), we should check whether it's possible to run the examples on Binder (see the last section for more information).

adding more tutorials

We added two major features to soopervisor: the ability to include a lib/ directory and support for multiple docker images for the same project. And while these are already documented, they don't have tutorials. We should add them.

binder

This is a long shot, but it'd be great to have an easy way for users to try soopervisor. The main challenge is that a meaningful example requires infrastructure such as Kubernetes, Airflow, or SLURM.

This is technically possible, but we should test it. Making changes to Binder is always a pain, but I think this can make a big difference in adoption. We can start with Argo and see how it works.

For Kubernetes-based examples such as Argo, my only concern is that the choice of tool to create the local cluster (k3d, kind, minikube) may matter: some of them might not work on Binder, since a Binder session already runs inside a Kubernetes pod. My guess is that kind will work fine, but I'm no Kubernetes expert, so it's better to try it first.

edublancas commented 2 years ago

I started working on this (tutorials branch). Here's an example of the Argo tutorial rewritten using the bash kernel: https://github.com/ploomber/soopervisor/blob/tutorials/kind/doc/argo.ipynb

Something I noticed is that my Jupyter froze when running docker build (which executes when running soopervisor export). It looks like the bash kernel does not handle the verbose docker build output. One fix is to redirect the output to a file and then print it:

soopervisor export > output.txt

then:

cat output.txt
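If this pattern ends up in several tutorials, it could be wrapped in a small helper. The sketch below is an assumption on my side, not something on the branch: run_logged and TAIL_LINES are hypothetical names, and it also captures stderr in case part of the build output goes there.

```shell
# Run a command with stdout and stderr sent to a log file, then print only
# the last lines so the notebook cell stays small.
# Usage: run_logged LOG_FILE COMMAND [ARGS...]
run_logged() {
    local log_file="$1"
    shift
    local status=0
    "$@" > "$log_file" 2>&1 || status=$?
    # TAIL_LINES is a hypothetical knob; it defaults to 20 lines.
    tail -n "${TAIL_LINES:-20}" "$log_file"
    return "$status"
}
```

In a tutorial cell this would read run_logged output.txt soopervisor export, followed by whatever arguments the command normally takes. The helper returns the command's exit status, so a failed export still fails the cell.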