xarray-contrib / xarray-tutorial

Xarray Tutorials
https://tutorial.xarray.dev/
Apache License 2.0

Add more options to run interactively on the cloud #171

Closed: scottyhq closed this 4 months ago

scottyhq commented 1 year ago

With Google dropping credits for mybinder.org recently, I've noticed that launching sessions is indeed less reliable: https://blog.jupyter.org/mybinder-org-reducing-capacity-c93ccfc6413f

It would be good to document running this content on other "free" platforms such as:

dcherian commented 1 year ago

Another option could potentially be something based on PyScript or Thebe? I don't know what the state of affairs is there.

lsetiawan commented 1 year ago

I discovered something the other day that might be promising for this: https://jupyterlite.readthedocs.io/en/latest/. It uses Pyodide and can run JupyterLab in the browser. It does use the user's local computing resources, like a regular web app, so technically it's not fully "cloud".

dcherian commented 1 year ago

The tutorial content is mostly local datasets downloaded using pooch or synthetic datasets, so that would be totally fine.

scottyhq commented 1 year ago

Did a quick test with Google Colab (which I admittedly haven't used much). As far as I can tell, it's not really well set up for a directory of notebooks, nor for conda environments! The default runtime has the following versions pre-installed:

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.11 (main, Apr  5 2023, 14:15:10) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.107+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2022.12.0
pandas: 1.5.3
numpy: 1.22.4
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.12.1
distributed: 2022.12.1
matplotlib: 3.7.1
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.4.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: 7.2.2
mypy: None
IPython: 7.34.0
sphinx: 3.5.4

So many of the notebooks could be executed, but not all. A simple pip install zarr flox would work if you only need a few libraries. Installing a full-fledged conda environment is slow and cumbersome, and it has to be done per notebook:

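# Cell 1: install condacolab; condacolab.install() restarts the Colab kernel when it finishes
!pip install -q condacolab
import condacolab
condacolab.install()

# Cell 2 (run after the kernel restart): confirm that conda is now available
import condacolab
condacolab.check()

# NOTE: this will take a while, be patient!
!mamba env update --quiet --name="base" --file="https://raw.githubusercontent.com/xarray-contrib/xarray-tutorial/main/conda/environment.yml"
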
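For the lighter-weight pip route, one way to decide what a given notebook actually needs is to check which of its imports the Colab runtime lacks. A minimal sketch, assuming the import names match the pip package names (true for zarr and flox, the two mentioned above); `missing_packages` and the `required` list are hypothetical:

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names in `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list of what one notebook imports; adjust it per notebook.
required = ["xarray", "zarr", "flox", "numpy"]
missing = missing_packages(required)
if missing:
    # In a Colab cell, prefix this command with "!" to run it as a shell command.
    print("pip install " + " ".join(missing))
else:
    print("All required packages are already installed.")
```

This only tells you whether a module is importable, not whether its version is new enough for the tutorial.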
scottyhq commented 1 year ago

Adding CI to publish a Docker image to GHCR would be nice: it would make running locally easier for people who like Docker, and would also enable running on GitHub Codespaces.

scottyhq commented 1 year ago

AWS StudioLab is more straightforward because you get a full-fledged, normal JupyterLab interface (file browser, multiple notebooks, a terminal). You still have to install the locked environment as a manual step, since the standard environment does not come with xarray. A bonus of using StudioLab compared to BinderHub is that content and environments persist across sessions.


Note the link syntax, similar to the BinderHub one above: https://studiolab.sagemaker.aws/import/github/xarray-contrib/xarray-tutorial/blob/main/overview/fundamental-path/index.ipynb

mamba env create --name="xarray-tutorial" --file="https://raw.githubusercontent.com/xarray-contrib/xarray-tutorial/main/conda/environment.yml"
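The Studio Lab import link follows a fixed pattern, so launch links for any notebook in the repo can be generated rather than hand-written. A minimal sketch; `launch_links` is a hypothetical helper, the Studio Lab pattern is taken from the link in this comment, and the Colab and Binder patterns are those services' usual GitHub URL forms (worth double-checking before publishing them):

```python
from urllib.parse import quote

ORG, REPO, BRANCH = "xarray-contrib", "xarray-tutorial", "main"


def launch_links(notebook_path, org=ORG, repo=REPO, branch=BRANCH):
    """Build launch URLs for one notebook on several hosted services."""
    gh = f"{org}/{repo}/blob/{branch}/{notebook_path}"
    return {
        # Studio Lab: pattern from the import link quoted in this thread.
        "studiolab": f"https://studiolab.sagemaker.aws/import/github/{gh}",
        # Colab: opens the single notebook straight from GitHub (no repo checkout).
        "colab": f"https://colab.research.google.com/github/{gh}",
        # Binder: builds the whole repo, then opens the notebook in JupyterLab.
        "binder": (
            f"https://mybinder.org/v2/gh/{org}/{repo}/{branch}"
            f"?labpath={quote(notebook_path, safe='')}"
        ),
    }


for service, url in launch_links("overview/fundamental-path/index.ipynb").items():
    print(service, url)
```

The difference in what each link opens (one notebook vs. the whole repo) is exactly the Colab limitation noted earlier in the thread.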
lsetiawan commented 1 year ago

Do you know if we need to pay for AWS StudioLab? It's asking me to log in.

scottyhq commented 1 year ago

> Do you know if we need to pay for AWS StudioLab? It's asking me to log in.

Unlike Binder and Colab, you do have to create an account, but it is free and no credit card is required. They impose daily usage limits (I think 12-hour sessions). We'll want to check the resource limits and make sure the notebooks all actually run.

lsetiawan commented 1 year ago

Gotcha, sounds good. Posting the link to their FAQ here: https://studiolab.sagemaker.aws/faq. Is there a waitlist to create a new account? At least that's what their FAQ says.

scottyhq commented 1 year ago

Oh, I didn't realize that!

Q: Why is there a waiting list to get an account? We are limiting the number of new account registrations at this time to ensure a high quality of experience for all users.

Q: How long do I have to wait for my account request to get approved? Account requests are typically approved within 1 to 5 business days.

That's definitely a deal-breaker for large tutorials, where we likely won't be able to engage with participants beforehand to get them signed up. It will be good to know whether you get access within 1-2 days, @lsetiawan!

lsetiawan commented 1 year ago

Update: I was approved in 2 minutes, and setting up the account took about 5 minutes. Right now, though, it's not straightforward to spin up the index notebook with the supplied conda environment; I'll have to investigate that more. I think this is potentially a great way to run the tutorial. If we can reach the people attending the tutorials ahead of time, there can be some time to notify participants to get an AWS StudioLab account.

lsetiawan commented 1 year ago

Update 2: It looks like it's not very straightforward to open index.ipynb. Going to https://studiolab.sagemaker.aws/import/github/xarray-contrib/xarray-tutorial/blob/main/overview/fundamental-path/index.ipynb doesn't automatically clone the repo and spin up the environment. There are a lot of steps that need to be done: cloning the entire repo; creating a custom environment from the conda env YAML file (like the instructions in https://github.com/aws/studio-lab-examples/blob/main/custom-environments/custom_environment.ipynb), which takes forever because mamba isn't available; and then navigating to index.ipynb and opening it. I feel like this is a lot of steps, and I'm spoiled by Binder, but what do you think @scottyhq?
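The first two manual steps can be written out as terminal commands. A minimal sketch, assuming the repo is cloned into the Studio Lab home directory and using plain conda since mamba isn't available; `studiolab_setup_commands` is a hypothetical helper that only prints the commands, and opening index.ipynb afterwards remains a manual step in the file browser:

```python
import shlex

REPO_URL = "https://github.com/xarray-contrib/xarray-tutorial.git"


def studiolab_setup_commands(env_name="xarray-tutorial"):
    """Return, in order, the shell commands for the manual Studio Lab setup."""
    return [
        # 1. Clone the entire repository (creates ./xarray-tutorial).
        ["git", "clone", REPO_URL],
        # 2. Create a custom environment from the repo's conda file
        #    with plain conda, since mamba is not available by default.
        ["conda", "env", "create", "--name", env_name,
         "--file", "xarray-tutorial/conda/environment.yml"],
    ]


# Print the commands so they can be pasted into the Studio Lab terminal.
for cmd in studiolab_setup_commands():
    print(shlex.join(cmd))
```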

scottyhq commented 1 year ago

Thanks for looking into it @lsetiawan ! Agreed that studiolab is a bit tricky. In the end we'll have a couple options with some pros and cons that we can document on one of the website pages. I think near-term we should try out jupyterlite and codespaces too.

dcherian commented 1 year ago

Great stuff! Eventually, it would be good to summarize your learnings on the pro/cons of each option here: https://tutorial.xarray.dev/overview/get-started.html

lsetiawan commented 1 year ago

Linking the comment from @dcherian here: https://github.com/xarray-contrib/xarray-tutorial/issues/170#issuecomment-1599067588.

Currently Quansight is offering to host Nebari for the tutorial, and I think we should definitely take them up on that, as Nebari is a really great system for this kind of thing IMO. I'll fill out the form for this. It looks like I need help answering a few questions about specs.

  1. I think these 2 options are enough for the tutorial (these are the default machines they're offering): Small (2 CPUs, 8 GB RAM) and Medium (4 CPUs, 16 GB RAM).

  2. I assume we don't need a GPU instance; it doesn't look like any of the tutorials use one.

@scottyhq Could you confirm the above? Thanks!

dcherian commented 1 year ago

> I'll fill out the form for this.

Thanks!

I think yes on (1), (2). We could optionally use GPUs but it isn't necessary.

scottyhq commented 1 year ago

We should manage with "small", but let's go ahead and request "medium", since some of the content will focus on Dask, and having a bit more than what is typically available on Binder systems would be nice :) No GPUs necessary.
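For the Dask-focused content, one way to make use of the requested instance is to size a LocalCluster to it. A minimal sketch, assuming one single-threaded worker per CPU and 1 GB held back for the notebook kernel and OS (both are assumptions, not settings from this thread); `local_cluster_kwargs` is a hypothetical helper, and the profile sizes are the Small/Medium specs listed above:

```python
PROFILES = {
    # Instance sizes discussed in this thread: (CPUs, RAM in GB).
    "small": (2, 8),
    "medium": (4, 16),
}


def local_cluster_kwargs(profile, reserve_gb=1):
    """Split one instance into single-threaded Dask workers, one per CPU.

    `reserve_gb` is memory held back for the kernel and OS (an assumption).
    """
    cpus, ram_gb = PROFILES[profile]
    per_worker_gb = (ram_gb - reserve_gb) / cpus
    return {
        "n_workers": cpus,
        "threads_per_worker": 1,
        # memory_limit is per worker; Dask parses strings like "3.75GB".
        "memory_limit": f"{per_worker_gb:.2f}GB",
    }


print(local_cluster_kwargs("medium"))
```

With dask.distributed installed, the result can be passed straight through, e.g. Client(LocalCluster(**local_cluster_kwargs("medium"))). Single-threaded workers keep per-worker memory easy to reason about; raising threads_per_worker is a reasonable alternative for NumPy-heavy work.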

dcherian commented 1 year ago

I asked what the dask team was planning to do and got the following responses from Naty Clementi and Jacob Tomlinson:

  1. Naty: We were planning on running mostly locally, but we talked about the chance of using coiled notebooks + jupyter-repo2docker to get everything on the image. https://blog.coiled.io/blog/coiled-notebooks.html
  2. Jacob: When I run RAPIDS tutorials I usually stand up my own Binder because I need to add GPUs to the nodes, it's pretty quick and easy to do, especially if you're just running vanilla Binder without the GPU stuff.

lsetiawan commented 1 year ago

@scottyhq and I discussed in person going forward with GitHub Codespaces, and there's now a PR, https://github.com/xarray-contrib/xarray-tutorial/pull/184, for this setup, specifically for SciPy 2023.

lsetiawan commented 1 year ago

Quansight has hosted a Nebari instance for the workshop, which can be found at https://scipy.quansight.dev/

scottyhq commented 4 months ago

After #184, we can run interactive sessions either on mybinder.org or on GitHub Codespaces.