Open yuvipanda opened 4 years ago
The USGS has a huge R community and we have RStudio on our Pangeo JupyterHub. So big thumbs up here!
Are there strong reasons to have R in the same image as Python? Do users often switch between them in the same session? @rsignell-usgs's solution seems simpler.
Until a few days ago, I'd have said there aren't. But then I watched a PhD student "quickly drop[ped] to R, because the API for some online repository of observational data is more mature there", pull a dataframe back to Python, and power on with Pandas a minute later, all without ever leaving the same notebook.
We ran a hub this summer that had both Python and RStudio, installed via conda-forge (https://github.com/RPVote/jhub-rstudio). The image size was gigantic, and while people did jump back and forth between interfaces (Jupyter notebook or RStudio), they mostly seemed to stick to one language.
I guess another question: are R users connecting to a k8s cluster for distributed computing? That isn't strictly necessary for a pangeo image (in fact most pangeo hub users are likely using dask-gateway only a small fraction of the time!), but a key design principle for this repository was to force pangeo hub images to use the same dask versions so they stay in sync. So images in this repo get updated together rather than separately.
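To make the "same dask versions" constraint concrete, here is a minimal sketch of the kind of check that could flag drift between images. It is not this repository's actual automation: the `IMAGE_PINS` table, the `r-notebook` image name, the version numbers, and the `find_version_drift` helper are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical pins for illustration only; real values would come from
# each image's environment or lock file.
IMAGE_PINS = {
    "base-notebook": {"dask": "2021.1.0", "distributed": "2021.1.0"},
    "pangeo-notebook": {"dask": "2021.1.0", "distributed": "2021.1.0"},
    "r-notebook": {"dask": "2020.12.0", "distributed": "2021.1.0"},
}

# Packages that every image is expected to pin identically.
SYNCED_PACKAGES = ("dask", "distributed")


def find_version_drift(image_pins, packages=SYNCED_PACKAGES):
    """Return {package: {version: [images]}} for packages pinned to more than one version."""
    drift = {}
    for package in packages:
        by_version = defaultdict(list)
        for image, pins in image_pins.items():
            # An image that does not pin the package at all also counts as drift.
            by_version[pins.get(package, "<missing>")].append(image)
        if len(by_version) > 1:
            drift[package] = {v: sorted(imgs) for v, imgs in by_version.items()}
    return drift


if __name__ == "__main__":
    for package, versions in find_version_drift(IMAGE_PINS).items():
        print(f"{package} is out of sync: {versions}")
```

An R image built in a separate repository would need some equivalent of this check, or it could quietly fall behind the dask version the hub's gateway expects.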
Another option: if the 2i2c or USGS images were pushed to DockerHub or Quay, any hub could easily point to them, right?
We also had an earlier version that had a Julia environment bundled into the image.
This brings us back to a discussion we had earlier. It would be amazing if we could provide environments via shareable kernels rather than docker images. If there were some way to publish kernels, that would be much simpler. The docker image could then just contain the base jupyterhub stuff and leave all the computational details to the kernels.
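For context on what a "kernel" amounts to on disk: Jupyter discovers kernels through kernelspec directories, each holding a `kernel.json` with `argv`, `display_name`, and `language` fields. The sketch below writes such a file for an R kernel. The `write_r_kernelspec` helper, the `shared-r` name, and the display name are hypothetical, the `argv` mirrors the launch command IRkernel typically installs, and nothing here solves the harder problem of publishing the environment the kernel runs in.

```python
import json
from pathlib import Path


def write_r_kernelspec(kernels_dir, name="shared-r", r_executable="R"):
    """Write a kernel.json for an R kernel under kernels_dir/name and return its path.

    Assumes IRkernel is installed for the given R executable; this function
    only creates the kernelspec, not the environment behind it.
    """
    spec = {
        # Jupyter substitutes {connection_file} when it launches the kernel.
        "argv": [
            r_executable, "--slave", "-e", "IRkernel::main()",
            "--args", "{connection_file}",
        ],
        "display_name": "R (shared)",
        "language": "R",
    }
    kernel_dir = Path(kernels_dir) / name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    path = kernel_dir / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path
```

Pointing `r_executable` at an R binary inside a shared environment is one way a minimal image could defer the computational details to the kernel, assuming that environment is mounted or otherwise reachable from the user's server.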
> Until a few days ago, I'd have said there aren't. But then I watched a PhD student "quickly drop[ped] to R, because the API for some online repository of observational data is more mature there", pull a dataframe back to Python, and power on with Pandas a minute later, all without ever leaving the same notebook.
I think this is a big use case. Another is sharing code between users who primarily use Python and users who primarily use R. This often leads to mixed code, so having both languages in the same image is very useful.
However, that doesn't mean it needs to be in this repository. It's useful for other repos to build on top of this repo, with all the automation this has. So that would be an option, especially since we can move it into this repo if there is a lot of use.
Another option is to start from the R-specific rocker image and add Pangeo-specific things (dask-gateway, etc.). However, I think that'll be a lot more work to keep up to date.
I'm new to the community. I'm implementing a cluster for researchers at an EU facility, and I'd be very interested in having RStudio spawned by JupyterHub.
Is it planned or in progress?
I have an R environment that builds off our base image, but it doesn't include RStudio. It only offers JupyterLab as a UI. Would people be interested in that as a stop-gap until optional RStudio support can be integrated into our base image?
edit: The branch is at https://github.com/pangeo-data/pangeo-docker-images/compare/feature/r?expand=1. I haven't done any work to integrate it into this repository's build automation.
In https://github.com/2i2c-org/pangeo-hubs/pull/15, we added R to a base pangeo image so folks from Farallon can use it. Most of that is upstreamable. Would there be interest in adding an R image here? It would provide the R kernel, RStudio, and probably an `install.R` onbuild system similar to what repo2docker offers.