rocker-org / rocker

R configurations for Docker
https://rocker-project.org
GNU General Public License v2.0

build a container farm of rocker instances with caddy reverse proxying #470

Closed quinfer closed 2 years ago

quinfer commented 2 years ago

Hi, apologies for not really raising an issue here, but my query is simply whether anyone in the rocker community has built a container farm. The use case is teaching data science to a class of about 100 students using one large cloud VM.

Thanks

cboettig commented 2 years ago

@barryquinn1 good question, should be do-able.

Previously I have done this by deploying only a single rstudio container with a script that adds multiple users. This is simple and RStudio handles multi-user setups rather well. (I still add a caddy proxy so I have https connections)
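For readers who want a starting point, here is a minimal sketch of what such a user-adding script could look like. It is not the script referred to above; the `student` prefix, the password length, and the helper name `make_user_commands` are all assumptions for illustration. It only generates the standard `useradd` and `chpasswd` lines, which would then be run as root inside the container.

```python
import secrets


def make_user_commands(n_students, prefix="student"):
    """Build shell lines that create n_students Linux accounts.

    Returns (credentials, lines): a dict of username -> password and the
    list of shell commands. Names and passwords are hypothetical examples.
    """
    width = len(str(n_students))  # zero-pad so names sort nicely
    credentials = {}
    lines = []
    for i in range(1, n_students + 1):
        user = f"{prefix}{i:0{width}d}"
        # token_urlsafe only emits A-Z, a-z, 0-9, '-' and '_',
        # so the password is safe inside single quotes.
        password = secrets.token_urlsafe(9)
        credentials[user] = password
        lines.append(f"useradd -m -s /bin/bash {user}")
        lines.append(f"echo '{user}:{password}' | chpasswd")
    return credentials, lines


creds, script = make_user_commands(100)
print("\n".join(script[:2]))
```

The credentials dict would need to be handed out to students separately; the generated lines could be saved to a file and executed with `docker exec` against the running container.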

Of course a multi-container deploy is possible, but what does your desired network architecture look like in this case? Do you give each container a different sub-domain? There's probably something more clever than that (registering 100 https subdomains for this seems cumbersome), but it's not obvious to me how caddy would handle the logins to redirect each student to a separate container. I too would be curious if anyone wants to share a configuration.
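To make the one-sub-domain-per-container idea concrete, here is a rough sketch that generates a Caddyfile with one site block per student, each proxying to a different local port. The domain `class.example.edu`, the base port 8801, and the function name `caddyfile_for` are assumptions, not anything from a tested deployment. It does not solve the login-redirect question; it only shows the cumbersome-but-simple variant.

```python
def caddyfile_for(students, domain, base_port=8801):
    """Return Caddyfile text with one reverse_proxy site block per student.

    Student i (0-based) is mapped to localhost:(base_port + i). The domain
    and port numbering are hypothetical.
    """
    blocks = []
    for offset, user in enumerate(students):
        port = base_port + offset
        blocks.append(
            f"{user}.{domain} {{\n"
            f"    reverse_proxy localhost:{port}\n"
            f"}}\n"
        )
    return "\n".join(blocks)


print(caddyfile_for(["alice", "bob"], "class.example.edu"))
```

Each sub-domain would still need a DNS record (or a wildcard record) pointing at the VM, and each container would have to be published on the matching local port.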

Isolating students to separate instances of rstudio in separate containers makes sense if you want to guarantee a fixed resource limit to each (i.e. give each student 5 GB of RAM or so), but that may scale poorly; often I find I get better performance with shared resources. A shared setup also makes it easier to do stuff like a shared package library or shared data without any volume mapping.
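As an illustration of the per-container limit, the sketch below builds a `docker run` command line for one student with a memory cap. It assumes the `rocker/rstudio` image (which serves RStudio on port 8787 and reads a `PASSWORD` environment variable); the container naming scheme, the port choice, and the helper `docker_run_command` are made up for the example.

```python
import shlex


def docker_run_command(user, port, password, memory="5g",
                       image="rocker/rstudio"):
    """Return a shell-quoted `docker run` command for one student.

    --memory caps the container's RAM; the host port is bound to
    127.0.0.1 so that only a local reverse proxy can reach it.
    """
    args = [
        "docker", "run", "-d",
        "--name", f"rstudio-{user}",      # hypothetical naming scheme
        "--memory", memory,
        "-p", f"127.0.0.1:{port}:8787",   # 8787 is RStudio Server's port
        "-e", f"PASSWORD={password}",
        image,
    ]
    return shlex.join(args)


print(docker_run_command("alice", 8801, "s3cret"))
```

With 100 students at 5 GB each, the caps alone add up to 500 GB, which is one way to see why a hard per-student limit may scale poorly on a single VM.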

Last note, not caddy-based, but our team here at Berkeley deploys Jupyter and RStudio to a Kubernetes-based cloud system which also does this -- each student gets their own container with their own RStudio (or JupyterHub) instance. Docs here: https://zero-to-jupyterhub.readthedocs.io/en/latest/ . I haven't tried the self-deploy setup, but this system has been working great with many of the large enrollment (1000+) classes we have at Berkeley.

eddelbuettel commented 2 years ago

For what it is worth, at Illinois we use RStudio Cloud, which does the auto-scaling on their side -- not really something we instructors should have to fight or sys-admin.

I suggest we close this. It is a wee bit outside the scope of Docker, so a well-written document detailing best practice, if it existed :wink: , could make a nice addition to the wiki.

quinfer commented 2 years ago

@eddelbuettel and @cboettig thanks a million for the feedback and apologies for the off-piste query. Actually, with some digging I found this paper https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1397549 from Duke University where they deploy a docker farm. Mark McCahill has kindly shared his setup here: https://github.com/mccahill/docker-rstudio. This is not caddy-based; rather, it uses docker-gen and docker-nginx with some clever scripting to build a small farm (approximately 50 students per VM). @eddelbuettel I agree that as an instructor this is outside my battleground. I have looked at RStudio Workbench and Kubernetes, but my sys-admin colleagues have warned me that Kubernetes has a steep learning curve (my understanding is that RStudio Cloud runs on this setup). My goal is to develop enough knowledge myself to provide meaningful pedagogy for a dockerisation in finance course in the future. Anyway, I will close this now, and thanks for your time.

eddelbuettel commented 2 years ago

Sounds good. My course runs testing, quizzes, exams, ... in a system called PrairieLearn, built here at Illinois but now also used at UBC, Berkeley, ... and other places. This uses Docker extensively and we use bits and pieces of Rocker for the setup (both a base layer on top of r-base as well as rstudio to serve it). But the deployment guts are shielded from me, and it "just works". We could probably peel some of that autoscaling out, but I agree that borrowing some tricks from the Duke U setup is a very good idea. Keep us posted, and if you can, document what you do.