Managing OCL Docker image / container sizes

psychemedia commented 4 years ago

This is a discussion issue relating to creating efficiently sized Docker container / images for use delivering Open Computing Lab environments.

When GitHub makes discussion forums available, it would perhaps make sense to start a thread there.

psychemedia commented 4 years ago

The official Jupyter Docker stack base-notebook (docs) is the smallest official image providing a running notebook server.

The minimal-notebook (docs) adds in LaTeX for generating PDFs. PDFs can also be generated via a non-LaTeX route using Chromium (betatim/notebook-as-pdf).

I put out a query regarding the smallest possible Binder container and had minrk/smallest-binder back as a suggestion. That repo includes useful discussion in the README and some example branches.

If building up from an official Python container, python:3.7-slim seems to be the most efficient route in general; python:3.7-alpine may not support all the packages a distribution might need?

In terms of pulling Python from a package manager, see this discussion on using miniforge to get python.

mmh352 commented 4 years ago

I've been building a container for tt284. Initially I built it off the base images repo2docker uses (about 1.5GB). Then I used a Dockerfile directly to build off python:3.8-slim, which saved about 0.4GB (1.1GB in total). JupyterLab and the Node stuff just take up a lot of space, and there seems to be little that can be done about it. That is a saving of roughly a quarter (0.4GB out of 1.5GB), but at the cost of having to build everything from scratch.
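As a quick check on the figures quoted above, the relative saving can be worked out directly. This is only a sketch using the approximate sizes from the comment; the variable names are made up for illustration.

```python
# Approximate image sizes quoted in the comment, in GB.
repo2docker_base_gb = 1.5   # built on the base images repo2docker uses
python_slim_gb = 1.1        # built from a Dockerfile on python:3.8-slim

saving_gb = repo2docker_base_gb - python_slim_gb
saving_fraction = saving_gb / repo2docker_base_gb

print(f"Saved {saving_gb:.1f}GB, i.e. {saving_fraction:.0%} of the original image")
```

Since both sizes are rounded, the percentage is only indicative.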

While there are no discussion forums, perhaps you could add some labels to distinguish bugs, discussion and so on.

psychemedia commented 4 years ago

repo2docker can build from a Dockerfile (eg Use a Dockerfile for your Binder repository), so we could certainly create a minimal viable Docker container that runs standalone, via MyBinder/BinderHub, via JupyterHub, etc.

Current examples include innovationOUtside/OUbrandednotebook, which uses a Dockerfile that builds on a Jupyter base container but could be built on something more minimal, and ouseful-course-containers/ou-example, which simply adds OU customisation to a default repo2docker/MyBinder build.

One thing I am keen not to lose is the ability for other people to create containers as easily and straightforwardly as possible.

Running something on MyBinder from a vanilla repo means the user (another academic interested in exploring things, for example) doesn't really have to do anything other than put some content files into the repo. Adding requirements.txt is not overly onerous, although in several years of MyBinder availability I don't think I've managed to get anyone in the OU other than me to actually try it, because folk see even getting a GitHub account and clicking on "Create New Repository" as too hard... And for many, understanding requirements.txt is apparently also too hard.

Another advantage of using official Jupyter layers is that they are maintained, they work, and they are regularly tested against other bits of the Jupyter ecosystem. There does not appear to be a culture of even occasional, let alone regular or continuous, integration and testing in the systems LDS "support" or "test" for module use, so any way of leveraging Jupyter standard issue containers is to our advantage.

Not locking things into particular OU base layers also means other educators might be willing to build and contribute environments.

I agree that a build without JupyterLab might be useful, but we could perhaps use postBuild instructions to remove applications from Jupyter base containers that we want to strip out? The build time is not necessarily a consideration if we are shipping prebuilt images to students.
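One caveat on stripping things out after the fact: Docker images are made of stacked layers, so deleting files in a later build step hides them from the container filesystem but does not remove the bytes already stored in the earlier layer. The toy model below illustrates that, assuming the removal runs as a separate build step on top of the layer that installed the files. The layer sizes are made-up numbers, and the model ignores compression and whiteout-file overhead.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Simplified image layer: data it stores and earlier data it hides."""
    added_mb: int
    hidden_mb: int = 0  # files from earlier layers deleted in this layer


def image_size_mb(layers):
    # What gets pulled and stored: every layer's data, hidden or not.
    return sum(layer.added_mb for layer in layers)


def visible_size_mb(layers):
    # What the running container's filesystem appears to contain.
    return sum(layer.added_mb - layer.hidden_mb for layer in layers)


# Hypothetical figures: a 700MB base plus a 300MB JupyterLab layer,
# followed by a later step that deletes JupyterLab again.
strip_afterwards = [Layer(700), Layer(300), Layer(0, hidden_mb=300)]
# Alternative: never install JupyterLab in the first place.
never_installed = [Layer(700)]

print(image_size_mb(strip_afterwards), visible_size_mb(strip_afterwards))
print(image_size_mb(never_installed), visible_size_mb(never_installed))
```

Under these assumptions, a later removal step tidies the environment students see, but shrinking the shipped image would need the application to be left out of the layers altogether, or the image to be squashed or rebuilt.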

mmh352 commented 4 years ago

I'm not so sure whether the size difference is that important, so I wouldn't worry too much about it.

I agree that the bigger issue is supporting other academics in using this technology. That's a much trickier aspect. I'm not particularly convinced that we can build something interesting that allows other academics to simply drop their content into a copy of a template. I think we can build something, but it is then effectively the digital equivalent of a chalk'n'talk lecture. If you want to do something interesting with the technology, then you have to get dirty with the technology, adapt it to the necessities of the content. For that I don't think much apart from some help with the technology can be provided. Perhaps when we have a broader set of containers people are using, then we can see what can be done/provided to help others join in?

psychemedia commented 4 years ago

@mmh352 "I'm not so sure whether the size difference is that important, so I wouldn't worry too much about it." <- you seem to be worrying about it by wanting to build your own images from a de novo Dockerfile! ;-)

Re: making things useful to others, I think I disagree. We can have a range of base containers that folk could use as a provided off-the-shelf environment, and could perhaps even automate the construction of Binder templates from a form-driven UI (eg select which applications you want bundled into the container).
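As a rough sketch of what the back end of such a form might do, the function below turns a list of ticked options into the text of a requirements.txt file. The option names and the `APP_PACKAGES` mapping are hypothetical, chosen only for illustration; a real builder would also need to handle version pinning and non-Python dependencies.

```python
# Hypothetical mapping from form options to PyPI package names.
APP_PACKAGES = {
    "classic-notebook": ["notebook"],
    "jupyterlab": ["jupyterlab"],
    "content-puller": ["nbgitpuller"],
}


def build_requirements(selected_apps):
    """Return requirements.txt content for the selected form options."""
    unknown = sorted(set(selected_apps) - set(APP_PACKAGES))
    if unknown:
        raise ValueError(f"Unknown options: {', '.join(unknown)}")
    packages = []
    for app in selected_apps:
        for package in APP_PACKAGES[app]:
            if package not in packages:  # avoid duplicate entries
                packages.append(package)
    return "\n".join(packages) + "\n"


requirements_text = build_requirements(["classic-notebook", "content-puller"])
print(requirements_text, end="")
```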

For examples around the edges:

Binder builder GUI: early proof of concept exploring what a Binder config building UI might look like;
Binder base boxes with nbgitpuller (the idea here is you build an app/requirements image in one repo, and then use that as a base box environment into which you pull content from your simple content repo; here, academics can just write the content in their repo, and then trivially pull it in your HTML app writing environment built from another repo). See Binder Base Boxes, Several Ways… for a riff on this.
template repos giving examples of how to put different applications etc into Binderised environments.
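To make the base box idea concrete, here is a minimal sketch of building a MyBinder launch link that starts the environment repo and has nbgitpuller pull in a separate content repo. It follows the nested `urlpath=git-pull?...` link pattern that nbgitpuller uses for Binder, as far as I understand it; the repository names are hypothetical, and it assumes nbgitpuller is installed in the environment image.

```python
from urllib.parse import urlencode


def binder_gitpull_url(env_repo, content_repo_url, env_ref="HEAD",
                       content_branch="main",
                       binder_host="https://mybinder.org"):
    """Build a Binder link that launches env_repo and pulls in content."""
    content_name = content_repo_url.rstrip("/").split("/")[-1]
    # Inner URL handled by nbgitpuller inside the running container.
    inner = "git-pull?" + urlencode({
        "repo": content_repo_url,
        "branch": content_branch,
        "urlpath": f"tree/{content_name}/",
    })
    # Outer URL handled by Binder; the inner URL is encoded a second time.
    return f"{binder_host}/v2/gh/{env_repo}/{env_ref}?" + urlencode({"urlpath": inner})


# Hypothetical repos: one holding the environment, one holding the content.
launch_url = binder_gitpull_url(
    "example-org/base-box-env",
    "https://github.com/example-org/course-content",
)
print(launch_url)
```

With a link like this, the content repo needs no Binder configuration of its own, which is what keeps things simple for the academic writing the content.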

mmh352 commented 4 years ago

:-) Well, the size conclusion came after I re-built it using a de novo Dockerfile :-D. Some things you just have to learn the hard way.

Thank you for the links. I'm basically using an approach very similar to the Binder base boxes + nbgitpuller approach.

psychemedia commented 4 years ago

Adding a link to the demo TT284 example Dockerfile config: https://github.com/mmh352/tt284-container

innovationOUtside / Open_Computing_Lab_Guide

Managing OCL Docker image / container sizes #3