Correct, we are not using them much right now. However, there are several projects spinning up now that will require ML Pangeo images, so it's a good time to think about this.
IMO, before creating more images, we need to make a plan to address how to maintain these images sustainably going forward. Within a month or so we should have a dedicated, full-time Pangeo engineer at 2i2c, and that person should be able to help out with this.
I don’t use these images.
My $0.02: the many-images problem is a symptom of docker not being a package manager. Dockerfiles are a linear sequence of commands, while packages form a dependency graph. It will always be hard to map docker images onto the packages people want.
Maintaining multiple images is painful. Honestly, for scientific workflows with GB/TB-scale datasets, "light" containers don't seem worth the trouble. If you can get away with it, I suggest one mega docker image (you need to pin all package versions or it will constantly break) or leveraging a tool like repo2docker if you need multiple images. You can also, e.g., have packages installed when a user starts a container, like the dask image does.
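To illustrate the install-at-startup idea, here is a minimal entrypoint sketch (illustrative only; the real dask script and the variable names here are assumptions, not the actual implementation):

#!/bin/bash
# Hypothetical entrypoint sketch: install extra packages when the container starts.
# EXTRA_CONDA_PACKAGES / EXTRA_PIP_PACKAGES are illustrative variable names.
set -e
if [ -n "${EXTRA_CONDA_PACKAGES}" ]; then
    conda install -y ${EXTRA_CONDA_PACKAGES}
fi
if [ -n "${EXTRA_PIP_PACKAGES}" ]; then
    pip install ${EXTRA_PIP_PACKAGES}
fi
exec "$@"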
It looks like this repo already uses repo2docker, so maybe the tooling is good enough to support many images 🤷. Maybe pin the "FROM" image as well?
@nbren12 thanks for the comments. This repo is a bit confusing to understand, but despite the tags, images are in theory reproducible thanks to using conda-lock to pre-solve the environment that gets added to the docker image. So, for example, to recreate an image from the past:
git clone https://github.com/pangeo-data/pangeo-docker-images.git
cd pangeo-docker-images
git checkout 2020.09.30
docker build -t pangeo/base-image:master base-image
docker build -t pangeo/ml-notebook:2020.09.30 ml-notebook
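The tagged commit carries the conda-lock output, so the build installs exactly the pinned versions. For reference, regenerating a lock file looks roughly like this (a sketch; the exact invocation lives in this repo's CI and the flags may differ by conda-lock version):

# Sketch: pre-solve the environment into a lock file that the docker build then installs from
conda-lock --file ml-notebook/environment.yml --platform linux-64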
In our attempts so far, though, GPU-enabled ML packages have been hard to cram into the same conda environment, which is perhaps why it's best to pick either tensorflow or pytorch. Preferably someone actively using the image would be responsible for curating the packages. Not sure who that would be these days?
It will always be hard to map docker images onto the packages people want.
Couldn't agree more, although we've gotten a lot of mileage out of people using a common environment on Pangeo hubs. For long-term sustainability, though, someone will need to tackle allowing users to customize their environment: https://github.com/pangeo-data/pangeo-docker-images/issues/148
Ah yes. I see the lock files now.
In our attempts so far, though, GPU-enabled ML packages have been hard to cram into the same conda environment
Interesting. What's the main barrier? Package versions not resolving?
Interesting. What's the main barrier? Package versions not resolving?
Yeah. For example, see the attempt to add pytorch-gpu and jax in #179: https://github.com/pangeo-data/pangeo-docker-images/runs/1712185623?check_suite_focus=true
It seems like the general guidance is not to mix conda channels (ideally everything comes from conda-forge with the 'strict' channel priority setting). But to get the GPU-enabled packages we've had to relax that setting (https://github.com/pangeo-data/pangeo-docker-images/blob/master/ml-notebook/condarc.yml) and point to packages on specific channels: https://github.com/pangeo-data/pangeo-docker-images/blob/b6e6b19cf6890ce56010e3ef7a7584f49bda3198/ml-notebook/environment.yml#L11-L14
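In other words, the setup is roughly equivalent to doing something like this (illustrative commands only; the package names and channels are examples, not the repo's exact config):

# Illustrative only: relax strict channel priority and pull GPU builds from their own channels
conda config --set channel_priority flexible
conda install -c conda-forge -c pytorch -c nvidia pytorch cudatoolkit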
Good to know. This topic provokes so much in me; I've spent a lot of time maintaining developer environments. I've been interested in a package manager called Nix, which is basically a more composable docker. I hope it picks up steam in the next few years.
For some context, I will share the amazing blog post Noah recently published on this topic! https://www.noahbrenowitz.com/post/2021-version-pinning/
It's a hard problem, but one we should keep plugging away at. We don't have a perfect solution yet, but we have made good progress!
Love the post @nbren12! This one is also worth checking out for tips on reducing image size: https://uwekorn.com/2021/03/01/deploying-conda-environments-in-docker-how-to-do-it-right.html
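For the size-reduction side, the usual tricks boil down to something like this (a rough sketch of common practice, not this repo's actual recipe; /opt/conda is an assumed install prefix):

# Common conda-in-docker slimming steps (illustrative)
conda clean -afy                                        # remove package caches and tarballs
find /opt/conda -follow -type f -name '*.a' -delete     # drop static libraries
find /opt/conda -follow -type f -name '*.pyc' -delete   # drop bytecode caches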
Closing this as we've added a pytorch-notebook image in #315. See also the discussion at #457 on further optimizing the ml-notebook (tensorflow) and pytorch-notebook images for GPU-accelerated workflows.
Was talking to @scottyhq about using the ML image over here and having pytorch preloaded. I know @rabernat has asked about this before (#179).
We were wondering: who all is using the ML image, and what requirements do they have? @nbren12 @jhamman It seems like usage of the ML image is low based on the pulls here: https://github.com/pangeo-data/pangeo-docker-images.
Since pytorch and tensorflow are the two big candidates (and are probably used independently most of the time), @scottyhq suggested having a pangeo-pytorch and a pangeo-tensorflow image.
Any other thoughts that people have?