nuest / ten-simple-rules-dockerfiles

Ten Simple Rules for Writing Dockerfiles for Reproducible Data Science
https://doi.org/10.1371/journal.pcbi.1008316
Creative Commons Attribution 4.0 International

Discussion: Rule 2. Use versioned and automatically built base images #9

Closed psychemedia closed 4 years ago

psychemedia commented 5 years ago

One of the things I try to remember to do is fork the original Dockerfiles, or add a reference link in my Dockerfile to the location of the Dockerfile for the image I am pulling from.

This means I stand a chance of recreating something akin to any base container I pull, from my own copy of the Dockerfile. (Of course, if the Dockerfile uses "latest" package versions, then a build today may result in a container that is different from a build tomorrow.)

When pulling from a tagged/versioned image in a Docker repository, that image may have been built from a Dockerfile that is no longer available, or that would no longer produce the same image if rebuilt today.

So whilst my image may be reproducibly built from my Dockerfile as long as the image I'm building from exists, it may no longer be reproducible if that image disappears or changes (e.g. because I didn't pin the exact image version).
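One way to guard against a tag silently changing is to pin the base image by its content digest rather than a mutable tag, and record where the base image's build recipe lives. A minimal sketch (the digest below is a placeholder, not a real value):

```dockerfile
# Pin by immutable digest instead of a mutable tag such as "latest".
# The digest here is illustrative; obtain the real one with e.g.
#   docker inspect --format='{{index .RepoDigests 0}}' rocker/r-ver:4.0.2
FROM rocker/r-ver@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef

# Record the human-readable tag and the location of the base image's
# Dockerfile, so the build can be traced even if the image disappears.
LABEL base.image="rocker/r-ver:4.0.2" \
      base.dockerfile="https://github.com/rocker-org/rocker-versioned"
```

A digest-pinned `FROM` always resolves to the same image bytes, whereas a tag can be re-pushed to point at a different image at any time.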

Depending on levels of trust, or when working in a closed / private environment, you may want to build your own versioned image from someone else's Dockerfile (or one you have forked) and then pull from that.

I don't know if there is any emerging practice around archiving software projects that are based on Dockerised environments? I would imagine a complete archiving service might run its own Docker Hub-style registry and push locally built or cloned images to it, if that service were making runnable archived environments available.

vsoch commented 5 years ago
  • base images that have complex software installed (e.g. ML libraries, specific BLAS library) are helpful and fine to use, just ensure there is a publicly available Dockerfile that they use (and add a link to that file in your Dockerfile)

This I've found to be a bit risky - I've done projects that have parsed thousands of Dockerfiles from Docker Hub over the years, and it's astounding how many of them completely go away. I would say that if a user is really interested in using someone else's ML container, they are best to grab the Dockerfile (the entire file), put it in their repository, and then deploy with an automated build. That way, if the Dockerfile / container goes away, it's the choice of the creator (and trust isn't placed in someone else).
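The vendoring approach described above can be reflected in the downstream Dockerfile itself: build from your own locally tagged copy of the base image rather than pulling the published one. A hedged sketch (the image names and vendor path are illustrative):

```dockerfile
# Build from a vendored copy of the upstream recipe, not the published
# image. The base is built first from the committed Dockerfile, e.g.:
#   docker build -f vendor/ml-image.Dockerfile -t myorg/ml-base:1.0 vendor/
FROM myorg/ml-base:1.0

# Note the upstream source the vendored Dockerfile was copied from,
# so provenance survives even if the original repository is deleted.
LABEL upstream.dockerfile="vendor/ml-image.Dockerfile"
```

Because the base image is built from a file under your own version control, the whole stack remains rebuildable even if the upstream image and repository disappear.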

nuest commented 4 years ago

I think both of you raise important issues of which many users are not aware. I've tried to address them a little bit in the commit linked above, but I think archival of environments raised by @psychemedia is out of scope for this work.

nuest commented 4 years ago

@psychemedia Can you double check the current draft if it addresses your comments, please?

psychemedia commented 4 years ago

@nuest Yep, fine, though a couple of typos and bits of tidying up are required, which I meant to tag back here from https://github.com/nuest/ten-simple-rules-dockerfiles/pull/47