nuest / ten-simple-rules-dockerfiles

Ten Simple Rules for Writing Dockerfiles for Reproducible Data Science
https://doi.org/10.1371/journal.pcbi.1008316
Creative Commons Attribution 4.0 International

Rules 6 & 7 #76

Closed by bdevans 4 years ago

bdevans commented 4 years ago

I just wanted to raise an apparent inconsistency between these two rules. Rule 6 states that software versions should be written in the Dockerfile (rather than in a separate file, e.g. requirements.txt), but Rule 7 states that scripts should be mounted into the container (which, unlike the software versions, makes them more accessible when the workspace is published). Mounting large data sets makes sense, but scripts are very lightweight, and the distinction between scripts and software dependencies is somewhat fuzzy when the scripts are key to the workflow for processing, analysing, and plotting the data that we wish to reproduce.

vsoch commented 4 years ago

The (unsaid) distinction, I think, is for a container intended to be used solely as an interactive environment, for which you'd mount a script at runtime. This is one of the rules I don't agree with: I personally think that in both cases scripts should be included in containers, and that the user should actually err on the side of caution and include all the scripts they need, leaving out only data that is so huge it would be infeasible to include. Why? Because mounting is not reproducible, and for the most part we are writing about the use case of containers that go along with published work. As for where the versions get written, either the Dockerfile or a requirements.txt (or similar) file is fine, as long as it's included in the container.

bdevans commented 4 years ago

Yes, I agree. I tend to write an external requirements.txt, which is read as part of the docker build process but is also available for other types of use. Ultimately, as long as scripts and requirements are under version control in the same repository, that goes a long way towards making the work reproducible, but I think it would be better to include scripts at build time as a (custom) software dependency.
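A minimal sketch of the external requirements.txt approach described above, not taken from the thread. The base image tag, the working directory, and the example pin in the comment are assumptions for illustration only.

```dockerfile
# Hypothetical base image tag; pin whichever version the analysis actually used.
FROM python:3.8-slim

# Hypothetical working directory inside the image.
WORKDIR /workspace

# requirements.txt sits in the same version-controlled repository as this
# Dockerfile and pins exact versions (e.g. "numpy==1.19.2"), so the same file
# can also be used outside Docker.
COPY requirements.txt .

# Install the pinned versions at build time so they are baked into the image.
RUN pip install --no-cache-dir -r requirements.txt
```

Because the file is copied into the image, the pinned versions remain inspectable inside the container, which matches the condition above that the version list is included in the container.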

vsoch commented 4 years ago

This was already brought up as a concern too: https://twitter.com/iancal/status/1251294579606896643?s=21

bdevans commented 4 years ago

I hadn't seen that but yes, I agree. If you've ever used Sumatra, a project management tool, it refuses to let you run uncommitted code, as a solution to the problem mentioned in the tweet. In our case, including the script at (the end of) the build process is a simpler and more natural solution (as implied by the tweet).
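As a rough illustration of including the scripts at the end of the build rather than mounting them at runtime, the Dockerfile could look something like the sketch below. The base image tag, the `scripts/` directory, and the `analysis.py` entry point are hypothetical names, not taken from the thread or the paper.

```dockerfile
# Hypothetical base image tag.
FROM python:3.8-slim

WORKDIR /workspace

# Dependency installation steps would go here, before the scripts, because
# they change rarely and their layers can be reused from the build cache.

# Copy the workflow scripts last: editing a script only invalidates the
# layers from this instruction onwards, so rebuilds stay fast.
COPY scripts/ ./scripts/

# Hypothetical default command that runs the bundled analysis script.
CMD ["python", "scripts/analysis.py"]
```

With this layout the image carries the exact scripts that were present in the build context, so the published container does not depend on whatever happens to be mounted when it is run.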

nuest commented 4 years ago

Just had a chat with @bdevans about this, here's my take:

nuest commented 4 years ago

@bdevans #79 only partially covers this issue, right?

bdevans commented 4 years ago

@nuest what did I miss?

nuest commented 4 years ago

@bdevans Never mind, I was confused because of the "old rule 9", but the rule reordering was not part of the PR yet. I assume you'll start another PR for that (good call on going with smaller PRs btw).

bdevans commented 4 years ago

Yes, that's right. I'll start a new PR, as I think it would have been too much to digest in one go otherwise, and merging with parallel PRs would have been a nightmare!