Closed psychemedia closed 4 years ago
I personally don't think that "efficiency" in the images matters for reproducible computing environments. Rule 10 gives some advice on how to avoid constantly rebuilding the image, but other than that I think a data scientist will spend far more time thinking about and running a workflow than she will spend waiting for an image to build.
Re. "anatomy of a Dockerfile": good point, tried to do the minimum only in https://github.com/nuest/ten-simple-rules-dockerfiles/commit/e482b86a1fa97d1c038df9992ffb75d54a9aba21 and https://github.com/nuest/ten-simple-rules-dockerfiles/commit/d08dc3e9cf820a5f47566d4d33a400c75df2d95a
IMO splitting a `RUN` command does not "break" layering, it might even improve it. Still, tried to clarify.
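A minimal sketch of how a split can help, assuming the usual build cache behaviour (an instruction is re-run only if it or an earlier instruction changed). The base image and the packages are illustrative choices, not taken from the paper:

```dockerfile
FROM python:3.12-slim

# Rarely changes: system-level tools. This layer stays cached
# across most rebuilds.
RUN apt-get update \
 && apt-get install -y --no-install-recommends git \
 && rm -rf /var/lib/apt/lists/*

# Changes more often: Python packages. Editing this instruction
# invalidates the cache only from here on; the layer above is reused.
RUN pip install --no-cache-dir numpy==1.26.4
```

Had both steps been chained into one `RUN`, any change to the package list would also repeat the system-level installation.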
Since you digressed from the "one discussion issue per rule", I'll close this one, feel free to re-open if important comments are not yet addressed.
Re: splitting things across RUN commands - it adds to the number of layers, doesn't it? Which means it can also affect the amount of time/bandwidth required to download an image.
If you have a build step with an `rm` tidy step, if you finish the `RUN` with the `&& rm ...` command, the thing that's removed is not in the layer, whereas if you do `RUN ..build bits...` then `RUN rm ...`, you do download the stuff in the first layer and you then delete it from the second?
Of course, my naive understanding of how the implemented mechanics of docker containers actually work could be completely wrong!
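To make the question concrete, here is a small sketch that can be used to check it. It assumes a Debian base image and uses `dd` to create a 100 MB dummy file as a stand-in for real build leftovers; the stage names `single_run` and `split_run` are made up for this example:

```dockerfile
# Variant 1: create and delete in ONE instruction.
# The file is gone before the layer is committed, so it is never
# stored in the image.
FROM debian:bookworm-slim AS single_run
RUN dd if=/dev/zero of=/tmp/build-artifact bs=1M count=100 \
 && rm /tmp/build-artifact

# Variant 2: create and delete in TWO instructions.
# The first RUN commits a layer that contains the file; the second
# RUN only records the deletion in a new layer on top.
FROM debian:bookworm-slim AS split_run
RUN dd if=/dev/zero of=/tmp/build-artifact bs=1M count=100
RUN rm /tmp/build-artifact

# Build each variant separately and compare the layer sizes:
#   docker build --target single_run -t demo:single .
#   docker build --target split_run  -t demo:split  .
#   docker history demo:single
#   docker history demo:split
```

In the `split_run` image, `docker history` should list one layer of about 100 MB for the `dd` step, even though the file is no longer visible in a running container. That layer still has to be stored and downloaded.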
Would it make sense to give explicit examples of good practice and bad practice, perhaps in a contextualised way, eg show a colour highlighted git diff going from bad practice to good practice with a comment or a git commit line explaining the change in terms of the rule applied? Or maybe link to a supporting git repo where a scrappy Dockerfile has been revised into a best practice example?
When mentioning:
a naive reader may misinterpret this instruction and put lots of things on separate lines, each with its own `RUN` command, which would break layering? So the instruction:
is problematic when it comes to writing Dockerfiles that build "efficient" images?
Would it make sense to have a section at the start of the paper that describes the anatomy of a Dockerfile, and perhaps also situates it in a workflow (Dockerfile -> image -> container)?
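As a rough idea of what such an "anatomy" section could show, here is a minimal annotated sketch. The base image, the label value, and the script name `analysis.py` are hypothetical placeholders, not taken from the paper:

```dockerfile
# Base image: the starting point that all later instructions build on.
FROM python:3.12-slim

# Metadata: who maintains this image (placeholder address).
LABEL maintainer="someone@example.org"

# Working directory for the following instructions and for the
# running container; created if it does not exist.
WORKDIR /work

# Copy the (hypothetical) analysis script from the build context
# into the image.
COPY analysis.py .

# Default command executed when a container is started.
CMD ["python", "analysis.py"]

# Workflow:
#   Dockerfile -> image:   docker build -t my-analysis .
#   image -> container:    docker run --rm my-analysis
```

The trailing comments situate the file in the workflow: the Dockerfile is the recipe, `docker build` turns it into an image, and `docker run` starts a container from that image.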
[typo - directoryies] So in terms of best practice, is there something here about identifying not just which directory you are in and how to change it, but also how to select appropriate `USER`s for running certain commands?
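A minimal sketch of what such advice might look like, assuming a Debian base image; the user name `analyst` and the installed package are illustrative only:

```dockerfile
FROM debian:bookworm-slim

# Instructions run as root by default, which system-level
# installation requires.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*

# Create an unprivileged user (hypothetical name) with a home directory.
RUN useradd --create-home analyst

# Everything below, and the running container, uses this user.
USER analyst

# Make the user's home directory the working directory.
WORKDIR /home/analyst

# Print the active user and directory during the build as a check.
RUN whoami && pwd
```

The pattern is to do what needs root first, then switch with `USER` and stay unprivileged for the rest of the build and at run time.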