nuest / ten-simple-rules-dockerfiles

Ten Simple Rules for Writing Dockerfiles for Reproducible Data Science
https://doi.org/10.1371/journal.pcbi.1008316
Creative Commons Attribution 4.0 International

"simply pushing the `Dockerfile` to the service" #44

Closed by psychemedia 4 years ago

psychemedia commented 4 years ago

https://github.com/nuest/ten-simple-rules-dockerfiles/blob/4a87e3e3ad43feacd98722f1521e500191bb17bb/ten-simple-rules-dockerfiles.Rmd#L402

It's also possible to set up CI tools so that things are built according to a schedule (cron job) or in response to particular actions (e.g. making a GitHub release), not just on every commit, which may get ridiculous (e.g. your half-hour build is triggered each time you fix a typo in the README). CI builds can also be set to skip rebuilds when commit messages contain particular tags, for example.
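The trigger rules described here can be sketched as a small decision function. This is only an illustration in Python, not the configuration format of any particular CI service: the function name `should_build`, the event names, and the `DOCS_ONLY_PATTERNS` list are hypothetical, and the skip tags follow the `[skip ci]` convention that several CI services recognise.

```python
import fnmatch

# Assumed convention: tags in a commit message that ask CI to skip the build.
SKIP_TAGS = ("[skip ci]", "[ci skip]")

# Hypothetical patterns for files that cannot change the built image.
DOCS_ONLY_PATTERNS = ("*.md", "docs/*", "LICENSE")


def should_build(event, commit_message="", changed_files=()):
    """Decide whether an event should start a (possibly slow) image build.

    event: "schedule", "release", or "push" (hypothetical event names).
    """
    # Scheduled (cron) runs and releases always build.
    if event in ("schedule", "release"):
        return True
    if event != "push":
        return False

    # Skip when the commit message carries a skip tag.
    message = commit_message.lower()
    if any(tag in message for tag in SKIP_TAGS):
        return False

    # Skip when every changed file is documentation only (e.g. a README typo).
    if changed_files and all(
        any(fnmatch.fnmatch(path, pattern) for pattern in DOCS_ONLY_PATTERNS)
        for path in changed_files
    ):
        return False

    return True
```

With these assumptions, `should_build("push", "Fix typo", ["README.md"])` skips the build, while a push that touches the `Dockerfile`, a release, or a scheduled run starts one.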

Writing a good CI script, that caches things sensibly, is a skill in and of itself and that could perhaps also benefit from a guide such as this one...

vsoch commented 4 years ago

Yes that's correct - typically testing of the build is done with every PR, and then deployments are done with merge to master, or some official release or tag.

psychemedia commented 4 years ago

"Typically" may just be the result of a default somewhere, though.

Automated builds can be used for different things, though, e.g. CI building for test purposes whenever there's a commit, or building as a release (e.g. autobuilding and pushing a container associated with a particular release to Docker Hub).

If you are someone who is editing files in a GitHub repo through the web interface and you want to make N changes to N files, I think each changed file will cause its own commit, and each commit may by default trigger a build. If you made the same changes in a local git repo, all the commits would be added to the GitHub repo as part of a single push (and hence a single build).
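The arithmetic behind that comparison, as a rough sketch. It assumes a CI setup that starts exactly one build per push, however many commits the push contains; the `Push` class and `builds_triggered` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Push:
    """One push to the remote repository, carrying some number of commits."""
    commits: int


def builds_triggered(pushes):
    """Count builds, assuming one build per non-empty push."""
    return sum(1 for push in pushes if push.commits > 0)


# Editing 5 files one by one in the web interface: 5 commits, 5 pushes.
web_edits = [Push(commits=1) for _ in range(5)]

# Making the same 5 commits locally and pushing once.
local_work = [Push(commits=5)]
```

Under that assumption, `builds_triggered(web_edits)` is 5 and `builds_triggered(local_work)` is 1 for the same set of changes.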

We should be ecologically minded, I think, about the side effects we generate whenever we push something into an environment that is instrumented with automated build tools and processes.

vsoch commented 4 years ago

> If you are someone who is editing files in a Github repo, then if you want to make N changes to N files, I think each changed file will cause its own commit that may by default trigger a build, whereas if you made the changes in a local git repo, all the commits would be added to the Github repo as part of a single push (and hence, a single build).

You should not be casually editing files in a GitHub repo. Any community project has established steps for requesting changes (opening an issue, for example) and then a pull request for discussion and review. This article isn't about best practices for version control (GitHub), but what you describe is (what I would consider) bad practice.

psychemedia commented 4 years ago

Agreed. And one way of educating folk who are outside a tradition into good practice is to give examples of bad practice they may demonstrate and nudge them into better ways.

But it's also worth noting that different communities may have different practices; and that things like containerised environments may be really useful to folk who are practitioners rather than developers, and who are more interested in generating "content scripts" (narrative analyses in notebooks) than they are in software engineering but still see the benefit in being able to script and share their working environment with peers or publishers.

vsoch commented 4 years ago

Totally agree! I think it's important then to explicitly state the audience, even in the title. For example, I'm a software engineer / open source developer, so my use cases / experiences with containers can be very different than those of a scientist. That said, it wouldn't be so terrible if scientists asked research software engineers to build their containers (and the RSEs would then use best practices from software engineering with respect to setting up the repo, CI, thinking about the build design, etc.)

psychemedia commented 4 years ago

Yes... The RSE community and its concerns are a good focal point, I think (but still a small one?). Setting up the guidelines as "this is what professional RSEs do, and you can too" might be one way of pitching this to an audience that is at least aspirational towards a practice that supports reproducibility?

vsoch commented 4 years ago

Definitely! And even saying "If this is too much, then find support / help from an RSE."

nuest commented 4 years ago

IMO the peculiarities of how many CI runs happen, and what triggers them, are beyond the scope of this article. I did add a reference to RSEng.

@psychemedia I have had an article about CI for research in the pipeline for a while but never got to it. Let me know if you're interested in chatting about that.

psychemedia commented 4 years ago

@nuest I'm not really at anything other than the hopeful optimist amateurism level (as with most other things) when it comes to CI. I spent a chunk of last week trying to get my head round using some simple GitHub Actions, but it's mainly for GitHub Pages publishing workflows. Certainly no serious use cases.