nuest / ten-simple-rules-dockerfiles

Ten Simple Rules for Writing Dockerfiles for Reproducible Data Science
https://doi.org/10.1371/journal.pcbi.1008316
Creative Commons Attribution 4.0 International
61 stars 15 forks source link

Restructure rules and various small improvements/trimming based on discussion around visual summary #65

Closed bdevans closed 4 years ago

bdevans commented 4 years ago

This PR addresses Issue #62.

This Rmd file can serve as a sandbox for easy reordering and rewording under version control before making large changes to the main document. Once finalised, this can then be turned into a figure.

vsoch commented 4 years ago

This is looking great! I think the one point for discussion that would be good to hear from @nuest @sje30 @psychemedia about is with respect to the one Dockerfile per project rule - it doesn't fit nicely in either section, and it's not clear to me that one Dockerfile is always the best approach. It would make sense to point out if we looked around and saw a lot of researchers unnecessarily creating more than one, but I can't find examples of that (can others?)

nuest commented 4 years ago

@vsoch FYI: I had a short call with @bdevans this afternoon and he explained some of the thinking behind the changes. We also discussed #76, where the two of you raise some relevant concerns. I'm on board with both changes (this PR and #76), as you seem to be. I asked Ben to go ahead preparing a larger PR, including the adjustments needed in the main text to reflect the visual summary developed here. We should have that beginning of next week :crossed_fingers:.

@sje30 Could you hold of with your review another few days until Ben's PR is merged?

sje30 commented 4 years ago

@sje30 Could you hold of with your review another few days until Ben's PR is merged?

Sure, ping me when I should read! S

nuest commented 4 years ago

@bdevans Do you know when you'll be able to work on the next partial PR? I do not want to put pressure on you, but simply want to make sure I schedule some time to give feedback.

bdevans commented 4 years ago

Hi @nuest sorry for the delay - I've got busy with a conference. I've just submitted a PR for the #77. Once that is merged in, I'll aim to reorder the rules for this PR one evening over the next few days (should be done by Friday if not soon before).

nuest commented 4 years ago

Good progress :partying_face:

bdevans commented 4 years ago

Thanks @nuest and sorry for the delay - it's more time-consuming than I expected!

I think that's all the changes made to the text now. I had thought about adding more subheadings to (loosely) correspond to the sub-rules but they wouldn't be necessary throughout the whole paper, so it might be better to leave it as it.

The other change to consider would be to start cutting out more off-topic advice in order to refocus the paper (on reproducibility). I've removed a couple of points in that spirit but further cuts might be controversial. I think it's best if everyone has a final read through now, keeping in mind that shaving off a few less relevant sentences could be helpful. Once everyone is ok with the text, I can start producing the summary figure.

vsoch commented 4 years ago

Was there a movement to change the focus to reproducibility (I think I missed it!)

bdevans commented 4 years ago

Hi @vsoch, maybe I misunderstood but I thought that was always the focus given the title. More recently there has been some discussion that the manuscript might be a bit long and the rules are not so "simple" anymore. Therefore, as I went through I tried to keep the focus on "Writing Dockerfiles for Reproducible Data Science" and considered removing text that was more "general tips and tricks for Docker" (however useful they might be). I didn't go far down that road though; I just think it is worth keeping in mind during subsequent read-throughs.

vsoch commented 4 years ago

Yeah I think you’re right, I probably never explicitly framed it like that because I was focused on the data science part. What were you thinking of trimming further (or have you not decided yet?)

nuest commented 4 years ago

Thanks @bdevans !

Re. trimming: @sje30 offered to give the paper a good read again, and @betatim @benmarwick and @psychemedia also will have to give the paper another look anyway, since it changed quite a bit since the first preprint. So I'd invite them to

  1. identify content that could be cut
  2. watch out where subheadings could be helpful

I suggest to merge this now and use a new PR to add the actual graphic, but let the others start reading before you have created the actual visual summary as a graphic.

@vsoch @bdevans fine by you?

vsoch commented 4 years ago

Fine by me! :ok_person:

bdevans commented 4 years ago

What were you thinking of trimming further (or have you not decided yet?)

There are probably quite a few little bits and pieces (too many to list) if I go back through e.g. like the point I removed about running docker system prune which is very good housekeeping but not so much to do with writing Dockerfiles or reproducibility.

I was just recommending that all co-authors keep their scalpels in hand and a laser focus as they reread it so we hopefully end up with a more concise and focussed document :)

vsoch commented 4 years ago

Ah interesting. I think that particular point on pruning is important to note - it's not technically labeled as a reproducibility thing, but if you don't do it (at least if I don't) your system can suddenly freeze up because you have too many containers. Although it's not perfectly in scope, that's the kind of tidbit I'd appreciate knowing in a paper like this, because it's relevant to my workflow and health of my computer.

bdevans commented 4 years ago

A useful tidbit indeed but it felt a bit too far off-piste for the paper to me (too small a hard disk or lack of RAM could cause similar problems but we needn't mention those in the article). However I'm happy to be overruled if people think it's better to keep it. I just try to edit on the basis that the more distantly related points we include, the more the main points will be drowned out.

Of course, this needn't be the end. I actually started a repo a year ago to collect notes on Docker for reproducible science. We could always write a follow-up paper of broader scope e.g. "Docker for (data) scientists: a rough guide"! 😄 🐳

bdevans commented 4 years ago

@nuest This PR is marked by you as "changes requested" but I can't see what that's referring to. Has they been dealt with or is there something you still want me to change?

nuest commented 4 years ago

Maybe we should save the content we remove in a tidbit.md here in the repo.