greenelab / computational-reagents

Rigor, Reproducibility, Transparency, and Reagent Validity for Computational Biologists
Creative Commons Attribution 4.0 International

Docker #6

Open gwaybio opened 7 years ago

gwaybio commented 7 years ago

Docker is a tool for containerizing and versioning an entire base compute environment. It acts much like a lightweight virtual machine that can be shipped to users alongside code/software to ensure that the underlying base image is consistent.

It is not sufficient on its own to ensure reproducibility, however, since the image often requires external packages that must be downloaded with each pipeline run. It is still extremely helpful, and it can be used with any analysis across operating systems.
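One way to reduce the risk from packages that are downloaded at run time is to pin exact versions. The snippet below is only a minimal sketch, assuming a Python-based analysis; the helper name `pinned_requirements` is made up for illustration. It lists every package installed in the current environment as a `name==version` pin using the standard-library `importlib.metadata` module.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' pins for the packages installed here."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that carry no name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Redirect this output to requirements.txt and install from that file
    # when building the image, so every build asks for the same versions.
    print("\n".join(pinned_requirements()))
```

Pinning does not guarantee that an old version stays downloadable, but it does make a silent version change much less likely.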

oryoruk commented 7 years ago

I have never used Docker. From what I read, it sounds like the ultimate reproducibility tool.

I am curious about others' thoughts on when Docker is an overkill, and when it is a must.

p.s. Surprised to read that it's not handling external packages.

gwaybio commented 7 years ago

> I am curious about others' thoughts on when Docker is an overkill

In my mind, it's necessary in almost every scenario to ensure ultimate reproducibility. In some smaller projects written in R and/or Python, a combination of checkpoint (to manage CRAN packages) and Anaconda environments would be sufficient. I think it's always safer to save a Docker image in addition, though...
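To make the "save a Docker image in addition" idea concrete, here is a rough sketch, assuming a Python project with a pinned `requirements.txt`. The function name `render_dockerfile`, the base image tag `python:3.11-slim`, and the `/analysis` working directory are illustrative choices, not anything prescribed in this thread.

```python
def render_dockerfile(base_image="python:3.11-slim", requirements="requirements.txt"):
    """Return Dockerfile text that installs pinned packages at build time."""
    lines = [
        f"FROM {base_image}",      # a fixed tag rather than 'latest'
        "WORKDIR /analysis",       # hypothetical working directory
        f"COPY {requirements} ./",
        f"RUN pip install --no-cache-dir -r {requirements}",
        "COPY . .",                # copy the analysis code into the image
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(), end="")
```

Building the resulting file with `docker build` and archiving the image with `docker save` keeps a copy whose packages were installed once, at build time, rather than on every pipeline run.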

> Surprised to read that it's not handling external packages

It handles most. What I meant was that I have personally come across some packages that gave me issues for one reason or another, or cases where versioned data wasn't available. This has been by far the exception rather than the rule.
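When versioned data isn't available, one possible safeguard is to record a checksum of the file you actually used and check it before each run. This is just a sketch using the standard-library `hashlib` module; the helper names `sha256_of` and `verify_download` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Raise ValueError if the file does not match the recorded checksum."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"{path}: expected {expected_sha256}, got {actual}")
    return True
```

A mismatch at least tells you the upstream data changed, even if the old version can no longer be retrieved.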

bemert commented 7 years ago

Wow! It looks extremely "open". How does it deal with commercial software, e.g. MATLAB? Do you think certain journals should use this in their review process to ensure transparency and reproducibility?

apexamodi commented 7 years ago

I think Docker is a great way for researchers to share tools they've created in a reproducible manner, since you don't have to worry about the particular machine you're using. I do recall having trouble getting it to work once, and I think that while in theory it's a great idea, the requirement of external packages makes it annoying to use. I think the more complicated a method, the more useful/necessary the Docker image.

sklasfeld commented 7 years ago

I have downloaded Docker and plan to use it. However, there is a learning curve that I still have to get through. For example, the download link and the tutorial are on the same page, so you download it while reading the tutorial, yet the tutorial explains what you should do before downloading. This confused me. It would also be nice for the tutorial to open with a clear example of what Docker is used for. Instead, it starts off with ways to check whether it is running. I just want to know what it is. In other words, this tool may be good, but its demo needs to improve.