ohbm / osr2020

Website for the Open Science Room at the OHBM 2020 meeting
https://ohbm.github.io/osr2020

Past, Present and Future of Open Science (Emergent session): Containers: Ticket to Valhalla or Ticket to the Inferno? #87

Open jsheunis opened 4 years ago

jsheunis commented 4 years ago

Containers: Ticket to Valhalla or Ticket to the Inferno?

By David Kennedy, University of Massachusetts Medical School

Abstract

The containerization of neuroimaging analysis workflows has quickly become a hot topic in the OSR and beyond. But with great power comes great responsibility. Containers get presented as the 'end all and be all' by some and as a 'dangerous bandaid for masking bad software development practices' by others. What's the poor researcher to do? In this session we hope to have a pleasant discussion of the pros and cons, useful application areas, and practical logistics of using containers in the 'real world'. We propose to present this as a round table with input from a number of perspectives, followed by a dialog and public discussion aimed at determining where the community stands regarding 'best practices' and use of containers. The round table may include (subject to confirmation and further discussion): Jo Etzel, Pierre Bellec, Peer Herholz, Satra Ghosh, Agah Karakuzu.

Useful Links

https://github.com/ReproNim/neurodocker
https://ww5.aievolution.com/hbm1901/index.cfm?do=abs.viewAbs&abs=4639

Tagging @dnkennedy

dnkennedy commented 4 years ago

A bunch of good points above. 1) scale. I collapsed the (larger and smaller) scale into the section on Software Consumers of various scale, so that scale dimension can still be explored... @yarikoptic 's additional lovely points may need a whole additional emergent session to really get to. But, to the extent that these are some of the pointers to some of the 'good' of containers, make sure you get them into the 'good' column of the 'scoreboard'!

dnkennedy commented 4 years ago

Hi @gllmflndn @GaelVaroquaux @ValHayot @hcp4715. Please confirm that you're on board with this plan, and you have the zoom info. Sorry for the chaotic communication, too many channels of communication for my small internet-less brain...

ValHayot commented 4 years ago

Yep! works for me / got the email.

raamana commented 4 years ago

Another point to be considered, which I noted in Mattermost last week and which is very important IMHO, is to prioritize numerical and algorithmic stability/reproducibility as the first resort to achieving reproducibility. When possible (it might not always be), this would return better bang for the buck IMHO than "nuking" the app with tons of layers of containers (even if one doesn't see them), which add to the complexity of the app as well as to the difficulty of using it.

My experience with BIDS-App OPPNI and graynet/hiwenet partly contributed to the above point of view. Looking back, I feel standardizing HPC environments on the same stack would save a ton of effort and money, while moving the science forward. Just my 2¢.

gllmflndn commented 4 years ago

@dnkennedy Yes, got your email, thanks! Time is not ideal for me so if I miss the beginning of the session or lurk in the background, just skip me - or I'll try to send you a short summary of some of my thoughts on the topic.

dnkennedy commented 4 years ago

Hi. In an above comment I put an incorrect link to the 'scorecard'. I corrected it above, but am repeating the correct 'scorecard' link here: https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0. Apologies to anyone who tried that above link and didn't get let into an internal doc that was just my reconstruction of the Mattermost/Town hall container discussion thread before it moved to the containers channel.

dnkennedy commented 4 years ago

OK, @gllmflndn, would love your input and thoughts regarding the containerization of all things SPM and beyond... Either in person, or at least in the scorecard doc (https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0).

gllmflndn commented 4 years ago

@dnkennedy Thanks, just seeing the scorecard now - to be honest, my thoughts (and beyond) seem to be nicely covered by @satra and @GaelVaroquaux.

dnkennedy commented 4 years ago

@gllmflndn It's ok to reiterate a little, that way we effectively "+1" some of the common topics that are important to multiple folks. If it's easier, I guess you can annotate the other points with a "+1" in some other way...

dnkennedy commented 4 years ago

Hello again. Yesterday's session was the 'fun' part. Now the 'hard' work starts: trying to sift and consolidate the raw observations in order to see what came out. Any good ideas about how to proceed? Can we get volunteers to take a column each (C (Good), D (Bad), E (Problems), F (Solutions)) to distill into a bullet list of points (with a counter of how many times a similar thing came up)? [vertical integration]. Then we can follow that up with a horizontal integration...
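The per-column distillation described above (a bullet list of points with a counter of how often each came up) could be sketched roughly as follows. This is only an illustration: the function names and the sample entries are hypothetical and not taken from the actual scorecard, and the matching is exact after light normalization, so differently worded versions of the same point would still need to be grouped by hand.

```python
from collections import Counter


def normalize(note: str) -> str:
    """Lowercase and collapse whitespace so trivially different entries match."""
    return " ".join(note.lower().split())


def tally_column(notes):
    """Return (point, count) pairs for one column, most frequent first."""
    counts = Counter(normalize(n) for n in notes if n.strip())
    return counts.most_common()


# Hypothetical entries standing in for column C (Good); not real scorecard data.
good_column = [
    "Reproducible environment",
    "reproducible  environment",
    "Easy to share",
    "",
]

# Print the column as a bullet list with a counter per point.
for point, count in tally_column(good_column):
    print(f"- {point} (x{count})")
```

Running the same tally on each of columns C to F would give the 'vertical' pass; comparing the resulting lists across columns would be the 'horizontal' one.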

yarikoptic commented 4 years ago

And also make it available for comments. E.g. although I agreed with @GaelVaroquaux about "Encourage bad behavior from tool developer perspective (not worrying about portability, dependences)", I later reconsidered it: I saw many projects where trying to create a Dockerfile led developers to realize shortcomings of their build process/infrastructure and address them. So it is again a double-edged sword and not all "black and white".

Starborn commented 4 years ago

Hello. As with other sessions this week, I did not participate but remain very interested to learn what was said. Please share the summaries! Looking forward, PDM

dnkennedy commented 4 years ago

Hi @Starborn ; the raw notes from the session are at https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0. The whole community is invited to help refactor these raw notes into a more coherent set of observations and then a more formal 'best practices' recommendation.

Starborn commented 4 years ago

Thank you, this is great. Will be sure to follow up.

robertoostenveld commented 4 years ago

A thought that stuck with me following the online discussion was

If all the people who are spending time on making FreeSurfer (*) containers would contribute a bit to improving FreeSurfer's release/packaging/installation/deployment/infrastructure mechanisms, would that not be much more effective?

(*) you can insert your favourite software here instead of FreeSurfer, but it was one that was explicitly mentioned

I think that for many computer scientists it is more interesting to spend time on "your own" software/container than on someone else's open-source project. This reflects a problem with the academic incentive structure, which does not favour contributions to "someone else's" projects or software. The same problem applies not only to analysis software, but also to containers made by other people.

dnkennedy commented 4 years ago

This sentiment resonates with me. Is that to say there really should be only 1 FreeSurfer 6.0 container (again, taking a 'random' example), that it should live in some well-known standard place, and that everyone should use it? And if there is a really good reason to make a new FreeSurfer 6.0 container, then fine: document why, and put it in a standard place?

satra commented 4 years ago

even for freesurfer there are many use cases: neurodocker distributes a minimized freesurfer just for recon-all while most of these big packages have many needs. the freesurfer group themselves now release a version of freesurfer as a whole container.

yes, whole installations can (and are) be(ing) distributed by people who develop the software. but there are many use cases for container construction (e.g., fmriprep, giraffe.tools, optimize size for running/shipping).

take a look at the ga4gh registry of containers to see what can be done to help users. i think in this area they did a really good job: https://dockstore.org/