ohbm / osr2020

Website for the Open Science Room at the OHBM 2020 meeting
https://ohbm.github.io/osr2020

Past, Present and Future of Open Science (Emergent session): Containers: Ticket to Valhalla or Ticket to the Inferno? #87

Open jsheunis opened 4 years ago

jsheunis commented 4 years ago

Containers: Ticket to Valhalla or Ticket to the Inferno?

By David Kennedy, University of Massachusetts Medical School

Abstract

The containerization of neuroimaging analysis workflows has quickly become a hot topic in the OSR and beyond. But with great power comes great responsibility. Containers get presented as the 'be-all and end-all' by some and as a 'dangerous bandaid for masking bad software development practices' by others. What's the poor researcher to do? In this session we hope to have a pleasant discussion of the pros and cons, useful application areas, and practical logistics of using containers in the 'real world'. We propose to present this as a round table with input from a number of perspectives, followed by a dialog and public discussion aimed at determining where the community stands regarding 'best practices' and the use of containers. The round table may include (subject to confirmation and further discussion): Jo Etzel, Pierre Bellec, Peer Herholz, Satra Ghosh, Agah Karakuzu.

Useful Links

https://github.com/ReproNim/neurodocker
https://ww5.aievolution.com/hbm1901/index.cfm?do=abs.viewAbs&abs=4639

Tagging @dnkennedy

PeerHerholz commented 4 years ago

tagging @satra, @dnkennedy, @jaetzel and @pbellec as interested folks based on the Mattermost thread. Did I miss anyone/who else should be tagged?

gllmflndn commented 4 years ago

Great! Don't forget @stebo85

PeerHerholz commented 4 years ago

Thx @gllmflndn, was also thinking about @stebo85, but wasn't sure if he could make it given the time zones! @stebo85 is there a time that would work for you? Also saw that I forgot @agahkarakuzu, sorry.

pbellec commented 4 years ago

LGTM. I think Gael Varoquaux has pretty strong opinions about 'dangerous bandaid for masking bad software development practices' (could not locate the link, but I think he wrote a blog post about that a while back).

It would also be great to have someone speak about reproducibility and containers. Maybe Valerie Hayot?

I am happy to be dropped from the discussion, as I don't think I have expertise not covered by others. I guess I could play the role of devil's advocate, as I am not sold on the utility of containers as a software distribution tool.

PeerHerholz commented 4 years ago

tagging @ValHayot, thx @pbellec.

gllmflndn commented 4 years ago

Please stay @pbellec, we need a diversity of opinions!

Searching for Gael's blog post, I found these:
http://gael-varoquaux.info/programming/of-software-and-science-reproducible-science-what-why-and-how.html
http://ivory.idyll.org/blog/2014-containers.html

PeerHerholz commented 4 years ago

tagging @GaelVaroquaux to check if he would be interested and has bandwidth to stop by.

GaelVaroquaux commented 4 years ago

Happy to complain :).

When exactly do you need me?

PeerHerholz commented 4 years ago

thx @GaelVaroquaux, the rock of complains, hehe! Also tagging @hcp4715 who did a lot of work to introduce containers in his lab/institute and certainly has important and interesting points to add.

hcp4715 commented 4 years ago

@PeerHerholz, FYI, yesterday an excellent master's student of ours in China wrote a Chinese tutorial that covers the whole process, from installing Docker to using heudiconv and running fmriprep, on both Linux and Windows (you can imagine the frustrations he experienced ;)). We put it on OSF: https://osf.io/naxgd/

ValHayot commented 4 years ago

Hey, thanks for the tag. While I do have opinions, I'm no expert on the matter. I'll tag @gkiar and @ali4006 who have done extensive work on this.

I'll try to listen in though :)

PeerHerholz commented 4 years ago

Thx @hcp4715, cool! Following up on our conversation in Mattermost: we need a diverse set of experience levels, use cases and backgrounds in order to create a fruitful discussion. So far I think the following have been mentioned (thx @emdupre):

Please discuss and add further groups!

emdupre commented 4 years ago

It looks like there's already a pretty clear mapping:

If you want to add more, then I'd aim for the slots with only one person. But not sure how big you're envisioning this !

dnkennedy commented 4 years ago

Any comments on time for this? While there are still a number of open slots? Wednesday or Thursday? 5am, 2pm, 3pm, 9pm, 10pm EDT?

jaetzel commented 4 years ago

All but the 5 am EDT slot are ok for me, either day. 5 am EDT is possible, but very early for me.

hcp4715 commented 4 years ago

5 am, 2 pm, and 3 pm EDT works for me (time zone CEST).

satra commented 4 years ago

wed: 2,3,9 EDT thu: 9 EDT

GaelVaroquaux commented 4 years ago

wed, 2 and 3pm EDT?

PeerHerholz commented 4 years ago

No preference on my side, all times would work for me!

stebo85 commented 4 years ago

5:00 am, 9pm and 10pm work for us in Australia :)

Starborn commented 4 years ago

Good morning from Asia Pacific.

I am marginally interested in containers, in the sense that I had not used them before coming to neuroscience. My experience is that while they are surely great in principle, in terms of containing the full data and code, they are not user friendly and can become an end in themselves. When I saw it running, Docker reminded me of COBOL, and it felt like jumping thirty years back, to before Visual Basic. It was painful for me to see people in the lab spend hours, if not days, weeks and months, trying to learn Docker, which would not install or would be awkward. User guides and tutorials exist aplenty, yet people said they wanted to write a tutorial. It was a bit of a waste of time, and it felt like using Docker, and getting stuck writing Docker tutorials, had become a fashionable core activity that was sidetracking resources that should be devoted to neuroscience instead.

So my contribution to this discussion, if you can take it on board, is that containers could be developed to be less geeky, more stable and less brittle (higher-end interfaces were discussed with the DataLad panel, for example), and that good practices should include multiple methods/tools to carry out experiments. Ultimately a researcher should not be forced to use a container if they find it awkward; alternatives should be sought and developed.

P

dnkennedy commented 4 years ago

With huge apologies to the APAC time zone, due to the voting above, Wed 1 Jul 2020 7pm - 8pm (GMT) (Wed 7/1/2020 3:00 PM - 4:00 PM EDT) is the time slot I requested. Can we come up with a way to extend the conversation (more or less formally?) to include the APAC region later in the day, perhaps seeded by the initial discussions in the above time slot? I am sorry, this was one of the hardest parts of agreeing to be the abstract submitter :-(

jaetzel commented 4 years ago

@Starborn, your comments ring so true for me, both the initial reaction at seeing what accessing a container actually looks like, and the amount of time involved. Even the impulse for more and more tutorials ...

satra commented 4 years ago

pasting from mattermost.

a few questions to consider for discussion

it may also be useful to create and evaluate a set of polls prior to the discussion

guiomar commented 4 years ago

Thanks for organizing this!! Do you know if they can also be efficiently used with matlab code?

yarikoptic commented 4 years ago

I would be a happy defense attorney for containers, hammers, chainsaws, and any other useful tool or tech!

yarikoptic commented 4 years ago

@guiomar :

Do you know if they can also be efficiently used with matlab code?

"Efficiently" - not sure. But you can just place matlab inside and then expose license from outside. I know that

agahkarakuzu commented 4 years ago

@yarikoptic @guiomar another approach is to use the MATLAB Compiler Runtime (MCR) in containers with the compiled application. Here is an example from my project's release pipeline:

https://dev.azure.com/neuropoly/qMRLab/_releaseProgress?_a=release-pipeline-progress&releaseId=55

In terms of performance, there is usually a bit of latency, given that the MCR needs to be started per call (command line). But for GUI-based applications this is not an issue: I containerize qMRLab and expose the GUI through X11 forwarding. This unlocks, without a license, a feature that is not provided by Octave. +1 for containers :)

I noticed that spikeforest used MCR as well.

gllmflndn commented 4 years ago

I also use the MATLAB Runtime for SPM but if you want to build a container with a MATLAB license, the 'official' instructions are here: https://github.com/mathworks-ref-arch/matlab-dockerfile
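As a rough sketch of the "place MATLAB inside, expose the license from outside" approach mentioned above, the snippet below assembles a `docker run` command that passes a network license server to the container through the `MLM_LICENSE_FILE` environment variable. The image tag `matlab:r2020a` and the server address `27000@license.example.org` are placeholders and not values from this thread, and whether the image forwards trailing arguments to MATLAB is an assumption; the linked MathWorks instructions are the place to check what a given image expects.

```python
import shlex


def matlab_docker_command(image, license_server, matlab_args=()):
    """Build a `docker run` command that hands a license server to MATLAB.

    Assumption: the image reads the MLM_LICENSE_FILE variable (port@host).
    The image name and the server address are hypothetical placeholders.
    """
    return [
        "docker", "run", "--rm",
        "-e", f"MLM_LICENSE_FILE={license_server}",
        image,
        *matlab_args,
    ]


cmd = matlab_docker_command(
    image="matlab:r2020a",
    license_server="27000@license.example.org",
    matlab_args=["-batch", "disp(version)"],
)
# Print a copy-pasteable shell command instead of running it here.
print(shlex.join(cmd))
```

Keeping the license out of the image means the same image can be shared without embedding site-specific license details.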

civier commented 4 years ago

Hello all. Will it be Wed 1 Jul 2020 7pm - 8pm GMT or BST? as I saw Britain is moving to daylight savings.

Starborn commented 4 years ago

Thanks for this discussion and for reporting the excerpt from Mattermost. I am currently running short of eyes. As discussed briefly with Oren, I welcome the browser-based navigation for containers (which I hope to dumb-test soon), and I'd like to call for a cloud-based container (so that it does not need to be installed/configured) that could deliver the best of both worlds. If someone is up for that, I would love to work on the GUI. P

Starborn commented 4 years ago

ah, containers in the cloud talk https://aws.amazon.com/containers/

yarikoptic commented 4 years ago

@Starborn

cloud based container (so that it does not need to be installed/configured)

you might be interested in our wip presented at the end of our recent webinar on version control, containers and reproducibility: https://m.youtube.com/watch?v=ix3lC6HGo-Q

stebo85 commented 4 years ago

@dnkennedy @jaetzel - 5am in Brisbane is not too bad! Happy to join the discussion and show how we approach containers at the Centre for Advanced Imaging at the University of Queensland and make it work for many of our users who don't even realize that their analyses are running in containers :)

Starborn commented 4 years ago

Good morning. Thank you @yarikoptic for the vid. @jaetzel, suddenly containers (learning them, getting them to work, etc.) have become central to brain science; that's a bit of a problem for many. @stebo85, please share your magic!

stebo85 commented 4 years ago

@Starborn, yes, happy to show more in the roundtable, but in a nutshell: we provide image processing workstations for our users where we use transparent singularity (https://github.com/CAIsr/transparent-singularity) to provide neuroimaging software. Users don't have to worry about the underlying Singularity containers and can just use all tools as if they were installed natively.

In addition, we contribute to a project called "Characterisation Virtual Laboratory" (https://www.cvl.org.au/), where we again use containers to deliver neuroimaging software. These CVL instances run on clusters across Australia and are easily accessible via a browser-based interface, and again users don't have to worry about the containers at all.

Finally, since not everyone has access to our imaging workstations or the CVL desktops, @civier proposed to bring this technology to the desktops of end users, and last week during the OHBM Hackathon we started the VNM project (https://github.com/NeuroDesk/vnm). It provides simple access to a Linux desktop via the browser, where we combine the tools developed in CVL and transparent singularity with the aim of providing a good user experience :)
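A minimal sketch of the wrapper idea behind "use all tools as if they were installed natively": each tool name on the user's PATH is a tiny script that forwards its arguments to `singularity exec` on a container image. This is an illustration only and not the actual transparent-singularity implementation; the image path and the tool name are hypothetical.

```python
import os
import stat
import tempfile


def write_wrapper(bin_dir, tool, image):
    """Create an executable shell wrapper that runs `tool` inside `image`."""
    path = os.path.join(bin_dir, tool)
    script = (
        "#!/usr/bin/env bash\n"
        # "$@" forwards every argument the user typed to the containerized tool.
        f'exec singularity exec "{image}" {tool} "$@"\n'
    )
    with open(path, "w") as handle:
        handle.write(script)
    # Mark the wrapper as executable so it behaves like a native binary.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


# Hypothetical image and tool names, written to a throwaway directory.
bin_dir = tempfile.mkdtemp()
wrapper = write_wrapper(bin_dir, "bet", "/containers/fsl_6.0.3.simg")
with open(wrapper) as handle:
    wrapper_text = handle.read()
print(wrapper_text)
```

With the wrapper directory added to PATH, typing the tool name would start the containerized version, which is why users need not know a container is involved.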

Starborn commented 4 years ago

Thanks, looks great, but my org is not in the list. Is there any way to get an account? I am not in Australia, but my lab is on Austronesian land across the Pacific, if that makes a difference. So you provide software, right? What about data integration from a data platform? (daydreaming?) huh

stebo85 commented 4 years ago

@Starborn, I believe you need to have a collaboration with an Australian institution to get access to a cluster running CVL. Data integration into these platforms is crucial but not easy: at the University of Queensland we also make this seamless to users enabled by our underlying data management fabric, called MeDiCI (https://rcc.uq.edu.au/data-storage) - it’s a system consisting of multiple GPFS caches that automatically transports the data to the right location.

In VNM we are planning to support multiple ways of getting your data, but that is all work in progress and not ready yet. We are planning to integrate datalad and at the moment you can already access your data on the local disk via a mount point. Would be great to hear what your specific application and use case is to see if we can soon enable that. Please feel free to open an issue on https://github.com/NeuroDesk/vnm, describe exactly what you would like to do and maybe we can get this working pretty quickly.

Starborn commented 4 years ago

Thank you, such great work! I should first get an account. I'll start looking for a suitable program that I may be able to join remotely, and will sync up after that. PDM

civier commented 4 years ago

Hello all, just to clarify: the VNM (https://github.com/NeuroDesk/vnm) runs equally well on workstations or in the cloud. We are also looking into making it work on HPC, though there are several technical challenges with that. If anyone has experience with nesting Singularity containers, please get in contact with me at orenciv@gmail.com. Oren

satra commented 4 years ago

@dnkennedy - was the time decided? and are there any todo's?

complexbrains commented 4 years ago

This event has been scheduled to be run on 01.07.2020, 19:00- 20:00 UTC

For more information, please go to https://ohbm.github.io/osr2020/schedule/emea
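For anyone converting the slot by hand, here is a small sketch that turns 19:00 UTC on 1 July 2020 into a few of the local times mentioned in this thread. It uses fixed UTC offsets rather than a time zone database, which is an assumption that only holds for this date: EDT as UTC-4, BST as UTC+1, CEST as UTC+2 and Brisbane as UTC+10.

```python
from datetime import datetime, timedelta, timezone

# Session start as announced: 1 July 2020, 19:00 UTC.
start_utc = datetime(2020, 7, 1, 19, 0, tzinfo=timezone.utc)

# Fixed offsets in hours, assumed valid for 1 July 2020 only.
offsets = {"EDT": -4, "BST": 1, "CEST": 2, "Brisbane (AEST)": 10}

# Convert the UTC start into each local zone.
local_starts = {
    name: start_utc.astimezone(timezone(timedelta(hours=hours), name))
    for name, hours in offsets.items()
}

for name, local in local_starts.items():
    print(f"{name:>16}: {local:%a %d %b %H:%M}")
```

Under these offsets the start is 3 PM EDT and 5 AM the next day in Brisbane, and 19:00 UTC corresponds to 20:00 BST.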

dnkennedy commented 4 years ago

OK, for better or worse, I've tried to distill what and how we might present in this OSR Containers session. We have a handful of invited folks who cover a variety of application areas (software developers, container developers, consumers, and educators). Each presenter gets 4 minutes to briefly say something about the 'good', the 'bad' or the 'good but difficult' issues with using containers in 'their real world'. We will keep a time clock and a scoreboard: https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0. I then want to open it up to the rest of the community for similar 4-minute statements about the good/bad/problem container issues in their world. This will be about collecting these issues, not solving them (we do not have enough time here to argue/solve/discuss the details of any of the issues at much length). With these proceedings, I posit that we can then, as a community (offline), attempt to develop a document along the lines @satra suggested and, by way of doing that, discuss/argue/debate/resolve (I hope) the details of the various issues. Of course, having @satra's points of discussion in mind can influence what good/bad/problem anyone brings up, but addressing those directly, I think, is too far-reaching for a 1-hour session with community involvement...

satra commented 4 years ago

@dnkennedy - sounds like a plan! using the forum to listen to and aggregate different viewpoints would indeed be a great starting point.

dnkennedy commented 4 years ago

@satra Any TODO's you ask? Well, if folks can tolerate the design I put together, we need to promote the session and make sure that those who will be speaking know the 'ground rules' and scope. Some questions remain: should we let the speakers pre fill in their bullet points on the 'scoreboard'? I think there will only be one shared screen (mine) which can just be the 'scoreboard' with the community filling it in as we go. It MIGHT be possible for a speaker to provide me 1 slide or webpage that I could display.

dnkennedy commented 4 years ago

The @satra post-session community white paper topics for discussion (as alluded to above):

dnkennedy commented 4 years ago

So, do @GaelVaroquaux @PeerHerholz @gllmflndn @satra @ValHayot @stebo85 @hcp4715 @jaetzel and @pbellec consent to the plan outlined above and in https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0?

scheduled to be run on 01.07.2020, 19:00- 20:00 UTC

I have to provide email addresses to the OSR to get the Zoom links sent to y'all. I will be sharing my screen, with the 'scoreboard' of the aforementioned Google spreadsheet.

You can share with me 1 slide or one URL that I can try to show during your 4 minutes! Please stick to the 4 minutes; I will be draconian. Please try to stick to the enumeration of (any of the) good things, bad things, problem things and solution things about containers. You can pre-fill the aforementioned scoreboard/spreadsheet, if you want. These are all longer discussions for the future; in this session we are collecting... Feel free to also share links to other presentations, tools and resources that you want, even though we cannot get into their details in this session's format.

PeerHerholz commented 4 years ago

One important aspect I think that's missing is the scale, as outlined by @jaetzel and @emdupre in the Mattermost channel: on what project scale should/could/must containers be used? Something along the lines of: lab - institute/center - multisite - consortia and single publication - multi publication - software package with dataset intended for eventual public use somewhere in there. Re community feedback: a tweet/mattermost message should be sent out asap so that folks can gather and prepare information and their points.

yarikoptic commented 4 years ago

One aspect which containers facilitate is standardization of the application interfaces: BIDS-Apps, Flywheel gears, Brainlife ABC apps, Boutiques; and even more generic Singularity SCI-F Apps (harmonization of entry points within single container). Although such APIs can be used without containers, IMHO abstraction away from "software distribution" aspect helped to concentrate on APIs, and now they are typically used only with the containers. I think some exposure to those and discussion on possible ways to improve interoperability (and metadata harmonization to facilitate discovery between associated platforms) would be a valuable topic.
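To make the "standardized application interface" point concrete, here is a hedged sketch of a command line in the style of the BIDS-Apps convention: three positional arguments (input dataset, output directory, analysis level) plus an optional list of participant labels. It only illustrates the shape of such an interface; the paths and labels in the example call are made up, and real apps add many tool-specific options.

```python
import argparse


def build_parser():
    """Command line in the style of the BIDS-Apps convention."""
    parser = argparse.ArgumentParser(
        description="Example BIDS-App-style entry point"
    )
    parser.add_argument("bids_dir", help="input dataset organized according to BIDS")
    parser.add_argument("output_dir", help="directory where results are written")
    parser.add_argument("analysis_level", choices=["participant", "group"],
                        help="level of the analysis to run")
    parser.add_argument("--participant_label", nargs="+", default=[],
                        help="subject labels to process, without the 'sub-' prefix")
    return parser


# Parse an explicit argument list (hypothetical paths and labels).
args = build_parser().parse_args(
    ["/data/bids", "/data/out", "participant", "--participant_label", "01", "02"]
)
print(args.analysis_level, args.participant_label)
```

Because every app that follows the convention accepts the same leading arguments, a platform can launch different containers with one calling pattern.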

yarikoptic commented 4 years ago

Questions concentrate around "research", but many participants and audience will also be "scientific software developers". So discussion of aspects related to software development where containers provided huge assistance IMHO is a worthwhile topic: use of containers for troubleshooting/debugging, continuous integration, etc.