matthewfeickert / Behnke-Data-Analysis-in-HEP

Notes and exercises from Data Analysis in High Energy Physics, Behnke et al., 2013
MIT License

Binder not launching #1

Open DrAndiLowe opened 5 years ago

DrAndiLowe commented 5 years ago

Hi Matthew,

In case you need to know: the Binder for this repo doesn't launch. Moreover, the link to the build status 404's.

Cheers,

A.

matthewfeickert commented 5 years ago

@andrewjohnlowe Thanks for checking this out and for reporting this.

In case you need to know: the Binder for this repo doesn't launch

Yes, this is a known issue. I was a very early adopter of Binder, and since then the Binder team has changed the Dockerfile requirements quite a bit. I haven't made the time to go back and update to modern Binder yet.

Moreover, the link to the build status 404's.

Yes, this is the same as the above.

I will probably fix this the next time I have some free time, but if you're interested in poking at this I can try to do it this week.

DrAndiLowe commented 5 years ago

@matthewfeickert No problem! Given the age of the repo, I wasn't sure whether this was being maintained and, therefore, whether this was useful info for you.

(Actually, the real reason for visiting was that I was poking around your repos to see if you had an all-in-one Dockerfile for R + Anaconda + ROOT with fixed versions of each. I'm developing a polyglot workflow that I want to Binderize.)

matthewfeickert commented 5 years ago

@andrewjohnlowe Yes, this is a project that I haven't revisited in some time, but I plan to at some point after I finish writing my thesis. :) So this information was still helpful, thanks!

to see if you had an all-in-one Dockerfile for R + Anaconda + ROOT with fixed versions of each

So you're looking for a single image that has R, Conda, and ROOT installed in it, correct? Hm, I don't think I have this, but it should be easy to make if you want. Do you have any other specific requirements (such as base OS)?

DrAndiLowe commented 5 years ago

Well, I was just wondering whether someone had already done this, and what the best practices are for multi-stage builds. There are already images for R and Anaconda, and I guess someone must have done one for ROOT, but likely building from CERN Scientific Linux instead of Debian (which is used for the previous two). I can probably figure out how to build a single image with R + Anaconda + ROOT for my own use.

But would anybody else trust it enough to use it themselves? Trustworthiness is an issue. (There are images that contain malware, for example. As an aside, my current client is reluctant to use images built by a third party, exactly for this reason.) So I'm wondering if there are any moves within the HEP community to do something like the Rocker Project (https://www.rocker-project.org/), but for HEP, with some nice tools included. (If not, why not?)

My specific use-case is probably unusual; I expect most people are not doing polyglot workflows in a single HEP analysis.

matthewfeickert commented 5 years ago

likely building from CERN Scientific Linux instead of Debian (which is used for the previous two). I can probably figure out how to build a single image with R + Anaconda + ROOT for my own use.

@andrewjohnlowe Sure. I guess it depends on how you want to do this, but as you already want Conda on the image, you could just install R and ROOT through Conda. If you really want to have things optimized for the image then you could install them from source, though with ROOT this is slightly more painful. :wink:

But would anybody else trust it enough to use it themselves? As an aside, my current client is reluctant to use images built by a third party, exactly for this reason.

I guess it depends. If this is something that you are building publicly for your client to use, then I would say that, given the Dockerfile you would provide, they should feel confident that the largest security risk is the base image. If this is an image that is widely used, then I think they can feel confident in using it, as long as the build can be rerun any time a security patch is applied to the base OS image.

So I'm wondering if there are any moves within the HEP community to do something like the Rocker Project but for HEP, with some nice tools included.

It depends on what exactly you mean by "nice tools included". I'll tag @lukasheinrich here, as he has been heavily involved in producing them, but ATLAS has official Docker images (e.g., the images with AnalysisBase in them), and CERN has official OS images as well. @sbinet might also have thoughts, as I know that he has also produced some nice HEP Docker images in the past.

My specific use-case is probably unusual

Sounds like a great use case. :)

DrAndiLowe commented 5 years ago

Interesting that ATLAS has official Docker images now. I guess my own definition of "nice tools" differs from others; I don't work on ATLAS any more, so I have no need for running Athena, for example. For me, "nice tools" means data analysis tools like R, Python, scikit-learn, ROOT/TMVA, and so on, that are going to make plots and numeric results for a paper. I assume the current situation in HEP is that there's no standardisation of execution environments to ensure reproducibility of results shown in papers.

I like the strategy of the Rocker Project for versioned R environments: there are a number of images available with each layer adding functionality, so if you just want base R, there's an image for that, but if you need publishing-related packages or geospatial libraries, there are images for those. You pick what you need instead of getting everything including the kitchen sink. I'm not so familiar with the Python ecosystem, so I don't know if anything similar exists for it, nor how you can specify a package cohort with fixed versions.
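
On the fixed-versions question for Python, one common approach is to pin every package with an exact `==` version in a pip requirements file. Below is a minimal sketch of that idea, not a recommendation from anyone in this thread. The helper names `render_requirements` and `find_mismatches` are hypothetical, and the package versions are only illustrative.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple


def render_requirements(pins: Dict[str, str]) -> str:
    """Render a requirements-file body with exact '==' pins, sorted by name."""
    lines = [f"{name}=={version}" for name, version in sorted(pins.items())]
    return "\n".join(lines) + "\n"


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from its pin
    to (wanted, installed); installed is None if the package is absent."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Illustrative cohort only; choose the versions your analysis actually needs.
    pins = {"numpy": "1.16.4", "scikit-learn": "0.21.2"}
    print(render_requirements(pins), end="")
    print(find_mismatches(pins))
```

Writing the rendered text to a file and passing it to `pip install -r` installs exactly those versions, and `find_mismatches` can be run inside an image to check that the environment still matches the pins.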

sbinet commented 5 years ago

// insert rant about HEP using Go-based tools like GoHEP and the Go modules mechanism for versioned (across space & time) reproducibility of results and binaries.

matthewfeickert commented 5 years ago

I assume the current situation in HEP is that there's no standardisation of execution environments to ensure reproducibility of results shown in papers.

I don't think there is, at least not in the way that you're thinking. I'd point you towards REANA (cc @lukasheinrich, @tiborsimko) regarding ensuring reproducibility of results from analyses, but that's not what you're looking for as an end user outside of HEP.

I'm on the same page as @sbinet here and would suggest you check out GoHEP.

In terms of

data analysis tools like R, Python, scikit-learn, ROOT/TMVA, and so on, that are going to make plots and numeric results for a paper

I think the closest thing we have to that at the moment are the Python machine learning environment Docker images that I've made for the ATLAS machine learning forum. They have a full Python 3 machine learning environment in them, and some additionally have an ATLAS AnalysisBase environment that provides ROOT.