matthewfeickert / HEPML-env

A minimal Python3 environment for HEP machine learning with pipenv
BSD 3-Clause "New" or "Revised" License

Scope #10

Closed dguest closed 6 years ago

dguest commented 6 years ago

At some point I want to understand the scope of this project.

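I think we're pretty clear that the goal is to install a recent version of python 3 and some machine learning packages that we think are useful as a baseline for HEP things. The open questions:

  • Should the packages be limited to ones where binary wheels exist? I would generally say "yes" because it's faster, but maybe I'm missing something.

  • What architectures are supported? I'm fine with dropping support for SLC6 for now, since we have lxplus7 and (in general) we shouldn't try to support operating systems from the stone age.

  • What dependencies are there? It looks like it's "just" /cvmfs/, but that's a safe bet (otherwise it's not really "hep" software...)

I think there are also a few things that we can agree not to support:

  • ROOT, other than uproot (I think this alone will make our life 4 times easier)

  • Python 2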

matthewfeickert commented 6 years ago

At some point I want to understand the scope of this project.

Yeah, sorry this hasn't been more transparent so far. I think this will be clearer once I add some basic information to the README, but I wanted to get the minimal amount of infrastructure in first. We can talk about this in person as well.

I think we're pretty clear that the goal is to install a recent version of python 3 and some machine learning packages that we think are useful as a baseline for HEP things. The open questions:

  • Should the packages be limited to ones where binary wheels exist? I would generally say "yes" because it's faster, but maybe I'm missing something.

I don't think this is a big deal, so I would say don't make a rule about it. I understand the desire to get up and running with minimal latency, but in my mind having an environment that works is the most important thing, and if that requires building from source and getting a coffee, that isn't too big of a deal. However, since pip defaults to a library's wheel whenever a compatible one is available, I think there will be few cases in which we need to build from source.
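To make the wheel question concrete, here is a minimal sketch of the two policies expressed as pip arguments. The helper name `pip_install_args` is hypothetical and not part of this repo; `--only-binary=:all:` and `--prefer-binary` are existing pip options. It only builds the argument list and does not run pip.

```python
def pip_install_args(packages, wheels_only=False):
    """Build the argument list for `python -m pip`.

    wheels_only=True adds --only-binary=:all:, so pip refuses to build
    anything from source. Otherwise --prefer-binary is used, which favors
    wheels but still permits a source build when no compatible wheel exists.
    """
    args = ["install"]
    args.append("--only-binary=:all:" if wheels_only else "--prefer-binary")
    args.extend(packages)
    return args
```

With the "no rule" approach argued for here, the default call is the relevant one; `wheels_only=True` would be the strict policy from the original question.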

  • What architectures are supported? I'm fine with dropping support for SLC6 for now, since we have lxplus7 and (in general) we shouldn't try to support operating systems from the stone age.

On LXPLUS, that means CentOS 7. If you're on a local machine or another cluster, then we should make clear what the expected dependencies are in terms of g++ and such. This can go in the docs.
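One way the docs could back up a list of expected build tools is a small check like the sketch below. The function name `missing_tools` is hypothetical, and the default tool list only contains g++ because that is the one tool named in this thread; the real list is still to be decided.

```python
import shutil


def missing_tools(tools=("g++",)):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty return value means every listed tool was found; anything else can be reported to the user before an install is attempted.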

  • What dependencies are there? It looks like it's "just" /cvmfs/, but that's a safe bet (otherwise it's not really "hep" software...)

On LXPLUS, /cvmfs/. My goal was to make setting up this environment as machine agnostic as possible (once you pass the minimal thresholds). For reproducibility, /cvmfs/ should be the default whenever it is available, but I don't think it should be a true dependency. As a follow-up, the reason I called this "HEPML-env" is as a hint in the name: people should be thinking "this is how I can set up an ML env that would be standard for HEP" (e.g., also including uproot), not "this is an ML env that requires HEP software". Thoughts?
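The "default to /cvmfs/ if available, but don't require it" idea can be sketched roughly as follows. The function name `choose_base` and the home-directory fallback are assumptions made for illustration; the thread does not say what the non-/cvmfs/ path would be.

```python
import os


def choose_base(cvmfs_root="/cvmfs", fallback=None):
    """Return cvmfs_root if it is an existing directory, else a fallback.

    The fallback defaults to the user's home directory, which is a
    hypothetical choice made only for this sketch.
    """
    if os.path.isdir(cvmfs_root):
        return cvmfs_root
    return fallback if fallback is not None else os.path.expanduser("~")
```

This keeps /cvmfs/ as a preference rather than a hard dependency: a machine without it still gets a usable location.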

I think there are also a few things that we can agree not to support:

  • ROOT, other than uproot (I think this alone will make our life 4 times easier)

100% agreed. As you've said, we just use uproot, which also eliminates any need for rootpy and root-numpy.

  • Python 2

100% agreed. This can all be made clear to readers in the docs, but it is already encoded in the Pipfile.
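Separately from whatever the Pipfile pins, a setup script could also guard against Python 2 at runtime. This is only a sketch: the function name `is_supported_python` is hypothetical, and it is not a description of how this repo enforces the requirement.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3 or later."""
    return version_info[0] >= 3
```

Passing the version tuple as an argument keeps the check easy to exercise without switching interpreters.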