spack / spack

A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
https://spack.io

Comparisons of Spack and EasyBuild #2115

Closed citibeth closed 6 years ago

citibeth commented 7 years ago

Kenneth Hoste kenneth.hoste@gmail.com 4:35 PM (16 hours ago)

to elizabeth.fisc., Spack, Todd

Hi all,

Congrats on this nice milestone!

EasyBuild just passed the 1k mark with the previous release (v2.9.0, 20160923), but that is not an entirely fair comparison, since that count doesn't include all the Perl/Python/R packages we install as 'extensions' as part of the installation of the 'host' language, or (some of) the X11 stuff, which we install as a single bundle now; including those, we currently have about 1800 supported software packages.

In any case, Spack is clearly catching up fast in that regard...

I like the comparison by GitHub stars, we clearly need to be more consistent in making people 'like' the EasyBuild repos. ;-)

I've briefly discussed this with Todd already, but I think a good overview page comparing EasyBuild and Spack (and maybe also 'similar' projects like Anaconda) in terms of features and focus would be really useful to the community... This could be part of the Spack and/or EasyBuild documentation, or it could be hosted separately on 'neutral' ground (while making it clear the comparison is endorsed by both communities).

Anyone up for helping out with that? Elisabeth: maybe you're interested, since you've been involved with both projects up close?

regards, Kenneth

@tgamblin

citibeth commented 7 years ago

I would suggest the following points (with a [S] or [EB] indicating which choice I prefer from a tech point of view):

  1. The concretization algorithm is both central to Spack and unique among auto-builders. Spack can in principle be used to generate recipes for auto-builders that do not do concretization (EasyBuild, for example), eg: https://github.com/llnl/spack/issues/1155 [S]
  2. Spack's RPATH support is really nice. However, it is incomplete, as it only works for binary packages. There are ideas on how it could work for Python and other packages. [S]
  3. Easybuild is more tightly integrated with the modules system (Lmod) than Spack (environment-modules). Spack does not do recursive modules; however, the spack module loads --dependencies command generates module load scripts with similar effect. [EB: I think that Spack will need to become more tightly integrated over time]
  4. Spack is easier to install, and easier to edit and submit PRs for once installed. However, real-life installations of Spack can require a bootstrapping process. [S: Even the Spack bootstrap is easier than installing EB, because you have Spack to do it!]
  5. EB builds a module for itself, Spack does not. [EB]
  6. Spack comes in one Git repo, EB in three. [S: Dealing with 3 repos is 3x as hard]
  7. EB has far more recipe superclasses (EasyBlocks) than Spack. Spack used to have none; now it has one superclass per build system (CMake, Autotools, Makefile). [S: A lot of stuff in EasyBlocks just goes in recipes in Spack, if it's for a one-off package; say, NetCDF. Editing a single recipe file is easier than editing an easyblock + recipe]
  8. Spack recipes are more powerful, and closer to standard Python, than EB recipes. No need for separate easyblocks [S]
  9. I don't know how EB deals with Python these days. With Spack, you can just load all the Python modules you need (correct, but you end up with a long $PYTHONPATH). Or you can do spack activate (which breaks Spack in a core way, i.e. no more combinatorial versioning). I prefer to generate load scripts for the Python modules I need, and avoid Spack activate.
  10. Spack has developer support in the form of spack setup --- to configure a CMake build based on Spack's recipes [S]
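To illustrate the concretization idea in point 1: an abstract spec (a set of constraints) is turned into one fully concrete configuration before anything is built. A toy sketch in Python (illustration only, NOT Spack's actual algorithm, which also resolves compilers, variants, and the whole dependency DAG):

```python
# Toy illustration of concretization (not Spack's real algorithm):
# pick the newest available version satisfying an inclusive range.
def concretize(available, constraint):
    lo, hi = constraint  # e.g. the abstract spec "pkg@2:4"
    candidates = [v for v in available if lo <= v <= hi]
    if not candidates:
        raise ValueError("no version satisfies %r" % (constraint,))
    return max(candidates)

# The abstract "pkg@2:4" becomes the concrete "pkg@4".
print(concretize([1, 2, 3, 4, 5], (2, 4)))  # -> 4
```

A recipe generator for a non-concretizing builder (point 1's idea) would run this kind of resolution once and emit the fixed versions it chose.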

Should definitely also compare with Conda and PIP. Some interesting things here:

  1. conda-forge has a separate git repo for each recipe. I'd like to better understand the pros and cons of that choice.
  2. Conda assumes an existing Python base (that you want to use for your apps). It does not install Python.
  3. PIP can only install Python packages.

boegel commented 7 years ago

@citibeth that looks like a good start!

I'd like to see this fleshed out in table form ideally, something like:

aspect          EasyBuild   Spack
RPATH support   (WIP)       supported

...

Each aspect should then link to a subsection of the same page that provides more details. That would enable readers to get a good idea at first glance.

It's not entirely clear to me where this should be hosted, but a separate GitHub repository ("neutral ground") makes sense to me, where we could apply updates via PRs that have to be approved by the other 'party' before the live comparison is updated. Another option is to include the comparison in the EasyBuild/Spack documentation, but then we have two places to keep in sync, which would be a PITA.

Maybe we should focus on a handful (3? 5?) of aspects first to figure out the formatting? And then take it from there....

I would avoid marking a winner on each aspect, since in some cases it may not be very clear which project does the better job; I feel it sometimes depends on perspective. The strict dependency versions in EasyBuild are certainly a nuisance in some sense, but they are also perceived as an advantage, since they facilitate reproducibility.

I disagree with your statement that Spack is easier to contribute to once installed. I understand your point that having everything in a single repository simplifies things; that's certainly the case.

However, EasyBuild provides very nice integration with GitHub (http://easybuild.readthedocs.io/en/latest/Integration_with_GitHub.html), and people can contribute new easyconfigs and upload test reports for them without knowing anything about git (which is a significant hurdle for some people). You can hardly claim that running a single eb command to contribute something is harder than running 4-5 git commands followed by using the GitHub interface (sure, if you know where to go, it's trivial, but lots of people are (still) new to it)...

One thing that is definitely missing in your list: 'flexibility' (although that may be too generic a term). EasyBuild provides more flexibility in terms of controlling its behaviour than Spack does (to the best of my knowledge); for example support for custom module naming schemes (incl. a hierarchical module naming scheme), a plethora of configuration options, etc. Again, some people may perceive this as an issue, too many knobs to turn...

Another one is testing: next to unit/integration tests that are run automagically for every PR, rigorous regression testing is done for EasyBuild, and pull requests for new or modified easyconfig files require successful test reports 99% of the time before they're considered for merging. I understand Spack is working towards something like that, but it's not quite there yet?

We should also mention aspects where both projects currently excel: use of Travis, active community (mailing list, regular conf calls, ...), support for multiple compilers, etc.

Last but not least: the comparison should reflect current state, i.e. only consider what's in the last publicly available release, not what the intentions of the project are for the (near) future. The idea would then be to update the comparison on every new release.

adamjstewart commented 7 years ago

Your Git integration is very interesting. I fear that a lot of users shy away from contributing to Spack solely because they are unfamiliar with Git.

EasyBuild provides more flexibility in terms of controlling its behaviour than Spack does (to the best of my knowledge); for example support for custom module naming schemes (incl. a hierarchical module naming scheme)

I believe Spack has this now. @alalazo could confirm.

a plethora of configuration options

Yeah, Spack could really use this. It's definitely planned, but I haven't seen any movement yet.

Another one is testing: next to unit/integration tests that are run automagically for every PR, rigorous regression testing is done for EasyBuild, and pull requests for new or modified easyconfig files require successful test reports for 99% of the time before they're considered for merging. I understand Spack is working towards something like that, but it's not quite there yet?

Yep, we have unit testing and documentation testing, but no integration or regression testing for packages. This is definitely planned.

boegel commented 7 years ago

@adamjstewart I saw @alalazo's PR that got merged a while ago (#1723), but from what I can tell it provides nowhere near the flexibility that EasyBuild supports w.r.t. controlling how generated module files are named. It looks more like a proof-of-concept to me, but I may be wrong.

Another aspect that may deter newcomers from using Spack is that you have to know Python at least a little bit in order to make changes (or not be afraid to resort to copy-paste engineering).

That's less the case with EasyBuild, since you can just treat the easyconfig files as being key-value (even though the current format is Python syntax). If people want to make changes other than composing new or tweaking existing easyconfigs, then they need to know Python as well, of course.

But also here, EasyBuild provides a lot of flexibility, there's a lot you can get away with in easyconfig files, without having to implement even half a line of Python (see http://easybuild.readthedocs.io/en/latest/version-specific/easyconfig_parameters.html#vsd-avail-easyconfig-params).

citibeth commented 7 years ago

I don't think a grid showing system/features with a bunch of check boxes is useful here. Much more useful is text from the authors of the systems explaining what is good about their system.

I'm not thrilled with a big elaborate process of collaborating on and approving a comparison. Maybe @boegel can write one up, as suggested. Spackers who find omissions or errors can submit PRs to it. As long as things remain cordial, Spack can link to the comparison from its website.

I agree, integration testing is a big problem with Spack. Its combinatorial nature means you can't test everything Spack can build; you have to be smart about sampling the space. But no one has been smart yet. I finally went ahead and built my own integration test that builds the stuff I need. This is based on the experience that things keep getting broken if I don't test them repeatedly:

https://github.com/LLNL/spack/pull/2097

The test was more "successful" than I had imagined, and it broke before it even built anything. YES, we need a more serious approach to integration testing. At least what I did tests something that at least one user (me) wants.

tgamblin commented 7 years ago

@boegel:

One thing that is definitely missing in your list: 'flexibility' (although that may be too generic a term).

This is way too general. We at least need more specific categories for that, and I kind of agree with @citibeth that a feature matrix is possibly not the most useful comparison of these tools. I think the main differences are more fundamental than that.

Yes, EB has a bunch of configuration options, but Spack's core design is fundamentally more flexible than EB in that it has a much more powerful dependency model. This is one of the reasons I didn't just start using EasyBuild when I found it after I started working on Spack. The tools don't do the same thing at a very fundamental level. We're actually building something different here, and comparing EB checkbox-for-checkbox doesn't fully capture it.

I could easily say that EB is fundamentally inflexible in a lot of ways:

  1. Easyconfigs depend on easyconfigs, with rigid dependency versions.
  2. Tweaking a single version in an install tree in EB requires me to sed thousands of files and generate an entirely new tree of configs.
    • In Spack I can install a package variant from the command line, or I can tweak a single package to get this behavior. The provenance is tracked when I do this.
  3. In Spack I can depend on particular features of other packages (boost+python, boost+iostreams, etc.) whereas in EB I cannot.
  4. Spack has optional dependencies; EB does not.
  5. Spack has versioned virtual dependencies, and it can swap implementations of MPI, BLAS, etc. on the command line; EB cannot.
    1. Installing with a new toolchain in EB requires making a new toolchain (which is a Python file) with dependencies I might not care about (FFTW), and then messing around with all the easyconfigs.
    2. virtual dependencies & swappable compilers do this more cleanly (it's our intent to merge these two mechanisms in the future, so we just have virtual deps)
    3. in spack if you want a new set of default build options, you edit one file, packages.yaml, set your default preferences for the stack you want, and go.
  6. In EB, you have to install all your Python extensions up front, and if you want to add a new version of some Python package, you have to reinstall the whole thing. Same for R.
  7. EB doesn't track installed packages with a database, nor does it let you query your installed packages by attributes, compilers, etc.

Spack is a package manager and can manage combinatorial package complexity without a combinatorial number of config files. EB is an installer for a known stack of software, and if you want to build something someone hasn't built, you need to edit a bunch of files yourself because EB can't reason about dependencies. The spack dependency model is a superset of what EB provides.
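The virtual-dependency point above can be sketched with a toy resolver (illustration only; this is a hypothetical function, not Spack's API): a package depends on the virtual name "mpi", and any registered provider can be swapped in from the command line.

```python
# Toy sketch of versioned virtual dependencies (not Spack's API):
# "mpi" and "blas" are virtual names; any concrete provider satisfies
# them, and a user preference swaps the implementation.
PROVIDERS = {
    "mpi": ["openmpi@2.0.1", "mpich@3.2"],
    "blas": ["openblas@0.2.19", "netlib-lapack@3.6.1"],
}

def resolve(virtual, preferred=None):
    candidates = PROVIDERS[virtual]
    if preferred:
        for c in candidates:
            if c.startswith(preferred):
                return c
    return candidates[0]  # fall back to the default provider

print(resolve("mpi"))           # -> openmpi@2.0.1
print(resolve("mpi", "mpich"))  # -> mpich@3.2
```

In real Spack the "preferred" part is expressed on the command line or in packages.yaml, rather than as a function argument.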

Other key differences not related to dependencies:

  1. Both Spack and EB are used by site admins to manage system installations, but Spack has more traction with application developers than I have seen with EB. EB seems like it is designed for system administrators, whereas Spack can fill both of these niches. App devs use Spack to vendor in their dependencies similarly to how people use Ruby gem, Javascript's npm and yarn, or Homebrew. I don't see where EB fits into that usage model, especially with the bootstrapping overhead.
  2. Spack has Mac OS X support and people actually use it there, EB doesn't. This may have something to do with the previous issue.

In my mind, the gaps we have w.r.t. EB right now are:

  1. Spack does not currently have regression testing for packages. It should, but right now we simply don't. The main obstacles to that are infrastructure and security, both of which we're working through. Eventually I would like to have Spack tested at LLNL, ALCF, OLCF, and NERSC.
  2. EB has more configuration options. It's been around for 3 years longer, and it has more sites using it in production. So, yes, it has more extensive module generation support. I don't think this is a fundamental shortcoming of Spack and it's something we should probably expand on.

So I think this is in line with what I've said before: Spack is a newer project, and it's still in alpha, but we're hoping to have a 1.0 version with regression testing soon. And I think we're moving pretty fast. EB is a tried, tested, existing production project, and it's great for administering commodity clusters.

tgamblin commented 7 years ago

The strict dependency versions in EasyBuild are certainly a nuisance in some sense, but are actually perceived as an advantage as well since it facilitates reproducibility.

I am not sure how strong this argument really is. You can depend on version ranges in Spack but when it installs something the package spec is made concrete. Spack knows exactly what it is installing, and the full build spec is stored with every installation, along with the package files used to build it. So we have a complete specification of what it took to build every package. And we have that spec.yaml so that every point in the build space has a unique identifier.
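The "unique identifier" idea can be sketched as hashing a canonical serialization of the concrete spec (illustrative only; Spack's real hash is computed over the full normalized spec DAG, and the field names below are simplified):

```python
# Illustrative sketch: a concrete spec's canonical serialization can
# be hashed to give every point in the build space a unique ID.
# (Spack's real hash covers the full normalized spec DAG.)
import hashlib
import json

spec = {
    "name": "netlib-scalapack",
    "version": "2.0.2",
    "compiler": "gcc@6.2.0",
    "dependencies": ["mpich@3.2", "openblas@0.2.19"],
}
# sort_keys makes the serialization canonical, so the same concrete
# spec always hashes to the same identifier.
canonical = json.dumps(spec, sort_keys=True)
print(hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:7])
```

A hypothetical rebuild command would only need to read such a stored spec back in and fire off the corresponding build.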

We don't currently have a spack rebuild command, but all it would need to do is read in the spec.yaml from an install tree and fire off a build with the result. I suppose I could add that to clear up any misconceptions about reproducibility in Spack.

boegel commented 7 years ago

@tgamblin Not everything you've stated about EasyBuild is 100% correct, but I'm sure that goes for me talking about Spack too at times; I'm not up-to-date on all the recent developments in Spack, there's just too much going on. Let's leave getting the details right for later. ;)

I agree that the tools differ in fundamental ways, but in my mind this could be captured in a comparison matrix plus accompanying sections, linked from there, to explain things in more detail. I haven't actually tried this though, so I'm not sure. It's definitely not an easy exercise. But that's exactly why we should do it! If we can highlight the main differences between both tools at a glance, with pointers to documentation that goes into more detail, that would be a very valuable resource for a lot of people... If people need to read through 5 pages of text to get a feeling for how the tools differ, fewer people will actually read the whole thing. People will want a TL;DR version.

You don't have to convince me that Spack's way of dealing with dependencies is more powerful, there's no discussion there. But, as mentioned, it also leads to additional headaches (e.g. regression testing). So, depending on perspective and expectations, you could consider this a downside too...

Reproducibility isn't 100% either in EasyBuild, but with spack rebuild missing, EasyBuild is currently in the lead there imho. Once you do have spack rebuild, you may have more guarantees about reproducibility; EasyBuild has more moving parts that you could swap in/out that may break reproducibility. You would need to make it easy for people to exchange concrete specs though (which may be just grabbing a .yaml file, I'm not sure).

Support for different platforms is definitely another important key aspect. We're not really interested in having good support for macOS, while the Spack community clearly is.

The focus of EasyBuild is indeed on HPC support teams that need to provide installations to their users, it's certainly less useful for application developers to manage their dependencies compared to Spack. I have no experience myself with using Spack for what EasyBuild is intended for, and from what I've heard, people also don't perceive Spack to be suited for this...

There are clear differences in terms of 'maturity', but Spack development is going fast, so that will change soon, indeed.

citibeth commented 7 years ago
  1. Spack has versioned virtual dependencies, and it can swap implementations of MPI, BLAS, etc. on the command line; EB cannot.

Much more flexible than the toolchain concept.

  1. In EB, you have to install all your Python extensions up front, and if you want to add a new version of some Python package, you have to reinstall the whole thing. Same for R.

Or you just use it without activating.

Spack is a package manager and can manage combinatorial package complexity without a combinatorial number of config files. EB is an installer for a known stack of software, and if you want to build something someone hasn't built, you need to edit a bunch of files yourself because EB can't reason about dependencies. The spack dependency model is a superset of what EB provides.

I've always wondered... if I were doing it over again, whether I'd build a package manager that generates concretized recipes for an installer.

  1. Both Spack and EB are used by site admins to manage system installations, but Spack has more traction with application developers than I have seen with EB. EB seems like it is designed for system administrators, whereas Spack can fill both of these niches. App devs use Spack to vendor in their dependencies similarly to how people use Ruby gem, Javascript's npm and yarn, or Homebrew. I don't see where EB fits into that usage model, especially with the bootstrapping overhead.

I began adding that functionality to EB, just as I did with Spack with spack setup. Not sure where it's been since then.

Spack has Mac OS X support and people actually use it there, EB doesn't. This may have something to do with the previous issue.

EB was beginning to get OS X support in early 2016.

Reproducibility isn't 100% either in EasyBuild, but with spack rebuild missing, EasyBuild is currently in the lead there imho.

I'm not sure what the issues are here with reproducibility. If you use the same version of Spack and the same "install" command, you will get the same result. If we're talking about reproducibility issues, we need to be clearer about exactly what is meant.

alalazo commented 7 years ago

@boegel Just out of curiosity : what can you customize in naming and in hierarchy in EB ?

boegel commented 7 years ago

@alalazo well, basically anything (which makes it easy to shoot yourself in the foot, but fine ;))

EasyBuild lets you specify the location of a Python module via --include-module-naming-schemes; that module implements a class that derives (directly or indirectly) from the ModuleNamingScheme class provided by the EasyBuild framework. This class defines a couple of methods that determine all aspects of a module naming scheme.

We're (still) lacking proper documentation on this, but you can get a good feeling based on the methods supported in ModuleNamingScheme, see http://easybuild.readthedocs.io/en/latest/api/easybuild.tools.module_naming_scheme.mns.html .

You basically get full freedom to define the hierarchy the way you like it. The standard Core/Compiler/MPI hierarchy is implemented by https://github.com/hpcugent/easybuild-framework/blob/master/easybuild/tools/module_naming_scheme/hierarchical_mns.py .
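As a rough sketch of what such a naming-scheme class looks like (illustrative only: the base class below is a minimal stand-in for the real one in easybuild.tools.module_naming_scheme.mns, and only one method is shown):

```python
# Minimal stand-in for EasyBuild's ModuleNamingScheme base class;
# the real one lives in easybuild.tools.module_naming_scheme.mns.
class ModuleNamingScheme(object):
    pass

class LowercaseMNS(ModuleNamingScheme):
    """Toy scheme: lowercase software name, '<name>/<version>'."""
    def det_full_module_name(self, ec):
        # 'ec' stands for a parsed easyconfig (simplified to a dict here)
        return "%s/%s" % (ec["name"].lower(), ec["version"])

mns = LowercaseMNS()
print(mns.det_full_module_name({"name": "OpenMPI", "version": "2.0.1"}))
# -> openmpi/2.0.1
```

The real framework class exposes several such hook methods, which is what gives the "full freedom" described above: the module name is assembled by arbitrary Python code.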

alalazo commented 7 years ago

You basically get full freedom to define the hierarchy the way you like it

Ok, so you can generate out of the box a hierarchy like Core \ Compiler \ MPI \ Lapack where different version of lapack and mpi coexist ?

boegel commented 7 years ago

Yes, by deriving from the existing HierarchicalMNS and customising it. How difficult the implementation is mostly depends on how complex your hierarchy is, the EasyBuild framework gives you all the bits & pieces to puzzle it together.

alalazo commented 7 years ago

Ok, so I take it that you don't do it out of the box but you ask your users to extend part of the framework if they want to go beyond Core \ Compiler \ MPI right ?

boegel commented 7 years ago

Yes, we ship a couple of readily available module naming schemes, and people can define their own additionally. Using --include-module-naming-schemes they can put that Python module anywhere they like, EasyBuild will inject it into the right Python namespace at startup.

tgamblin commented 7 years ago

So, I think based on the comments above, Spack actually has more configurable modules support... given that users don't have to subclass a naming scheme and implement extra Python. Support for Tcl modules is documented here. There is a lot of stuff you can add to modules in a per-site modules.yaml and in the package.py files.
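For instance, per-site module tweaks go in modules.yaml; a hedged sketch (key names modeled on Spack's modules config of roughly that era, so exact keys may differ by version):

```yaml
# Hypothetical modules.yaml fragment: set an environment variable in
# every generated Tcl module file (schema modeled on Spack's modules
# config of that era; details may differ by version).
modules:
  tcl:
    all:
      environment:
        set:
          SITE_NAME: 'my-cluster'
```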

There is also support for extensive customization of Lmod naming schemes, but i don't think that is on the readthedocs site yet. It is in our slides for the SC16 tutorial, though. @alalazo has done a lot of work on the modules support. Maybe we should make the module configuration docs more prominent...

@alalazo: there is a paper on module support in EB, in case you haven't seen that.

citibeth commented 7 years ago

I would suggest that for the purposes of comparisons, a feature does not exist unless it's implemented and documented.

On Tue, Oct 25, 2016 at 3:40 PM, Massimiliano Culpo < notifications@github.com> wrote:

You basically get full freedom to define the hierarchy the way you like it

Ok, so you can generate out of the box a hierarchy like Core \ Compiler \ MPI \ Lapack where different version of lapack and mpi coexist ?


boegel commented 7 years ago

@tgamblin from what I can tell, Spack does indeed provide more flexibility than EasyBuild does today w.r.t. controlling which 'define' statements go into a module file, and specifying the naming scheme is easier.

However... From what I can tell, you can not generate 'true' hierarchy (in a flexible way) in the Lmod sense, since that involves more than just the module names, you need to include additional statements or slightly tweak existing ones (and sometimes exclude others) for Lmod to work correctly with the hierarchy; mostly use statements.

Also, what about enforcing all lowercase or uppercase module names (or capitalising each part of the module name... or reversing it ;-))? And what if I want to replace the name with a different label in the generated module name (e.g. hanythingondemand -> hod), etc.?

It seems like that type of flexibility is not (yet?) provided in the Spack config?

The downside of the EasyBuild mechanism is indeed that you have to implement the naming scheme in code. The upside is that it's implemented in code: you can literally puzzle together the module name however you please...

This is starting to become a "see what I can do!" discussion, so I suggest we move forward as follows: I'll try and put together a comparison we can build on, focusing on three main aspects of both tools to start with.

I'd say "dependency management" and "generated module files" are two of them. What else? Installation & configuration of EasyBuild/Spack itself?

I guess it's a good thing both tools are implemented in Python 2, so we can avoid a holy war on implementation language for now. ;-)

boegel commented 7 years ago

@citibeth agreed! (which means more incentive for me to finally document the custom module naming scheme support)

citibeth commented 7 years ago

Note that the Spack team is scrambling for a numbered release by SC16. Maybe this is not the best time to continue this conversation in-depth. I'm marking it "revisit."

alalazo commented 7 years ago

However... From what I can tell, you can not generate 'true' hierarchy (in a flexible way) in the Lmod sense, since that involves more than just the module names, you need to include additional statements or slightly tweak existing ones (and sometimes exclude others) for Lmod to work correctly with the hierarchy; mostly use statements.

Hey, thanks all, the exchange above was enlightening!

So, just for the record here's how it works with a proof of concept:

# modules.yaml : all the user needs to write
modules:
  enable:
    - lmod
  lmod:
    whitelist: ['gcc']
    blacklist: ['%gcc@4.8']
    core_compilers: ['gcc@4.8']
    hierarchical_scheme: ['lapack']

then:

$ spack compilers
==> Available compilers
-- gcc ----------------------------------------------------------
gcc@6.2.0  gcc@4.8

$ spack find  -dl netlib-scalapack
==> 4 installed packages.
-- linux-Ubuntu14-x86_64 / gcc@6.2.0 ----------------------------
wnimqhw    netlib-scalapack@2.0.2
5n5xoep        ^mpich@3.2
mirer2l        ^netlib-lapack@3.6.1

6bqlxqy    netlib-scalapack@2.0.2
5n5xoep        ^mpich@3.2
js33umc        ^openblas@0.2.19

wojunhq    netlib-scalapack@2.0.2
mirer2l        ^netlib-lapack@3.6.1
s3qbtby        ^openmpi@2.0.1
3ostwel            ^hwloc@1.11.4
eo2siet                ^libpciaccess@0.13.4

hpqb3dp    netlib-scalapack@2.0.2
js33umc        ^openblas@0.2.19
s3qbtby        ^openmpi@2.0.1
3ostwel            ^hwloc@1.11.4
eo2siet                ^libpciaccess@0.13.4

The generated hierarchy:

share/spack/lmod/linux-Ubuntu14-x86_64/
├── Core
│   └── gcc
│       └── 6.2.0-fw44bd.lua
├── gcc
│   └── 6.2.0
│       ├── bzip2
│       │   └── 1.0.6-csoc2m.lua
│       ├── cmake
│       │   └── 3.5.2-6poypq.lua
...
├── netlib-lapack
│   └── 3.6.1-mirer2
│       ├── mpich
│       │   └── 3.2-5n5xoe
│       │       └── gcc
│       │           └── 6.2.0
│       │               └── netlib-scalapack
│       │                   └── 2.0.2-wnimqh.lua
│       └── openmpi
│           └── 2.0.1-s3qbtb
│               └── gcc
│                   └── 6.2.0
│                       └── netlib-scalapack
│                           └── 2.0.2-wojunh.lua
└── openblas
    └── 0.2.19-js33um
        ├── mpich
        │   └── 3.2-5n5xoe
        │       └── gcc
        │           └── 6.2.0
        │               └── netlib-scalapack
        │                   └── 2.0.2-6bqlxq.lua
        └── openmpi
            └── 2.0.1-s3qbtb
                └── gcc
                    └── 6.2.0
                        └── netlib-scalapack
                            └── 2.0.2-hpqb3d.lua

And yes, then you can do things like:

$ module list

Currently Loaded Modules:
  1) gcc/6.2.0-fw44bd   2) openmpi/2.0.1-s3qbtb   3) openblas/0.2.19-js33um   4) netlib-scalapack/2.0.2-hpqb3d

$ module load netlib-lapack

Lmod is automatically replacing "openblas/0.2.19-js33um" with "netlib-lapack/3.6.1-mirer2"

The following have been reloaded with a version change:
  1) netlib-scalapack/2.0.2-hpqb3d => netlib-scalapack/2.0.2-wojunh

$ module load mpich

Lmod is automatically replacing "openmpi/2.0.1-s3qbtb" with "mpich/3.2-5n5xoe"

The following have been reloaded with a version change:
  1) netlib-scalapack/2.0.2-wojunh => netlib-scalapack/2.0.2-wnimqh

citibeth commented 7 years ago

I came across the following on the CMake email list today. My takeaways are:

  1. Maybe we should increase the priority of this kind of comparison project.

  2. Maybe we should review http://spack.io with respect to this statement:

None of the projects have good marketing: it appears they somehow solve similar problems, but none actually have defined the problem or their solution.

The author gives more details that we can/should include up-front:

about if they handle cross compiling (not a common use case but it is yours and mine), what packages they create, what compromises they make, what they expect of my environment...

I have found the following projects which all seem to do some variation of a meta build so that you can build multiple projects that depend on each other and manage dependencies. (there are a couple others that seem to not be maintained as well)

https://gradle.org/ https://bazel.build/ https://github.com/LLNL/spack https://github.com/ruslo/hunter http://www.biicode.com https://conan.io/ https://conda.io/

Unfortunately I have never found anyone who has actually compared even two of these. None of the projects have good marketing: it appears they somehow solve similar problems, but none actually have defined the problem or their solution. It is like everyone assumes that everyone in the world has their exact same problem and the solution is obvious so the only thing left is the details of implementing it. This of course tells me nothing about if they handle cross compiling (not a common use case but it is yours and mine), what packages they create, what compromises they make, what they expect of my environment... These are important questions: I'm pretty sure that I could eliminate several just by comparing my needs to their features.

I'm currently using an in house system that builds everything in a Docker which lets me ensure nobody is accidentally using the wrong compiler. (we cross compile for a x86 target - 90% of the time if you build with gcc for the local system everything will work just fine, the other 10% of the time our system has an incompatible version of some library and things blow up when you try to use some uncommon feature). I'm thinking about moving to one of the above, but I haven't actually evaluated anything.

If you do evaluate any of the above please document your experience and in particular what is good/bad about the things you look at.

tgamblin commented 7 years ago

Do you have a link to the OP?

tgamblin commented 7 years ago

On spack.io: a comparison page would be nice. I don't think it goes on the front page, though - if you're not familiar with package management at all, a comparison is meaningless.

alalazo commented 6 years ago

Issue solved by @boegel at FOSDEM

ChrisDowning commented 5 years ago

Dumping this here in lieu of a better place - following a Twitter thread (cc @tgamblin @boegel )

First, some context: I frequently deal with HPC groups in the UK of various sizes, including occasionally working on cluster deployments (using OpenHPC and commercial alternatives). So far, there has not been an instance where a customer has had a specific interest in either Spack or EasyBuild, but I have experimented with both (and plan to do so in more detail in the future). Basically, this should be read as the view from someone sitting somewhere between "enthusiastic amateur" and "mid-level sysadmin", depending on your frame of reference. Unfortunately I haven't had as much time to spend on hands-on things as I'd like, but that is hopefully changing in the next month or so; hence any testing of new tools has had to be somewhat selective until now.

To be blunt, my initial experimentation with both Spack and EasyBuild left a lot to be desired. I did not, admittedly, put much time into it, but using the "quick start" examples for both tools in a fresh VM led to a lot of waiting followed by error messages, and not much else. Without a "hook" to proceed from, I drifted away to doing something else quite quickly, but returned for another attempt a few weeks later. Again, both packages presented some issues which did not inspire much confidence, but I persevered a bit more on this occasion and eventually got a bunch of packages installed using each tool. In neither case, though, did I get all the way to the desired "final" package I was aiming at (WRF, I believe). I am happy to believe this is a result of user error more than anything else given that both tools have a decent uptake and I only gave a couple of hours of my attention to it - but I can't help wonder how many other people have run into the same walls and decided not to bother continuing.

I should point out here that I do 100% see and understand the benefits of these tools, but have some reservations about recommending their usage to a lot of the people I deal with - as mentioned in the Twitter thread, there is probably quite a gulf in experience between the customers I deal with for hands-on tasks and the communities the tools originated in, particularly in the case of Spack. For a very complex package with lots of dependencies, having a method of automating the build process is very welcome. Even for simple packages, I can see the appeal of avoiding duplication of effort. That said, someone somewhere still needs to do the work of defining a package configuration for both Spack and EasyBuild. I (and hopefully a lot of other people) am uncomfortable with delegating my thinking any more than strictly necessary, and also with adding some external party as a dependency to my work. Inevitably, I will want to understand how the packages I need are put together, at least to the level where I can adapt the template myself.

In my experience, HPC support staff (where there are any - which might really be the bigger problem eating at me here...) fall into two distinct camps:

Broadly, I concur with the points raised in the comment here: https://ask.cyberinfrastructure.org/t/easybuild-vs-spack-anyone-have-opinions-of-which-is-better-for-bioinformatics/612

I particularly like the final point in that comment about finding, but not necessarily using, a Dockerfile. Container definition files expose the full story to their user, and when written appropriately (that is, not pulling in random zip files from obscure cloud object storage locations, as is depressingly common) can easily be used to reverse-engineer a standard bare-metal build script. An argument against container builds which has cropped up is the fact that a portable container is naturally not an optimised one. I totally agree with that, which is why I would generally prefer to work with container definition files rather than the container blobs themselves (the exception being in the cloud - where I would want to define in advance which VM types I'll be running on and build for them, then just copy the container around rather than rebuilding on each instance). The contents of the definition file can then be adapted to include optimisation where available. The broader benefits of containers are sort of coincidental beyond this point, and as others have pointed out, both Spack and EasyBuild can be used within container images, so I don't think the topic of containerisation warrants any more detailed discussion here.

Sticking with the topic of optimisation though, I have to wonder if a combinatorial array of package builds (which both package managers seem to describe as a selling point) is really all that beneficial for the end-user; surely only one of those is the optimum for a given workload, and so most of the potential builds are just noise? Being able to easily build a full array of packages and test them would be a nice intellectual exercise, but it seems far more likely to me that people will simply research what build others have determined is the "best", and use that one. After all, the underlying hardware does not actually change particularly quickly - even in a cloud environment.

That point leads me to a general sentiment I get from both tools, which may or may not be fair - namely, both feel like software solutions to a human problem. We are terrible at documenting the things we have done, and do not do enough to share our knowledge in a useful way. Build templates for each tool notionally solve both of these problems, but I'm not convinced the complexity introduced is a good trade-off for the advantages provided.

To conclude, I should probably point out that a good portion of this is devils-advocate-ing; I will definitely be using Spack and/or EasyBuild in the future - life is too short to mess around building packages if I don't need to. It is easy to read the above as a critique of both packages, but what I am really getting at here is: what am I missing?

In particular:

Ultimately, I don't believe there is an optimal solution to HPC package management waiting to be discovered, and so having additional tools can only be a good thing overall. Before embracing either Spack or EasyBuild as a solution though, I would like to better grasp what is being sacrificed - so this can be balanced against the obvious benefits.

citibeth commented 5 years ago

I tried EasyBuild before Spack; and for me at that time, it was a night-and-day difference. EB needed a new recipe for each version of a package, whereas Spack is based around this amazing concretizer. I haven't looked at EB for 3 years, so maybe it's different at this time. But at the time I evaluated, I felt Spack was hands-down the better system.

They say that if all you have is a hammer, then everything looks like a nail. My approach in general is to get a hammer, get really good at using it, and then use it to pound any nail if at all possible. I have invested in that hammer called Spack; and therefore, it is pretty much the ONLY way I install software. Unless it is REALLY unsuited for the job (a few MacPorts things, for example; a hammer I've also invested a bit of time into).

Spack provides tremendous efficiency of scale. I can build (and re-build) my system of 100 packages in half a day. That would never be possible by hand. So it changes my view on software building. Traditionally we think of building as a BIG JOB. Now, I think of the concretization --- assembly of different versions --- as the biggest part of the task of assembling software for a project. Actually building it consists of just letting Spack do its thing, and handling the ~5% of packages that have a problem for some reason.
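As an illustration of that workflow (package names, variants, and versions here are arbitrary examples; the commands are standard Spack CLI, and exact output depends on your Spack version and configuration), the concretize-then-build cycle looks roughly like:

```console
$ spack spec hdf5 +mpi              # show the fully concretized dependency DAG before building
$ spack install hdf5 +mpi ^openmpi  # build HDF5 with MPI support, using OpenMPI as the MPI provider
$ spack find --deps hdf5            # afterwards, inspect what was actually installed
```

The point is that the human effort goes into choosing the spec; the build itself is delegated to Spack.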

Spack is a tool, it's not magic. You still have to know how to build by hand, because then you'll be able to decipher the log files of the 5% of things that need help. But 95% of the packages "just work" in the build, and fade out of my consciousness.

When something doesn't work and I put effort into figuring out the workaround --- well, I document that. In the old days, we documented it in English; and if I needed to rebuild the same thing again, I would re-read my notes and follow the instructions. I feel that Spack is an automated version of that process. Build instructions are written in Python not English; and they can be executed quickly by a machine, instead of slowly by me. If I really want/need to know what's going on, I can always read the build instructions (eg Spack recipe). Best of all... if someone else finds and fixes a problem, then I might never even know the problem existed. The workaround has already been incorporated in the recipe and it "just works" for me. In this way, I view Spack as a "software building collective" in which we all contribute our little bits of build problem solving while benefitting from everyone else's. On the downside, sometimes others break recipes. But they fix recipes far more often than they break them.

With this change in perspective, pre-building a zillion combinatorial combinations of "base" software makes no sense. Instead, a Spack-based shop would define a set of Spack Environments --- collections of software --- on a per-user or per-group basis. And then keep each environment updated to the needs of that person/group. When someone needs a new package, it gets added to their environment, etc. Each group can get a personalized set of builds in a way that was never possible before. Getting the full benefit from Spack requires this paradigm shift, IMHO.
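A per-group environment of the kind described above boils down to a small `spack.yaml` file. This is a hypothetical sketch (the specs listed are illustrative, not a recommendation):

```yaml
# spack.yaml -- hypothetical environment for one research group
spack:
  specs:
  - openmpi
  - hdf5 +mpi
  - py-numpy
  view: true   # expose all installed packages through a single merged prefix
```

Activating the environment (`spack env activate .`) and running `spack install` then builds or reuses everything listed; adding a package for that group is a one-line change to this file.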

Another part of this shift is to think about how users will develop their own software / scripts / projects. The most elegant approach is to avoid any differentiation between "3d party" and "inside" software; or between "base" and "user" software; and to just make a Spack recipe for every project built in the organization. Just like you write a CMake file, you also write a Spack package. In my experience, writing a Spack package for my new project isn't any more work than what I'd have to do anyway: assemble an environment or something with the dependencies for my project. But it IS a lot more elegant, and more reusable in the future. In this way, we can finally assemble (and create) stacks of software as deep as we like, with as many dependencies as we like, and Spack takes care of it all.
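A minimal recipe for such an in-house project might look like the sketch below. The project name, URL, and CMake option are invented for illustration; the class and directive names follow current Spack packaging conventions:

```python
# packages/myproject/package.py -- hypothetical recipe for an internal CMake project
from spack.package import *


class Myproject(CMakePackage):
    """In-house code, packaged exactly like third-party software."""

    homepage = "https://example.com/myproject"   # placeholder URL
    git = "https://example.com/myproject.git"    # placeholder URL

    version("main", branch="main")

    depends_on("mpi")
    depends_on("hdf5 +mpi")

    def cmake_args(self):
        # hypothetical project-specific CMake flag
        return [self.define("ENABLE_MPI", True)]
```

Because the recipe declares its dependencies, `spack install myproject` assembles the whole stack beneath it, which is the "no distinction between inside and 3rd-party software" idea in practice.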

Another change in paradigm is, what do you do with new infrastructure? Rebuilding all your packages because you got a new compiler (or need a new compiler) isn't so bad. In fact, building a new GCC that you need, instead of using the obsolete GCC that came on your system, is also not so bad. Case in point: I recently took my entire stack (30 packages) that worked with GCC and got it built with Intel, which was needed on our supercomputer. It took me a few days, mostly learning how to set up the Intel compiler properly and getting around a few minor glitches. Now I have my stack working on GCC and Intel.
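The GCC-to-Intel migration described above is, mechanically, mostly a matter of registering the new compiler and rebuilding against it (the compiler version and package name below are placeholders):

```console
$ spack compiler find            # register compilers visible in the current shell environment
$ spack compilers                # list what Spack now knows about
$ spack install hdf5 %intel      # rebuild the same spec with the Intel compiler
```

The `%compiler` part of the spec is just another concretization constraint, so the same recipes serve both toolchains.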

> Before embracing either Spack or EasyBuild as a solution though, I would like to better grasp what is being sacrificed - so this can be balanced against the obvious benefits.

Sacrificed? Like any tool, Spack has a learning curve. But it pays off quickly. Because even modest builds (say, 20 packages) can easily take someone 2 weeks to install manually (I've seen it and I've done it too). If you put 9 days into learning to use Spack and debugging any problems for your builds / needs, then on day 10 you'll be able to build your entire environment with little or no effort. And it will be repeatable.

I guess the only other possible sacrifice with Spack is that it works best if you incorporate "Spack think" throughout your workflow. Use Spack Environments. Build Spack packages for the stuff you're developing. Use environment modules. Fork Spack and put your own customizations on that fork (while trying to get them merged back into develop). When it comes time to use your packages, either you'll be loading a lot of modules (say, 100), and searching through those long paths can be slow on some systems; or you'll be setting up a Spack view that incorporates all of them, which doubles the number of inodes you need - and some systems have inode limits.

I can't think of any other sacrifices. Spack is great. I've put significant effort in this hammer, and now it's REALLY good at pounding nails for me --- some of them are nails I never realized needed pounding, because pounding nails without Spack was too hard, slow and tedious.

> People who have absolutely no idea what they are doing

I've concluded that, with or without Spack, these people will never be very good at assembling large builds of open source software. They need to stick to clicking through install wizards on Windows, filling in license keys, and calling tech support for help.