Closed citibeth closed 6 years ago
I would suggest the following points (with a [S] or [EB] indicating which choice I prefer from a tech point of view):
- spack module loads --dependencies command generates module load scripts with similar effect. [EB: I think that Spack will need to become more tightly integrated over time]
- spack activate (which breaks Spack in a core way, i.e. no more combinatorial versioning). I prefer to generate load scripts for the Python modules I need, and avoid spack activate.
- spack setup --- to configure a CMake build based on Spack's recipes [S]

Should definitely also compare with Conda and PIP. Some interesting things here:
@citibeth that looks like a good start!
I'd like to see this fleshed out in table form ideally, something like:
aspect | EasyBuild | Spack |
---|---|---|
RPATH support | (WIP) | supported |
...
Each aspect should then link to a subsection of the same page which provides more details. That would enable readers to get a good idea at first glance.
It's not entirely clear to me where this should be hosted, but a separate GitHub repository ("neutral ground") makes sense to me, where we could apply updates via PRs that have to be approved by the other 'party' before the live comparison is updated. Another option is to include the comparison in the EasyBuild/Spack documentation, but then we have two places to keep in sync, which would be a PITA.
Maybe we should focus on a handful (3? 5?) of aspects first to figure out the formatting? And then take it from there....
I would avoid marking a winner on each aspect, since in some cases it may not be very clear which project does the better job; I feel it sometimes depends on perspective. The strict dependency versions in EasyBuild are certainly a nuisance in some sense, but are actually perceived as an advantage as well since it facilitates reproducibility.
I disagree on your statement that Spack is easier to contribute to once installed. I understand your point that having everything in a single repository simplifies things, that's certainly the case.
However, EasyBuild provides very nice integration with GitHub (http://easybuild.readthedocs.io/en/latest/Integration_with_GitHub.html), and people can contribute new easyconfigs and upload test reports for them without knowing anything about git (which is a significant hurdle for some people).
You can hardly claim that running a single eb command to contribute something is harder than running 4-5 git commands followed by using the GitHub interface (sure, if you know where to go, it's trivial, but lots of people are (still) new to it)...
One thing that is definitely missing in your list: 'flexibility' (although that may be too generic a term). EasyBuild provides more flexibility in terms of controlling its behaviour than Spack does (to the best of my knowledge); for example support for custom module naming schemes (incl. a hierarchical module naming scheme), a plethora of configuration options, etc. Again, some people may perceive this as an issue, too many knobs to turn...
Another one is testing: next to unit/integration tests that are run automagically for every PR, rigorous regression testing is done for EasyBuild, and pull requests for new or modified easyconfig files require successful test reports 99% of the time before they're considered for merging. I understand Spack is working towards something like that, but it's not quite there yet?
We should also mention aspects where both projects currently excel: use of Travis, an active community (mailing list, regular conf calls, ...), support for multiple compilers, etc.
Last but not least: the comparison should reflect current state, i.e. only consider what's in the last publicly available release, not what the intentions of the project are for the (near) future. The idea would then be to update the comparison on every new release.
Your Git integration is very interesting. I fear that a lot of users shy away from contributing to Spack solely because they are unfamiliar with Git.
EasyBuild provides more flexibility in terms of controlling its behaviour than Spack does (to the best of my knowledge); for example support for custom module naming schemes (incl. a hierarchical module naming scheme)
I believe Spack has this now. @alalazo could confirm.
a plethora of configuration options
Yeah, Spack could really use this. It's definitely planned, but I haven't seen any movement yet.
Another one is testing: next to unit/integration tests that are run automagically for every PR, rigorous regression testing is done for EasyBuild, and pull requests for new or modified easyconfig files require successful test reports 99% of the time before they're considered for merging. I understand Spack is working towards something like that, but it's not quite there yet?
Yep, we have unit testing and documentation testing, but no integration or regression testing for packages. This is definitely planned.
@adamjstewart I saw @alalazo's PR that got merged a while ago (#1723), but from what I can tell it provides nowhere near the flexibility that EasyBuild supports w.r.t. controlling how generated module files are named. It looks more of a proof-of-concept to me, but I may be wrong.
Another aspect that may deter newcomers from using Spack is that you have to know Python at least a little bit in order to make changes (or not be afraid to resort to copy-paste engineering).
That's less the case with EasyBuild, since you can just treat the easyconfig files as being key-value (even though the current format is Python syntax). If people want to make changes other than composing new or tweaking existing easyconfigs, then they need to know Python as well, of course.
But also here, EasyBuild provides a lot of flexibility, there's a lot you can get away with in easyconfig files, without having to implement even half a line of Python (see http://easybuild.readthedocs.io/en/latest/version-specific/easyconfig_parameters.html#vsd-avail-easyconfig-params).
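To make that concrete, here is a hedged sketch of what an easyconfig roughly looks like: a flat set of key-value parameters in Python syntax, with no build logic to write. The parameter names follow the documented easyconfig format, but the values below are made up for illustration and are not an exact recipe from the easyconfig repository.

```python
# Illustrative easyconfig sketch (not a real recipe); easyconfigs are
# key-value files in Python syntax, interpreted by the EasyBuild framework.
name = 'zlib'
version = '1.2.8'

homepage = 'http://www.zlib.net/'
description = "general-purpose lossless data-compression library"

toolchain = {'name': 'GCC', 'version': '4.9.2'}

source_urls = ['http://zlib.net/']
sources = [SOURCE_TAR_GZ]  # SOURCE_TAR_GZ is an EasyBuild template constant

moduleclass = 'lib'
```

Tweaking an existing easyconfig (a new version, a different toolchain) is typically just editing a couple of these values, which is what makes the "no Python required" point above plausible.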
I don't think a grid showing system/features with a bunch of check boxes is useful here. Much more useful is text from the authors of the systems explaining what is good about their system.
I'm not thrilled with a big elaborate process of collaborating on and approving a comparison. Maybe @boegel can write one up, as suggested. Spackers who find omissions or errors can submit PRs to it. As long as things remain cordial, Spack can link to the comparison from its website.
I agree, integration testing is a big problem with Spack. Its combinatorial nature means you can't test everything Spack can build; you have to be smart about sampling the space. But no one has been smart yet. I finally went ahead and built my own integration test that builds the stuff I need. This is based on the experience that things keep getting broken if I don't test them repeatedly:
https://github.com/LLNL/spack/pull/2097
The test was more "successful" than I had imagined: things were broken before it even built anything. YES, we need a more serious approach to integration testing. At least what I did tests something that at least one user (me) wants.
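The sampling idea above can be sketched in a few lines: instead of building every point in the combinatorial space, draw a small but reproducible random subset of (package, compiler, MPI) combinations and test only those. This is a hypothetical illustration, not Spack's actual test harness; the package and compiler names are just examples.

```python
import itertools
import random

def sample_build_matrix(packages, compilers, mpis, k, seed=42):
    """Return a reproducible sample of k build combinations out of the
    full cross product, so an integration test stays affordable."""
    space = list(itertools.product(packages, compilers, mpis))
    rng = random.Random(seed)  # fixed seed -> same sample on every run
    k = min(k, len(space))
    return rng.sample(space, k)

specs = sample_build_matrix(
    packages=["netlib-scalapack", "boost", "hdf5"],
    compilers=["gcc@4.8", "gcc@6.2.0"],
    mpis=["openmpi", "mpich"],
    k=4,
)
for pkg, comp, mpi in specs:
    # emit the install commands the test driver would run
    print("spack install {0} %{1} ^{2}".format(pkg, comp, mpi))
```

The fixed seed keeps the sample stable across runs, so a breakage is attributable to a recipe change rather than to the sampler picking a different corner of the space.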
@boegel:
One thing that is definitely missing in your list: 'flexibility' (although that may be too generic a term).
This is way too general. We at least need more specific categories for that, and I kind of agree with @citibeth that a feature matrix is possibly not the most useful comparison of these tools. I think the main differences are more fundamental than that.
Yes, EB has a bunch of configuration options, but Spack's core design is fundamentally more flexible than EB in that it has a much more powerful dependency model. This is one of the reasons I didn't just start using EasyBuild when I found it after I started working on Spack. The tools don't do the same thing at a very fundamental level. We're actually building something different here, and comparing EB checkbox-for-checkbox doesn't fully capture it.
I could easily say that EB is fundamentally inflexible in a lot of ways:
- In EB, making a change across the stack means you sed thousands of files and generate an entirely new tree of configs.
- In Spack I can build multiple variants of the same package (boost+python, boost+iostreams, etc.) whereas in EB I cannot.
- In Spack you can edit a single packages.yaml, set your default preferences for the stack you want, and go.

Spack is a package manager and can manage combinatorial package complexity without a combinatorial number of config files. EB is an installer for a known stack of software, and if you want to build something someone hasn't built, you need to edit a bunch of files yourself because EB can't reason about dependencies. The Spack dependency model is a superset of what EB provides.
Other key differences not related to dependencies:
- App devs use Spack to vendor in their dependencies similarly to how people use Ruby's gem, Javascript's npm and yarn, or Homebrew. I don't see where EB fits into that usage model, especially with the bootstrapping overhead.

In my mind, the gaps we have w.r.t. EB right now are:
So I think this is in line with what I've said before: Spack is a newer project, and it's still in alpha, but we're hoping to have a 1.0 version with regression testing soon. And I think we're moving pretty fast. EB is a tried, tested, existing production project, and it's great for administering commodity clusters.
The strict dependency versions in EasyBuild are certainly a nuisance in some sense, but are actually perceived as an advantage as well since it facilitates reproducibility.
I am not sure how strong this argument really is. You can depend on version ranges in Spack, but when it installs something the package spec is made concrete. Spack knows exactly what it is installing, and the full build spec is stored with every installation, along with the package files used to build it. So we have a complete specification of what it took to build every package. And we have that spec.yaml, so that every point in the build space has a unique identifier.
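The "unique identifier" point can be illustrated with a toy version of the idea: hash a canonical serialization of the concrete spec, so identical build points share an ID and any change produces a new one. Spack's real hash uses a different serialization and alphabet; this sketch only mimics the principle.

```python
import hashlib
import json

def spec_hash(spec, length=7):
    """Short, stable identifier for a concrete spec: hash a canonical
    (sorted-key) JSON serialization. Illustrative only; Spack's real
    hash is computed differently."""
    canonical = json.dumps(spec, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:length]

a = {"name": "zlib", "version": "1.2.8", "compiler": "gcc@6.2.0"}
b = {"compiler": "gcc@6.2.0", "version": "1.2.8", "name": "zlib"}  # same spec, different key order
c = dict(a, version="1.2.11")  # one field changed

print(spec_hash(a) == spec_hash(b))  # True: identical build points share an ID
print(spec_hash(a) == spec_hash(c))  # False: any change yields a new ID
```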
We don't currently have a spack rebuild command, but all it would need to do is read in the spec.yaml from an install tree and fire off a build with the result. I suppose I could add that to clear up any misconceptions about reproducibility in Spack.
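A hypothetical spack rebuild could indeed be that thin: read the stored spec back and re-issue an install for the exact concrete spec. The sketch below assumes the stored spec.yaml has already been parsed into a dict; the field names and helper functions are made up for illustration, not Spack's actual schema or API.

```python
def spec_to_string(spec):
    """Render a parsed spec dict back into a concrete spec string.
    Field names ('name', 'version', 'compiler', 'dependencies') are
    illustrative, not Spack's on-disk schema."""
    s = "{0}@{1} %{2}".format(spec["name"], spec["version"], spec["compiler"])
    for dep in spec.get("dependencies", []):
        s += " ^{0}@{1}".format(dep["name"], dep["version"])
    return s

def rebuild_command(spec):
    # all a `spack rebuild` would need to do: re-issue an install
    # for the exact concrete spec read from the install tree
    return "spack install " + spec_to_string(spec)

spec = {
    "name": "netlib-scalapack", "version": "2.0.2", "compiler": "gcc@6.2.0",
    "dependencies": [{"name": "openmpi", "version": "2.0.1"},
                     {"name": "openblas", "version": "0.2.19"}],
}
print(rebuild_command(spec))
# -> spack install netlib-scalapack@2.0.2 %gcc@6.2.0 ^openmpi@2.0.1 ^openblas@0.2.19
```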
@tgamblin Not everything you've stated about EasyBuild is 100% correct, but I'm sure that goes for me talking about Spack too at times; I'm not up-to-date on all the recent developments in Spack, there's just too much going on. Let's save getting the details right for later. ;)
I agree that the tools differ in fundamental ways, but in my mind this could be captured in a comparison matrix + accompanying sections linked from there to explain things in more detail.
I haven't actually tried this though, so I'm not sure. It's definitely not an easy exercise. But that's exactly why we should do it!
If we can highlight the main differences between both tools at a glance, with pointers to documentation that go more into detail, that would be a very valuable resource for a lot of people...
If people need to read through 5 pages of text to get a feeling of how the tools differ, fewer people will actually read the whole thing. People will want a TL;DR version.
You don't have to convince me that Spack's way of dealing with dependencies is more powerful, there's no discussion there. But, as mentioned, it also leads to additional headaches (e.g. regression testing). So, depending on perspective and expectations, you could consider this a downside too...
Reproducibility isn't 100% either in EasyBuild, but with spack rebuild missing, EasyBuild is currently in the lead there imho. Once you do have spack rebuild, you may have more guarantees about reproducibility; EasyBuild has more moving parts that you could swap in/out that may break reproducibility. You would need to make it easy for people to exchange concrete specs though (which may be just grabbing a .yaml file, I'm not sure).
Support for different platforms is definitely another important key aspect. We're not really interested in having good support for macOS, while the Spack community clearly is.
The focus of EasyBuild is indeed on HPC support teams that need to provide installations to their users, it's certainly less useful for application developers to manage their dependencies compared to Spack. I have no experience myself with using Spack for what EasyBuild is intended for, and from what I've heard, people also don't perceive Spack to be suited for this...
There are clear differences in terms of 'maturity', but Spack development is going fast, so that will change soon, indeed.
- Spack has versioned virtual dependencies, and it can swap implementations of MPI, BLAS, etc. on the command line; EB cannot.
Much more flexible than the toolchain concept.
- In EB, you have to install all your Python extensions up front, and if you want to add a new version of some Python package, you have to reinstall the whole thing. Same for R.
- Spack has extensions that can be activated/deactivated http://spack.readthedocs.io/en/latest/basic_usage.html#extensions-python-support in a Python install.
Or you just use it without activating.
Spack is a package manager and can manage combinatorial package complexity without a combinatorial number of config files. EB is an installer for a known stack of software, and if you want to build something someone hasn't built, you need to edit a bunch of files yourself because EB can't reason about dependencies. The spack dependency model is a superset of what EB provides.
I've always wondered... if I were doing it over again, whether I'd build a package manager that generates concretized recipes for an installer.
- Both Spack and EB are used by site admins to manage system installations, but Spack has more traction with application developers than I have seen with EB. EB seems like it is designed for system administrators, whereas Spack can fill both of these niches. App devs use Spack to vendor in their dependencies similarly to how people use Ruby gem, Javascript's npm and yarn, or Homebrew. I don't see where EB fits into that usage model, especially with the bootstrapping overhead.
I began adding that functionality to EB, just as I did with Spack with spack setup. Not sure where it's been since then.
Spack has Mac OS X support and people actually use it there, EB doesn't. This may have something to do with the previous issue.
EB was beginning to get OS X support in early 2016.
Reproducibility isn't 100% either in EasyBuild, but with spack rebuild missing, EasyBuild is currently in the lead there imho.
I'm not sure what the issues are here with reproducibility. If you use the same version of Spack and the same "install" command, you will get the same result. If we're talking about reproducibility issues, we need to be more clear about exactly what is meant.
@boegel Just out of curiosity: what can you customize in naming and in hierarchy in EB?
@alalazo well, basically anything (which makes it easy to shoot yourself in the foot, but fine ;))
EasyBuild supports specifying, via --include-module-naming-schemes, the location of a Python module that implements a class deriving (directly or indirectly) from the ModuleNamingScheme class provided by the EasyBuild framework. This class defines a couple of methods that determine all aspects of a module naming scheme.
We're (still) lacking proper documentation on this, but you can get a good feeling based on the methods supported in ModuleNamingScheme, see http://easybuild.readthedocs.io/en/latest/api/easybuild.tools.module_naming_scheme.mns.html .
You basically get full freedom to define the hierarchy the way you like it. The standard Core/Compiler/MPI hierarchy is implemented by https://github.com/hpcugent/easybuild-framework/blob/master/easybuild/tools/module_naming_scheme/hierarchical_mns.py .
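To make the mechanism concrete, here is a hedged sketch of what such a naming-scheme class might look like. The base class below is a stand-in stub for the real ModuleNamingScheme in the EasyBuild framework (see the API link above), and the scheme itself is hypothetical, not an actual EasyBuild scheme.

```python
class ModuleNamingScheme(object):
    """Stand-in stub for EasyBuild's real base class in
    easybuild.tools.module_naming_scheme.mns (linked above)."""
    def det_full_module_name(self, ec):
        raise NotImplementedError

class LowercaseFlatMNS(ModuleNamingScheme):
    """Hypothetical scheme: flat <name>/<version>, all lowercase,
    with label substitution (e.g. hanythingondemand -> hod)."""
    RENAMES = {'hanythingondemand': 'hod'}

    def det_full_module_name(self, ec):
        # ec is the parsed easyconfig; puzzle the name together in code
        name = ec['name'].lower()
        name = self.RENAMES.get(name, name)
        return '%s/%s' % (name, ec['version'])

mns = LowercaseFlatMNS()
print(mns.det_full_module_name({'name': 'HDF5', 'version': '1.8.17'}))
# -> hdf5/1.8.17
```

Because the scheme is ordinary Python, arbitrary transformations (lowercasing, renaming, hierarchy levels) are just code in these methods, which is the "upside of implementing it in code" argued later in this thread.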
You basically get full freedom to define the hierarchy the way you like it
Ok, so you can generate out of the box a hierarchy like Core \ Compiler \ MPI \ Lapack where different versions of lapack and mpi coexist?
Yes, by deriving from the existing HierarchicalMNS and customising it. How difficult the implementation is mostly depends on how complex your hierarchy is; the EasyBuild framework gives you all the bits & pieces to puzzle it together.
Ok, so I take it that you don't do it out of the box, but you ask your users to extend part of the framework if they want to go beyond Core \ Compiler \ MPI, right?
Yes, we ship a couple of readily available module naming schemes, and people can define their own additionally. Using --include-module-naming-schemes they can put that Python module anywhere they like; EasyBuild will inject it into the right Python namespace at startup.
So, I think based on the comments above, Spack actually has more configurable modules support... given that users don't have to subclass a naming scheme and implement extra Python. Support for TCL modules is documented here. There is a lot of stuff you can add to modules in a per-site modules.yaml and in the package.py files.
There is also support for extensive customization of Lmod naming schemes, but I don't think that is on the readthedocs site yet. It is in our slides for the SC16 tutorial, though. @alalazo has done a lot of work on the modules support. Maybe we should make the module configuration docs more prominent...
@alalazo: there is a paper on module support in EB, in case you haven't seen that.
I would suggest that for the purposes of comparisons, a feature does not exist unless it's implemented and documented.
@tgamblin from what I can tell, Spack does indeed provide more flexibility than EasyBuild does today w.r.t. controlling which 'define' statements go into a module file, and specifying the naming scheme is easier.
However... From what I can tell, you can not generate 'true' hierarchy (in a flexible way) in the Lmod sense, since that involves more than just the module names, you need to include additional statements or slightly tweak existing ones (and sometimes exclude others) for Lmod to work correctly with the hierarchy; mostly use statements.
Also, what about enforcing all lowercase or uppercase module names (or capitalising each part of the module name... or reversing it ;-))? And what if I want to replace the name with a different label in the generated module name (e.g. hanythingondemand -> hod), etc.?
It seems like that type of flexibility is not (yet?) provided in the Spack config?
The downside of the EasyBuild mechanism is indeed that you have to implement the naming scheme in code. The upside is that it's implemented in code: you can literally puzzle together the module name however you please...
This is starting to become a "see what I can do!" discussion, so I suggest we'll move forward as follows: I'll try and put together a comparison we can build on, focusing on three main aspects of both tools to start with.
I'd say "dependency management", "generated module files" are two of them. What else? Installation & configuration of EasyBuild/Spack itself?
I guess it's a good thing both tools are implemented in Python 2, so we can avoid a holy war on implementation language for now. ;-)
@citibeth agreed! (which means more incentive for me to finally document the custom module naming scheme support)
Note that the Spack team is scrambling for a numbered release by SC16. Maybe this is not the best time to continue this conversation in-depth. I'm marking it "revisit."
However... From what I can tell, you can not generate 'true' hierarchy (in a flexible way) in the Lmod sense, since that involves more than just the module names, you need to include additional statements or slightly tweak existing ones (and sometimes exclude others) for Lmod to work correctly with the hierarchy; mostly use statements.
Hey, thanks all, the exchange above was enlightening!
So, just for the record here's how it works with a proof of concept:
# modules.yaml : all the user needs to write
modules:
  enable:
    - lmod
  lmod:
    whitelist: ['gcc']
    blacklist: ['%gcc@4.8']
    core_compilers: ['gcc@4.8']
    hierarchical_scheme: ['lapack']
then:
$ spack compilers
==> Available compilers
-- gcc ----------------------------------------------------------
gcc@6.2.0 gcc@4.8
$ spack find -dl netlib-scalapack
==> 4 installed packages.
-- linux-Ubuntu14-x86_64 / gcc@6.2.0 ----------------------------
wnimqhw netlib-scalapack@2.0.2
5n5xoep ^mpich@3.2
mirer2l ^netlib-lapack@3.6.1
6bqlxqy netlib-scalapack@2.0.2
5n5xoep ^mpich@3.2
js33umc ^openblas@0.2.19
wojunhq netlib-scalapack@2.0.2
mirer2l ^netlib-lapack@3.6.1
s3qbtby ^openmpi@2.0.1
3ostwel ^hwloc@1.11.4
eo2siet ^libpciaccess@0.13.4
hpqb3dp netlib-scalapack@2.0.2
js33umc ^openblas@0.2.19
s3qbtby ^openmpi@2.0.1
3ostwel ^hwloc@1.11.4
eo2siet ^libpciaccess@0.13.4
The generated hierarchy:
share/spack/lmod/linux-Ubuntu14-x86_64/
├── Core
│ └── gcc
│ └── 6.2.0-fw44bd.lua
├── gcc
│ └── 6.2.0
│ ├── bzip2
│ │ └── 1.0.6-csoc2m.lua
│ ├── cmake
│ │ └── 3.5.2-6poypq.lua
...
├── netlib-lapack
│ └── 3.6.1-mirer2
│ ├── mpich
│ │ └── 3.2-5n5xoe
│ │ └── gcc
│ │ └── 6.2.0
│ │ └── netlib-scalapack
│ │ └── 2.0.2-wnimqh.lua
│ └── openmpi
│ └── 2.0.1-s3qbtb
│ └── gcc
│ └── 6.2.0
│ └── netlib-scalapack
│ └── 2.0.2-wojunh.lua
└── openblas
└── 0.2.19-js33um
├── mpich
│ └── 3.2-5n5xoe
│ └── gcc
│ └── 6.2.0
│ └── netlib-scalapack
│ └── 2.0.2-6bqlxq.lua
└── openmpi
└── 2.0.1-s3qbtb
└── gcc
└── 6.2.0
└── netlib-scalapack
└── 2.0.2-hpqb3d.lua
And yes, then you can do things like:
$ module list
Currently Loaded Modules:
1) gcc/6.2.0-fw44bd 2) openmpi/2.0.1-s3qbtb 3) openblas/0.2.19-js33um 4) netlib-scalapack/2.0.2-hpqb3d
$ module load netlib-lapack
Lmod is automatically replacing "openblas/0.2.19-js33um" with "netlib-lapack/3.6.1-mirer2"
The following have been reloaded with a version change:
1) netlib-scalapack/2.0.2-hpqb3d => netlib-scalapack/2.0.2-wojunh
$ module load mpich
Lmod is automatically replacing "openmpi/2.0.1-s3qbtb" with "mpich/3.2-5n5xoe"
The following have been reloaded with a version change:
1) netlib-scalapack/2.0.2-wojunh => netlib-scalapack/2.0.2-wnimqh
I came across the following on the CMake mailing list today. My takeaways are:
Maybe we should increase the priority of this kind of comparison project.
Maybe we should review http://spack.io with respect to this statement:
None of the projects have good marketing: it appears they somehow solve similar problems, but none actually have defined the problem or their solution.
The author gives more details that we can/should include up-front:
about if they handle cross compiling (not a common use case but it is yours and mine), what packages they create, what compromises they make, what they expect of my environment...
I have found the following projects which all seem to do some variation of a meta build so that you can build multiple projects that depend on each other and manage dependencies. (there are a couple others that seem to not be maintained as well)
https://gradle.org/ https://bazel.build/ https://github.com/LLNL/spack https://github.com/ruslo/hunter http://www.biicode.com https://conan.io/ https://conda.io/
Unfortunately I have never found anyone who has actually compared even two of these. None of the projects have good marketing: it appears they somehow solve similar problems, but none actually have defined the problem or their solution. It is like everyone assumes that everyone in the world has their exact same problem and the solution is obvious so the only thing left is the details of implementing it. This of course tells me nothing about if they handle cross compiling (not a common use case but it is yours and mine), what packages they create, what compromises they make, what they expect of my environment... These are important questions: I'm pretty sure that I could eliminate several just by comparing my needs to their features.
I'm currently using an in house system that builds everything in a Docker which lets me ensure nobody is accidentally using the wrong compiler. (we cross compile for a x86 target - 90% of the time if you build with gcc for the local system everything will work just fine, the other 10% of the time our system has an incompatible version of some library and things blow up when you try to use some uncommon feature). I'm thinking about moving to one of the above, but I haven't actually evaluated anything.
If you do evaluate any of the above please document your experience and in particular what is good/bad about the things you look at.
Do you have a link to the OP?
On spack.io: a comparison page would be nice. I don't think it goes on the front page; if you're not familiar with package management at all, a comparison is meaningless.
Dumping this here in lieu of a better place - following a Twitter thread (cc @tgamblin @boegel )
First, some context: I frequently deal with HPC groups in the UK of various sizes, including occasionally working on cluster deployments (using OpenHPC and commercial alternatives). So far, there has not been an instance where a customer has had a specific interest in either Spack or EasyBuild, but I have experimented with both (and plan to do so in more detail in the future). Basically, this should be read as the view from someone sitting somewhere between "enthusiastic amateur" and "mid-level sysadmin", depending on your frame of reference. Unfortunately I haven't had as much time to spend on hands-on things as I'd like but that is hopefully changing in the next month or so; hence any testing of new tools has had to be somewhat selective until now.
To be blunt, my initial experimentation with both Spack and EasyBuild left a lot to be desired. I did not, admittedly, put much time into it, but using the "quick start" examples for both tools in a fresh VM led to a lot of waiting followed by error messages, and not much else. Without a "hook" to proceed from, I drifted away to doing something else quite quickly, but returned for another attempt a few weeks later. Again, both packages presented some issues which did not inspire much confidence, but I persevered a bit more on this occasion and eventually got a bunch of packages installed using each tool. In neither case, though, did I get all the way to the desired "final" package I was aiming at (WRF, I believe). I am happy to believe this is a result of user error more than anything else given that both tools have a decent uptake and I only gave a couple of hours of my attention to it - but I can't help wonder how many other people have run into the same walls and decided not to bother continuing.
I should point out here that I do 100% see and understand the benefits of these tools, but have some reservations about recommending their usage to a lot of the people I deal with - as mentioned in the Twitter thread, there is probably quite a gulf in experience between the customers I deal with for hands-on tasks and the communities the tools originated in, particularly in the case of Spack. For a very complex package with lots of dependencies, having a method of automating the build process is very welcome. Even for simple packages, I can see the appeal of avoiding duplication of effort. That said, someone somewhere still needs to do the work of defining a package configuration for both Spack and EasyBuild. I (and hopefully a lot of other people) am uncomfortable with delegating my thinking any more than strictly necessary, and also with adding some external party as a dependency to my work. Inevitably, I will want to understand how the packages I need are put together, at least to the level where I can adapt the template myself.
My experience of HPC support staff (where there are any - which might really be the bigger problem eating at me here...) fall into two distinct camps:
Broadly, I concur with the points raised in the comment here: https://ask.cyberinfrastructure.org/t/easybuild-vs-spack-anyone-have-opinions-of-which-is-better-for-bioinformatics/612
I particularly like the final point in that comment about finding, but not necessarily using, a Dockerfile. Container definition files expose the full story to their user, and when written appropriately (that is, not pulling in random zip files from obscure cloud object storage locations as is depressingly common) can easily be used to reverse-engineer a standard bare-metal build script. An argument against container builds which has cropped up is the fact that a portable container is naturally not an optimised one. I totally agree with that, which is why I would generally prefer to work with container definition files rather than the container blobs themselves (the exception being in the cloud - where I would want to define in advance which VM types I'll be running on and build for them, then just copy the container around rather than rebuilding on each instance). The contents of the definition file can then be adapted to include optimisation where available. The broader benefits of containers are sort of coincidental beyond this point, and as others have pointed out both Spack and EasyBuild can be used within container images, so I don't think the topic of containerisation warrants any more detailed discussion here.
Sticking with the topic of optimisation though, I have to wonder if a combinatorial array of package builds (which both package managers seem to describe as a selling point) is really all that beneficial for the end-user; surely only one of those is the optimum for a given workload, and so most of the potential builds are just noise? Being able to easily build a full array of packages and test them would be a nice intellectual exercise, but it seems far more likely to me that people will simply research what build others have determined is the "best", and use that one. After all, the underlying hardware does not actually change particularly quickly - even in a cloud environment.
That point leads me to a general sentiment I get from both tools, which may or may not be fair - namely, both feel like software solutions to a human problem. We are terrible at documenting the things we have done, and do not do enough to share our knowledge in a useful way. Build templates for each tool notionally solve both of these problems, but I'm not convinced the complexity introduced is a good trade-off for the advantages provided.
To conclude, I should probably point out that a good portion of this is devils-advocate-ing; I will definitely be using Spack and/or EasyBuild in the future - life is too short to mess around building packages if I don't need to. It is easy to read the above as a critique of both packages, but what I am really getting at here is: what am I missing?
In particular:
Ultimately, I don't believe there is an optimal solution to HPC package management waiting to be discovered, and so having additional tools can only be a good thing overall. Before embracing either Spack or EasyBuild as a solution though, I would like to better grasp what is being sacrificed - so this can be balanced against the obvious benefits.
I tried EasyBuild before Spack; and for me at that time, it was a night-and-day difference. EB needed a new recipe for each version of a package, whereas Spack is based around this amazing concretizer. I haven't looked at EB for 3 years, so maybe it's different at this time. But at the time I evaluated, I felt Spack was hands-down the better system.
They say that if all you have is a hammer, then everything looks like a nail. My approach in general is to get a hammer, get really good at using it, and then use it to pound any nail if at all possible. I have invested in that hammer called Spack; and therefore, it is pretty much the ONLY way I install software. Unless it is REALLY unsuited for the job (a few MacPorts things, for example; a hammer I've also invested a bit of time into).
Spack provides tremendous efficiency of scale. I can build (and re-build) my system of 100 packages in half a day. That would never be possible by hand. So it changes my view on software building. Traditionally we think of building as a BIG JOB. Now, I think of the concretization --- assembly of different versions --- as the biggest part of the task of assembling software for a project. Actually building it consists of just letting Spack do its thing, and handling the ~5% of packages that have a problem for some reason.
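As a sketch of that workflow (the package names here are illustrative, and the commands require a working Spack installation):

```shell
# Resolve the full dependency DAG -- versions, variants, compilers --
# before building anything; this is the concretization step.
spack spec -I netcdf-c ^mpich

# Build the package and its entire dependency tree in one go.
spack install netcdf-c ^mpich

# A later rebuild is the same command; already-installed
# dependencies are simply reused.
spack install netcdf-c ^mpich
```

The point is that the human effort goes into choosing the spec on the first line; the actual building is delegated to `spack install`.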
Spack is a tool, it's not magic. You still have to know how to build by hand, because then you'll be able to decipher the log files of the 5% of things that need help. But 95% of the packages "just work" in the build, and fade out of my consciousness.
When something doesn't work and I put effort into figuring out the workaround --- well, I document that. In the old days, we documented it in English; and if I needed to rebuild the same thing again, I would re-read my notes and follow the instructions. I feel that Spack is an automated version of that process. Build instructions are written in Python not English; and they can be executed quickly by a machine, instead of slowly by me. If I really want/need to know what's going on, I can always read the build instructions (eg Spack recipe). Best of all... if someone else finds and fixes a problem, then I might never even know the problem existed. The workaround has already been incorporated in the recipe and it "just works" for me. In this way, I view Spack as a "software building collective" in which we all contribute our little bits of build problem solving while benefitting from everyone else's. On the downside, sometimes others break recipes. But they fix recipes far more often than they break them.
With this change in perspective, pre-building a zillion combinatorial variants of "base" software makes no sense. Instead, a Spack-based shop would define a set of Spack Environments --- collections of software --- on a per-user or per-group basis, and keep each environment updated to the needs of that person/group. When someone needs a new package, it gets added to their environment. Each group gets a personalized set of builds in a way that was never possible before. Getting the full benefit from Spack requires this paradigm shift, IMHO.
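As a minimal sketch, a per-group environment can be captured in a `spack.yaml` file (the package choices are illustrative, and field names should be checked against your Spack version):

```yaml
# spack.yaml -- one environment per group, kept under version control
spack:
  specs:
    - python@3.11
    - py-numpy
    - netcdf-c ^mpich
  concretizer:
    unify: true    # one consistent version of each dependency
  view: true       # expose everything through a single merged prefix
```

Adding a package to the group's stack is then `spack add <spec>` followed by `spack install` inside the activated environment.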
Another part of this shift is to think about how users will develop their own software / scripts / projects. The most elegant approach is to avoid any differentiation between "third-party" and "in-house" software, or between "base" and "user" software, and to just write a Spack recipe for every project built in the organization. Just as you write a CMake file, you also write a Spack package. In my experience, writing a Spack package for a new project isn't any more work than what I'd have to do anyway: assemble an environment with the dependencies for my project. But it IS a lot more elegant, and more reusable in the future. In this way, we can finally assemble (and create) stacks of software as deep as we like, with as many dependencies as we like, and Spack takes care of it all.
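A recipe for an in-house CMake project can be very short. This is a hypothetical sketch (the project name, URL, checksum placeholder, and dependencies are all made up; it only runs inside Spack itself):

```python
# var/spack/repos/.../myproject/package.py
from spack.package import *


class Myproject(CMakePackage):
    """Hypothetical in-house project, packaged like any other dependency."""

    homepage = "https://example.com/myproject"
    url = "https://example.com/myproject-1.0.tar.gz"

    version("1.0", sha256="<checksum goes here>")

    # Declared dependencies are concretized and built by Spack,
    # exactly as for third-party packages.
    depends_on("netcdf-c")
    depends_on("mpi")
```

Because it subclasses `CMakePackage`, the configure/build/install stages are inherited; only the metadata and dependencies need to be written.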
Another change in paradigm is, what do you do with new infrastructure? Rebuilding all your packages because you got a new compiler (or need a new compiler) isn't so bad. In fact, building a new GCC that you need, instead of using the obsolete GCC that came on your system, is also not so bad. Case in point: I recently took my entire stack (30 packages) that worked with GCC and got it built with Intel, which was needed on our supercomputer. It took me a few days, mostly learning how to set up the Intel compiler properly and getting around a few minor glitches. Now I have my stack working on GCC and Intel.
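The compiler switch described above is, at the command level, roughly this (the spec and compiler version are illustrative):

```shell
# Make Spack aware of the newly available Intel compiler
spack compiler find

# Rebuild the same stack, now with the Intel toolchain
spack install netcdf-c %intel ^mpich
```

The `%intel` qualifier changes the compiler in the concretized DAG; the recipes themselves are unchanged.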
> Before embracing either Spack or EasyBuild as a solution though, I would like to better grasp what is being sacrificed - so this can be balanced against the obvious benefits.
Sacrificed? Like any tool, Spack has a learning curve. But it pays off quickly. Because even modest builds (say, 20 packages) can easily take someone 2 weeks to install manually (I've seen it and I've done it too). If you put 9 days into learning to use Spack and debugging any problems for your builds / needs, then on day 10 you'll be able to build your entire environment with little or no effort. And it will be repeatable.
I guess the only other possible sacrifice with Spack is that it works best if you incorporate "Spack think" throughout your workflow. Use Spack Environments. Build Spack packages for the stuff you're developing. Use environment modules. Fork Spack and put your own customizations on that fork (while trying to get them merged back into develop). When it comes time to use your packages, either you'll be loading a lot of modules (say, 100), and searching through those long paths can be slow on some systems; or you'll be setting up a Spack view that incorporates all of them, which doubles the number of inodes you need --- and some systems have inode limits.
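The two usage styles mentioned above correspond roughly to these commands (package and path names are illustrative):

```shell
# Option 1: generate "module load" lines for a package and all of
# its dependencies -- on a deep stack this can mean many modules,
# each adding long entries to PATH and friends.
spack module loads --dependencies netcdf-c

# Option 2: merge the whole stack into one symlinked prefix instead;
# cheaper to search, but each symlink costs an extra inode.
spack view symlink ~/views/mystack netcdf-c
```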
I can't think of any other sacrifices. Spack is great. I've put significant effort in this hammer, and now it's REALLY good at pounding nails for me --- some of them are nails I never realized needed pounding, because pounding nails without Spack was too hard, slow and tedious.
> People who have absolutely no idea what they are doing
I've concluded that, with or without Spack, such people will never be very good at assembling large builds of open-source software. They need to stick to clicking through install wizards on Windows, filling in license keys, and calling tech support for help.