idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1

Add HDF5 support to default Moose libMesh builds #20033

Closed roystgnr closed 2 years ago

roystgnr commented 2 years ago

Reason

Although in #18768 I was hoping "to avoid foisting another compilation prerequisite on all Moose users", in hindsight it's clear that we at least want distributed Moose binaries to have HDF5 support enabled and Moose CI to be mostly run with HDF5 support enabled, and that's going to require changing our default config scripts.

Design

My proposal:

We configure libMesh with --enable-hdf5 --enable-hdf5-required. For users with HDF5 already in their include and library paths (which includes e.g. anyone loading the hdf5 module on our systems), or for users with its location set in an $HDF5_DIR environment variable, this works. Users building Moose and libMesh manually without HDF5 would have to either install it or tack on --disable-hdf5-required to their update_and_rebuild_libmesh.sh runs.
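
As a rough illustration of this proposal, the argument handling might look like the POSIX shell sketch below. The function name `libmesh_configure_args` is hypothetical; only the three HDF5 flags come from this discussion. It relies on the usual autoconf behavior that a later `--disable-...` option overrides an earlier `--enable-...` one.

```shell
# Hypothetical helper: print the arguments to pass to libMesh configure.
# Defaults from the proposal come first; anything the user tacks on follows,
# so a trailing --disable-hdf5-required overrides the default.
libmesh_configure_args() {
  defaults="--enable-hdf5 --enable-hdf5-required"
  if [ "$#" -gt 0 ]; then
    echo "$defaults $*"
  else
    echo "$defaults"
  fi
}

# A user with HDF5 available needs nothing extra:
echo "default: $(libmesh_configure_args)"
# A user without HDF5 opts out of the hard requirement:
echo "opt-out: $(libmesh_configure_args --disable-hdf5-required)"
```

The sketch only assembles the argument string; it does not run configure.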

Logan's proposal:

We first configure PETSc with --download-hdf5, and point libMesh to the downloaded build. For users without an existing HDF5 installation in their paths, this works. Users with an HDF5 already in their paths (because it's preinstalled in their environment, or because it's a prerequisite for some software they're already using) would have to make sure that the two versions never conflict. Somehow.
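
A minimal sketch of what "point libMesh to the downloaded build" could mean in practice. It assumes an in-place PETSc build, where downloaded packages are installed under `$PETSC_DIR/$PETSC_ARCH`; the function name and the placeholder paths are hypothetical.

```shell
# Hypothetical helper: print libMesh flags that point at the HDF5 built by PETSc.
# $1: PETSc source directory, $2: PETSc arch name.
# Assumes downloaded packages were installed under <petsc_dir>/<petsc_arch>.
libmesh_flags_for_petsc_hdf5() {
  petsc_dir=$1
  petsc_arch=$2
  echo "--enable-hdf5 --with-hdf5=${petsc_dir}/${petsc_arch}"
}

# Show the two configure invocations side by side (nothing is executed).
echo "PETSc:   ./configure --download-hdf5"
echo "libMesh: ./configure $(libmesh_flags_for_petsc_hdf5 "${PETSC_DIR:-/path/to/petsc}" "${PETSC_ARCH:-arch-example}")"
```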

Impact

Long-term:

With either proposal, most users will get access to HDF5-dependent capabilities, most importantly the ability to use ExodusII v8 including IsoGeometric Analysis extensions.

With my proposal, some users who manually build Moose will have to do more work (usually easy work like loading a module, sometimes harder work like building HDF5 manually or at least learning the invocation to disable it). With Logan's proposal, users who want to build Moose while already having their own HDF5 accessible may run into link-time or run-time conflicts if a build system doesn't prioritize the PETSc HDF5, and users who want to build Moose-derived apps against their own HDF5 will have to manually rebuild the whole PETSc/libMesh/Moose stack first.

Short-term:

Hopefully not too much of a flame war? Pinging @loganharbour @permcody and @milljm to discuss here; bring in anyone else who might be interested.

milljm commented 2 years ago

For our end-user Conda experience, I would like to propose we no longer perform --download-hdf5, and instead use --with-hdf5. Meaning we would now be installing conda-forge's hdf5, and have both PETSc and libMesh use that package. That keeps the HDF5 identical for both.

Because moose-libmesh depends on moose-petsc, any version installed after the fact should still match going forward, e.g. when a user installs moose-petsc and then later installs moose-libmesh.

However, the open question is how we make our ./update_and_rebuild_$$$$ scripts aware of conda-forge's hdf5 package when it's available...

Perhaps a change to our moose-mpich package is necessary instead (add an hdf5 dependency in this package, and thus control an environment variable we can set ourselves). This may be required for the folk who build their own libMesh and PETSc, using our moose-mpich package.

fdkong commented 2 years ago

> Meaning we would now be installing conda-forge's hdf5, and have both PETSc and libMesh use that package.

I would love to see a parallel hdf5. Conda's hdf5 does not work in parallel because we have our own MPI.

fdkong commented 2 years ago

I like Logan's proposal better. Making users manually build something is highly nontrivial.

Another solution would be to add hdf5 to libmesh "contrib".

Regarding conflicts, it might be fine because we are already handling the same issue for other third-party packages.

Having a clean build environment is much easier than building hdf5 from scratch.

fdkong commented 2 years ago

> I would like to propose we no longer perform --download-hdf5

Any motivation? Is it too slow? I guess the petsc configuration will be slow because hdf5 is large

milljm commented 2 years ago

> > I would like to propose we no longer perform --download-hdf5
>
> Any motivation? Is it too slow? I guess the petsc configuration will be slow because hdf5 is large

My understanding is that hdf5 has to be the same throughout the entire stack? If that is the case, then PETSc configure will need to use --with-hdf5 instead of --download-hdf5.

We would need to point PETSc to an already-implemented-hdf5 using --with-hdf5. We would then need to do the same with libMesh (equivalent --with-hdf5 pointing to the same implementation).

If we cannot use conda-forge's implementation of hdf5 we will need to create it ourselves (moose-hdf5). Both moose-petsc and moose-libmesh will depend on this package.

This has the advantage of freeing moose-mpich from being my proposed package for handling the hdf5 library, allowing folks to continue using moose-mpich without being married to a specific version of hdf5. Those users could then supply their own HDF5 (or perhaps use a new moose/scripts/update_and_rebuild_hdf5.sh script, for those wishing to use these scripts?).

roystgnr commented 2 years ago

> My understanding is that hdf5 has to be the same throughout the entire stack?

In general this is correct: same version and same configuration. IIRC I've run into problems before just by inadvertently trying to link a parallel build and a serial build of the same HDF5 version into the same final program. This is my primary objection to just getting a PETSc-specific copy into the stack.
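
One way to spot the parallel/serial mismatch described here before link time is to inspect each install's `H5pubconf.h`, which defines `H5_HAVE_PARALLEL` in MPI-enabled builds. The sketch below assumes the conventional `<prefix>/include` layout (some distributions put the header elsewhere), and the function name `hdf5_flavor` is hypothetical.

```shell
# Hypothetical helper: report whether the HDF5 under a prefix is a parallel build.
# $1: install prefix. Prints "parallel", "serial", or "unknown" (header not found).
hdf5_flavor() {
  header="$1/include/H5pubconf.h"
  if [ ! -f "$header" ]; then
    echo "unknown"
  elif grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+H5_HAVE_PARALLEL([[:space:]]|$)' "$header"; then
    # An active #define (not a commented-out #undef) marks an MPI-enabled build.
    echo "parallel"
  else
    echo "serial"
  fi
}

# Inspect the install named by $HDF5_DIR, or /usr if it is unset.
echo "HDF5 flavor: $(hdf5_flavor "${HDF5_DIR:-/usr}")"
```

Running this against every HDF5 prefix visible in an environment and comparing the answers gives a quick, though not exhaustive, consistency check; it says nothing about version mismatches.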

> If we cannot use conda-forge's implementation of hdf5 we will need to create it ourselves (moose-hdf5). Both moose-petsc and moose-libmesh will depend on this package.

This would sound ideal to me, except ... don't our conda environments already bring in their hdf5 package? I seem to remember it being an indirect dependency of some python package the moose environment depends on. It'll be tricky to avoid conflicts between hdf5 and moose-hdf5 if we have to have both loaded.

milljm commented 2 years ago

> This would sound ideal to me, except ... don't our conda environments already bring in their hdf5 package? I seem to remember it being an indirect dependency of some python package the moose environment depends on. It'll be tricky to avoid conflicts between hdf5 and moose-hdf5 if we have to have both loaded.

Indeed hdf5 gets pulled in from another package (VTK if I recall correctly, which is a moose-tools dependency). Can we control what HDF5 libMesh uses? Since moose-hdf5 would be ours, we can set any environment variable(s) we need.

roystgnr commented 2 years ago

libMesh looks for an $HDF5_DIR environment variable, or lets you set --with-hdf5=some/dir/, and then looks for include/ and lib/ dirs under that, and adds those to its include and lib paths. If neither is set it'll work as long as HDF5 headers and libraries are already in your compiler's paths. We could make that even more flexible if necessary.
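
The lookup described above can be sketched in POSIX shell as follows. This is an illustration, not libMesh's actual m4 code: the function name is hypothetical, and giving an explicit `--with-hdf5` value precedence over `$HDF5_DIR` is an assumption.

```shell
# Hypothetical sketch of the HDF5 lookup.
# $1: value given to --with-hdf5 (may be empty); falls back to $HDF5_DIR.
# Prints the -I/-L flags to add, or nothing when relying on the compiler's paths.
hdf5_search_flags() {
  prefix=${1:-${HDF5_DIR:-}}
  if [ -z "$prefix" ]; then
    # Neither is set: HDF5 must already be in the compiler's default paths.
    return 0
  fi
  if [ -d "$prefix/include" ] && [ -d "$prefix/lib" ]; then
    echo "-I$prefix/include -L$prefix/lib"
  else
    echo "error: no include/ and lib/ under $prefix" >&2
    return 1
  fi
}

# With no explicit directory, the result depends only on $HDF5_DIR.
echo "extra flags: $(hdf5_search_flags "")"
```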

But that's probably not good enough here. VTK is also a libMesh dependency. Optional, but not something we'll want to disable. If we try to link to HDF5, and also to a VTK library linked to a different HDF5, we're certain to get conflicts.

Is there no VTK module available that links to a parallel HDF5? VTK is an MPI-aware library; it seems odd that they'd want to link to a serial HDF5 even if they didn't need parallel HDF5 functionality themselves.

milljm commented 2 years ago

> Is there no VTK module available that links to a parallel HDF5? VTK is an MPI-aware library; it seems odd that they'd want to link to a serial HDF5 even if they didn't need parallel HDF5 functionality themselves.

We do have a moose-libmesh-vtk module, strictly for use when enabling VTK support (with MPI_BOOL=on). IIRC we did this only for the file writer object (@jwpeterson might remember). I mention this because we should still be able to control HDF5, since so far all the Conda packages I have mentioned are ours.

Just verified: our moose-libmesh-vtk package does not require HDF5.

The VTK moose-tools package dependency I mentioned earlier is for Peacock only. We were unwilling to link libMesh to a full-installation of VTK. It was very slow (fat binaries, or whatever you guys call them).

roystgnr commented 2 years ago

Hmm... maybe we'd want to tell libMesh --disable-vtk when we're not explicitly pointing libMesh to our own VTK, to make sure we don't autodetect another VTK install? Other than that, though, I no longer see any problem with just building our own HDF5 module alongside others'.

roystgnr commented 2 years ago

Anyone feel free to correct me if I'm wrong, but from the Teams meeting, we seem to have settled on:

  1. Have update_and_rebuild_petsc.sh manually search include paths (and $HDF5_DIR-type environment variables?) for hdf5.h, and if we find it then tell PETSc and Moose to use it; if we don't then tell PETSc --download-hdf5. So when we have a conda module or spack module or even a system with hdf5 preinstalled, we use that as we should, but when a user is building from scratch they still have the highest possible likelihood of ending up with an HDF5-enabled build of everything.

@cticenhour is getting started on the PETSc build script logic here.

  2. Have libMesh hdf5.m4 only get invoked after petsc.m4, and have it add PETSc include and lib flags to its path if we have PETSc enabled, so that if a user has done update_and_rebuild_petsc.sh or manually configured PETSc --download-hdf5 and obtained an HDF5 build that way then all libMesh will need is --enable-hdf5 to find it; we don't need extra logic to somehow set $HDF5_DIR or --with-hdf5=/some/petsc/dir.

I'll get started on this.

  3. Ensure that the hdf5 modules we want to use in spack/conda/whatever are getting their include and library paths into our compilers' environment variables, so that if we --enable-hdf5 without --with-hdf5 we'll still pick those up too.

I think @milljm was already starting to look at this? It should probably be sufficient to just load up each environment, manually run libMesh configure --enable-hdf5 --enable-hdf5-required, and check that configure succeeds rather than screaming and dying. I guess we could be extra careful and actually run make afterwards just to be sure there aren't two hdf5 versions hiding somewhere waiting to fight with each other at link time.
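
The detection in the first item could be sketched roughly as below. This is not the logic that ended up in update_and_rebuild_petsc.sh: the function name is hypothetical, the list of directories to search is supplied by the caller, and `--with-hdf5-dir` is used on the assumption that it is the PETSc option for pointing at an existing install whose headers sit in `<prefix>/include`.

```shell
# Hypothetical sketch: pick the PETSc configure option for HDF5.
# Arguments: candidate include directories, searched in order for hdf5.h.
petsc_hdf5_option() {
  for dir in "$@"; do
    if [ -f "$dir/hdf5.h" ]; then
      # Assume the conventional layout <prefix>/include/hdf5.h.
      echo "--with-hdf5-dir=$(dirname "$dir")"
      return 0
    fi
  done
  # Nothing found: let PETSc download and build its own HDF5.
  echo "--download-hdf5"
}

# Example: check $HDF5_DIR/include first, then a common system location.
petsc_hdf5_option "${HDF5_DIR:-/nonexistent}/include" /usr/include
```

Searching caller-supplied directories keeps the function easy to test; a real script would also need to decide which compiler and environment paths to include in that list.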

Finally, though we didn't really spell this out in the meeting, when that's all done it should be safe to:

  4. Add --enable-hdf5-required to our update_and_rebuild_libmesh script, so that we don't regress and accidentally return to doing CI or distributing binaries without HDF5 enabled. Users who really don't want HDF5 (they don't have it preinstalled, and they have PETSc preinstalled without it, and it's against their religion?) would still be able to --disable-hdf5-required manually, but for the most part every Moose user would now get it.

milljm commented 2 years ago

Because the conda packages now run the same configure script that ./update_and_rebuild_petsc.sh uses... there is little for me to do.

That is why I was asking if y'all wanted an HDF5_DIR variable when someone installs and activates the moose-petsc conda package.

roystgnr commented 2 years ago

This was fixed by #20580

cticenhour commented 2 years ago

Thanks @roystgnr - forgot to go back and do this!