libMesh / libmesh

libMesh github repository
http://libmesh.github.io
GNU Lesser General Public License v2.1

No fully-compatible MPI implementations on Ubuntu 22.04 LTS #3328

Closed: roystgnr closed this issue 4 months ago

roystgnr commented 2 years ago

OpenMPI has a fix for their regression currently posted at https://github.com/open-mpi/ompi/pull/10527, and MPICH seems to already have their git head working again. In the meantime, though, not being able to do an MPI_MIN with an arbitrarily large unsigned long is a big deal for us. At the very least it makes DistributedMesh unusable with METHOD=dbg (with our git head I get failures from the semiverify assertion at src/mesh/mesh_tools.C, line 2164), and there's no telling where else we might be triggering the same problem.

I'm currently working around the problem myself with a --download-mpich PETSc build (which pulls MPICH 3.4.1 right now; that seems to predate the regression there), just posting a warning for others.
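To illustrate why a broken MPI_MIN on large unsigned long values could trip a semiverify-style check, here is a minimal sketch with no MPI dependency. It is not TIMPI's actual implementation. Ranks are simulated as vector elements, and `reduce_min`, `reduce_max`, `broken_reduce_min` and `semiverify_sketch` are hypothetical names. Two things are assumptions made only for illustration: that ranks holding no value contribute the largest representable value to the minimum, and that the faulty reduction compares unsigned values as if they were signed. Neither is a confirmed description of the OpenMPI or MPICH regressions.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <optional>
#include <vector>

using Value = unsigned long;

// Hypothetical stand-ins for MPI_MIN / MPI_MAX reductions: each vector
// element plays the role of one rank's contribution.
Value reduce_min(const std::vector<Value>& per_rank) {
    assert(!per_rank.empty());
    return *std::min_element(per_rank.begin(), per_rank.end());
}

Value reduce_max(const std::vector<Value>& per_rank) {
    assert(!per_rank.empty());
    return *std::max_element(per_rank.begin(), per_rank.end());
}

// ASSUMED failure mode, for illustration only: a minimum that compares
// unsigned values as if they were signed, so very large values look negative.
Value broken_reduce_min(const std::vector<Value>& per_rank) {
    assert(!per_rank.empty());
    return *std::min_element(per_rank.begin(), per_rank.end(),
                             [](Value a, Value b) {
                                 return static_cast<long>(a) < static_cast<long>(b);
                             });
}

// Sketch of a semiverify-style check: every rank that holds a value must
// hold the same one. Ranks without a value contribute sentinels that
// should never win a correct min or max reduction.
bool semiverify_sketch(const std::vector<std::optional<Value>>& ranks,
                       Value (*min_reduce)(const std::vector<Value>&)) {
    std::vector<Value> mins, maxs;
    for (const auto& r : ranks) {
        mins.push_back(r ? *r : std::numeric_limits<Value>::max());
        maxs.push_back(r ? *r : std::numeric_limits<Value>::lowest());
    }
    const Value lo = min_reduce(mins);
    const Value hi = reduce_max(maxs);
    for (const auto& r : ranks)
        if (r && (*r != lo || *r != hi))
            return false;
    return true;
}
```

Under these assumptions, three ranks holding `{42, nothing, 42}` pass the check with `reduce_min` but fail it with `broken_reduce_min`, because the sentinel contributed by the empty rank wins the minimum. That would be a spurious assertion failure on perfectly consistent data.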

jwpeterson commented 4 months ago

Just a small follow-up comment on this, since it led to some recent confusion for us. The Ubuntu 22.04 libmpich-dev package which is labeled "4.0-3"

Package: libmpich-dev
Architecture: amd64
Version: 4.0-3
Multi-Arch: same
Priority: extra
Section: universe/libdevel
Source: mpich
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Debian Science Maintainers <debian-science-maintainers@lists.alioth.debian.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 58262
Depends: gfortran | fortran-compiler, g++, libmpich12 (= 4.0-3), gfortran-11 | gfortran-mod-15, mpich (= 4.0-3)
Filename: pool/universe/m/mpich/libmpich-dev_4.0-3_amd64.deb
Size: 7374790
MD5sum: 73f445f5d0d674ff6b653ffd2f5cf85d
SHA1: 2119398701c2842a8a4f2ca96ad591e08b05e648
SHA256: bfe7155963cef03a800746913b51e8e5489d18cedf5950a2e6b86ab0e03500d3
SHA512: ee551ce69c59039820dca26681b56ee2a0c0b20c9c2fb20cd83496606cc358d03b8832f750197e09bc93cad5cc5b37c1f6a43e2f4dd6c11773ca450fb2899543

is not related to "mpich-4.0.3", which was released much later in 2022. After some brief testing, we have observed that the mpich-4.0.3 release from mpich.org passes all the current TIMPI tests and should be OK to use with libMesh, while Ubuntu's libmpich-dev "4.0-3" package, which was the original subject of this issue, should not be used.
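The confusion between "4.0-3" and "4.0.3" comes from the Debian package version format, `[epoch:]upstream_version[-debian_revision]`, in which the text after the last hyphen is a packaging revision rather than part of the upstream release number. A minimal sketch of splitting such a string follows. `DebianVersion` and `split_debian_version` are hypothetical names introduced here, and the function does no validation of the allowed characters.

```cpp
#include <string>

// Pieces of a Debian-style package version string.
struct DebianVersion {
    std::string epoch;     // empty if no "epoch:" prefix is present
    std::string upstream;  // the upstream release number
    std::string revision;  // empty if no "-revision" suffix is present
};

// Split "[epoch:]upstream[-revision]": the epoch ends at the first colon,
// and the revision starts after the LAST hyphen.
DebianVersion split_debian_version(const std::string& version) {
    DebianVersion out;
    std::string rest = version;

    const auto colon = rest.find(':');
    if (colon != std::string::npos) {
        out.epoch = rest.substr(0, colon);
        rest = rest.substr(colon + 1);
    }

    const auto dash = rest.rfind('-');
    if (dash != std::string::npos) {
        out.upstream = rest.substr(0, dash);
        out.revision = rest.substr(dash + 1);
    } else {
        out.upstream = rest;
    }
    return out;
}
```

With this split, the package version "4.0-3" has upstream version "4.0" and revision "3", whereas "4.0.3" has upstream version "4.0.3" and no revision, so the two strings name different MPICH releases.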

Also, @roystgnr is there anything else we should do before possibly just closing this issue? I have not done any testing with OpenMPI, so can't comment intelligently on that.

roystgnr commented 4 months ago

For me (for now; I'm about to just upgrade to 24.04 LTS) an apt-cache show libopenmpi-dev still shows "version 4.1.2-2ubuntu1" - I could see if they backported any patches, but I can't imagine why they wouldn't just update to 4.1.5 if they were maintaining things that closely; it's a subminor bugfix release after all.

I guess we can mark this as closed, though, since it's not our issue and since there's an easy-enough workaround (switch from openmpi to mpich) available now.