Closed: rohany closed this issue 1 year ago.
It seems like we should have separate conda files for this kind of use case, where more important packages (compilers, MPI etc.) are provided by the platform.
https://github.com/nv-legate/legate.core/pull/367 should be taking care of this.
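For reference, a rough sketch of the intended workflow on a module-managed cluster (the module and package names below are placeholders, not the actual contents of the proposed conda files):

```sh
# Use the platform-provided toolchain and MPI, and pull only the
# non-toolchain dependencies from conda (names are illustrative).
module load gcc cuda openmpi

# Note: no "compilers" or "openmpi" conda packages, so nothing in
# $CONDA_PREFIX/bin shadows the system toolchain.
conda create -n legate python=3.10 cmake ninja numpy
conda activate legate

which mpicc cc   # should still resolve to the platform-managed tools
```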
Additionally, on these systems, the CMake build does not properly reference the MPI compilers (mpicc and mpicxx) when building Legion's GASNet or legate/cuNumeric.
It would be good to understand what is going wrong here; the embedded gasnet build should be using mpicc directly.
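For context, a standalone GASNet build is normally pointed at the wrapper roughly like this (a sketch only; exact configure options and variables may differ between GASNet versions and Legion's embedded build):

```sh
# Tell GASNet's configure to build its MPI conduit/spawner with mpicc.
cd gasnet-source
MPI_CC=mpicc ./configure --enable-mpi --prefix="$HOME/gasnet-install"
make -j && make install
```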
I would actually expect legate.core to also be using mpicc (since we're compiling code that uses MPI, core/comm/comm.cc in particular), but weirdly I don't see mpicc being used at all, not even linking against -lmpi. Out of curiosity, @jjwilke @trxcllnt do you know how this is working?
All the mpicc wrapper does is call the underlying compiler with some flags specific to code that uses MPI. It looks like CMake doesn't use mpicc directly, but instead adds the required flags to its compiler invocations itself.
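To see that concretely, the wrapper can be asked to print the command line it would run; CMake's FindMPI extracts roughly this information and passes the flags to the real compiler itself, which is why mpicc never shows up in the build log (output below is illustrative):

```sh
# Print the underlying compile/link line instead of running it.
mpicc -show      # MPICH-style wrappers
mpicc -showme    # Open MPI's wrapper
# e.g.: gcc -I/opt/mpi/include ... -L/opt/mpi/lib -lmpi
```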
In any case, it looks like this works as expected on clusters if we don't install the "compilers" packages from conda.
Installing OpenMPI (or any packages that override the system compilers, see https://github.com/nv-legate/cunumeric/issues/629) results in various hard-to-track-down build issues on supercomputers like Lassen and Piz Daint, where the software stack is carefully managed. Additionally, on these systems, the CMake build does not properly reference the MPI compilers (mpicc and mpicxx) when building Legion's GASNet or legate/cuNumeric, resulting in errors like "cannot compile MPI programs" during configure, or <mpi.h> not found.

It seems like we should have separate conda files for this kind of use case, where more important packages (compilers, MPI, etc.) are provided by the platform.
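A quick way to check whether the conda packages are shadowing the platform stack after activating an environment (commands and paths are illustrative):

```sh
which mpicc        # the conda openmpi/mpich packages install this into $CONDA_PREFIX/bin
echo "$CC" "$CXX"  # the conda "compilers" package exports these via activation scripts
# If these point into $CONDA_PREFIX (e.g. .../x86_64-conda-linux-gnu-cc),
# the build will pick up conda's toolchain instead of the system one.
```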