maarten-ic closed this 1 year ago
Note: the documentation at https://muscle3.readthedocs.io/en/latest/installing.html should also be updated to include a note or warning against combining `libmuscle` and `libmuscle_mpi`.
I think I would prefer having different symbol names, so that this fails at link time rather than at runtime. If it doesn't get linked correctly, then anything can happen at runtime and that's a little too shaky.
For C++, we could rename the `libmuscle::impl` namespace to `libmuscle::impl_mpi` for the MPI version of the library. That should give an error message that hints at what's wrong. This will still leave the symbols outside of `Instance` duplicated (e.g. `Message` and `Data`), so that linking against both libraries could still get them confused, but they don't use MPI so they're identical and that should hopefully work anyway. I don't think there's much else we can do at link time.
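A minimal sketch of the namespace idea, not the actual libmuscle headers. It relies on the fact that the enclosing namespace is part of a C++ symbol's mangled name, so a program compiled against the MPI headers would reference `impl_mpi` symbols that the non-MPI library does not define. The names `SKETCH_WITH_MPI`, `SKETCH_IMPL_NS`, `sketch_libmuscle`, `flavour()` and `linked_flavour()` are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical build flag; the real build system may use another macro.
#ifdef SKETCH_WITH_MPI
#define SKETCH_IMPL_NS impl_mpi
#else
#define SKETCH_IMPL_NS impl
#endif

namespace sketch_libmuscle {
namespace SKETCH_IMPL_NS {

// Stand-in for the real Instance class. Its member functions are mangled
// with the namespace name, so the two builds export different symbols.
class Instance {
public:
    // Reports which implementation namespace this build was compiled into.
    static std::string flavour() {
#ifdef SKETCH_WITH_MPI
        return "impl_mpi";
#else
        return "impl";
#endif
    }
};

}  // namespace SKETCH_IMPL_NS

// Public alias: user code keeps writing sketch_libmuscle::Instance.
using SKETCH_IMPL_NS::Instance;

}  // namespace sketch_libmuscle

// Example caller, written against the public alias only.
inline std::string linked_flavour() {
    std::string f = sketch_libmuscle::Instance::flavour();
    assert(f == "impl" || f == "impl_mpi");
    return f;
}
```

With this layout the user-facing name stays the same in both builds, while the symbol the linker has to resolve differs, so a mismatch shows up as an undefined reference that mentions `impl_mpi`.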
For Fortran, we could achieve a similar result by changing the names of the exported C functions; those names are hidden from the user so it won't change the API, but they are what gets linked against, so you get an error about `LIBMUSCLE_MPI_Instance_create_` not being found and that should give a hint.
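The renaming of the exported C functions could look roughly like the sketch below. Only the symbol name `LIBMUSCLE_MPI_Instance_create_` comes from the discussion; the non-MPI name `LIBMUSCLE_Instance_create_`, the `Instance_free_` function, the `SKETCH_WITH_MPI` flag, the `SKETCH_C_NAME` macro and the `SketchInstance` struct are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical build flag selecting the prefix of the exported C symbols.
#ifdef SKETCH_WITH_MPI
#define SKETCH_C_NAME(name) LIBMUSCLE_MPI_##name
#else
#define SKETCH_C_NAME(name) LIBMUSCLE_##name
#endif

// Stand-in for the object behind the opaque handle passed to Fortran.
struct SketchInstance {
    int root;
};

extern "C" {

// Exported as LIBMUSCLE_Instance_create_ or LIBMUSCLE_MPI_Instance_create_,
// depending on the build. extern "C" names are not mangled, so the prefix
// is exactly what the linker sees.
std::intptr_t SKETCH_C_NAME(Instance_create_)() {
    return reinterpret_cast<std::intptr_t>(new SketchInstance{0});
}

// Matching destructor for the handle returned above.
void SKETCH_C_NAME(Instance_free_)(std::intptr_t handle) {
    assert(handle != 0);
    delete reinterpret_cast<SketchInstance *>(handle);
}

}  // extern "C"
```

The Fortran wrapper module for each build would bind to the matching prefix, so user code is unchanged while linking an MPI program against only the non-MPI library fails with a missing `LIBMUSCLE_MPI_...` symbol.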
When a C++ or Fortran program is targeting the `mpi` version of libmuscle, but links against both the mpi and regular shared library, SEGFAULTS or other undefined behaviour may happen.

When linking with `-lmuscle -lmuscle_mpi` (in that order!), the following happens:

1. The `Instance` constructor (with `MPI_Comm` and `root` arguments) is only provided by the `mpi` library, so the muscle instance is created by the `mpi` enabled version of the library.
2. For the other `Instance` functions, both `libmuscle.so` and `libmuscle_mpi.so` provide an implementation. Since `-lmuscle` was provided first when linking, the non-MPI implementation provided by `libmuscle.so` is used!

Since the non-MPI library has a different memory layout of the Instance object (it doesn't have the MPI members), this will likely trigger a SEGFAULT, undefined behaviour, or more subtle bugs.

Suggested solution: