Open bangerth opened 2 years ago
That is indeed a problem I ran into in the past. The MPI standard gives the implementations the freedom to implement MPI handles in whichever way they want. Since C++ is not officially supported by MPI, this isn't even a quality of implementation question.
Your suggestion won't work either because ompi_mpi_unsigned
is an external symbol that is not a compile-time constant. For Open MPI to support constexpr
object handles it would have to completely change its ABI (probably not gonna happen unless some higher entity forces it to).
One possible solution would be wrap MPI types in your types that are constexpr
and delay access to the underlying MPI type until execution time (probably needs some mapping function that can be inlined, not sure). That might not work with derived types, in case you need them.
You might want to add your wish to the list in https://github.com/mpi-forum/mpi-issues/issues/288 (if it's not already there). The MPI Forum is discussing a future C++ interface and I agree that support for constexpr
would be nice (although I don't like the repercussions it might have on implementations).
@devreal Actually, I think that this should still work. ompi_mpi_unsigned
is an external variable whose contents are unknown in the header file and that consequently can't be accessed in a constexpr
variable. But MPI_UNSIGNED
only refers to the address of the object, and that is (at least as far as I know) a constexpr
.
Indeed, this works:
struct S { int id; };
extern struct S internal_MPI_UNSIGNED;
extern struct S internal_MPI_INT;
constexpr S* MPI_UNSIGNED = &internal_MPI_UNSIGNED;
constexpr S* MPI_INT = &internal_MPI_INT;
constexpr S* get_type_id ()
{
return MPI_UNSIGNED;
}
int main()
{
return (get_type_id() == MPI_INT);
}
// ---------- usually put into a different file --------
struct S internal_MPI_UNSIGNED = {13};
struct S internal_MPI_INT = {42};
I think that the root cause is really just the static_cast<void*>
where you're doing type-punning because you want the address of the first member of the struct
.
That's surprising. I stand corrected :) The cast is needed because of the padding put into predefined types in order to keep the ABI stable. Exposing the implementation of ompi_predefined_datatype_t
(and necessarily ompi_datatype_t
) defeats the purpose. I cannot quite make sense of the explanation given in https://github.com/open-mpi/ompi/blob/master/ompi/communicator/communicator.h#L349 as to why the padding is necessary in the first place, will need to understand that first.
I think the cast isn't actually for the padding, but because you define MPI_UNSIGNED
as a pointer to a member of a structure of a different type. You happen to know that the member is at offset zero, so their addresses are the same, but have different types, and the double cast is simply a way to do type punning of the pointer.
What I don't understand is why MPI_UNSIGNED
isn't just defined as &ompi_mpi_unsigned.dt
. That has the right type already, and because the ompi_predefined_datatype_t
is declared in the header file, everything you need is actually available in the place. It's the same address, and doesn't need the type punning. That is, something of the form
#define MPI_UNSIGNED ompi_mpi_unsigned.dt.
@bangerth my 0.02US$
shouldn't it be instead (note the &
) ?
#define MPI_UNSIGNED &ompi_mpi_unsigned.dt
Anyway, internally, ompi_datatype_t
and ompi_predefined_datatype_t
are fully defined.
However, from an end user point of view (e.g. #include <mpi.h>
), only (pointers to an ) unspecified struct
are used, so the dt
field is not exposed to the end users.
Yes, it should have been the address.
After talking to @bosilca on why we have separate types for predefined MPI handles, here is my take-away: the internal data structures of Open MPI (ompi_datatype_t
being among them) may change when Open MPI is compiled with different options (debug probably the most likely culprit). ompi_mpi_unsigned
is defined as an external symbol so the linker records its size at link time and the loader compares that size to the actual size of the object in the loaded shared library. If you link against one installation of Open MPI and then run with another installation (one that has debugging enabled) the loader will complain because the size changed and accessing ompi_mpi_unsigned
directly could result in weirdness. The linker doesn't know that the application will never actually access ompi_mpi_unsigned
directly and is only interested in the pointer to it so it warns if the size of the object changes. The solution in Open MPI is to provide a padded wrapper type that is ABI stable (ompi_predefined_datatype_t
).
We cannot expose the definition of ompi_predefined_datatype_t
in the public header file since that would require us to also expose ompi_datatype_t
(as @ggouaillardet mentioned). That would get us back to square 1.
This is a specific problem with predefined MPI handles and the fact that handles in Open MPI are pointers to objects. I don't have a good solution at hand but I agree that it would be nice to eventually have constexpr
handles.
@bangerth MPI handles are not specified to be compile-time constants, only link-time constants, so you cannot use constexpr
on them, regardless of how they are implemented in Open-MPI, without making non-standard assumptions.
2.5.4 Named Constants
All named constants, with the exceptions noted below for Fortran, can be used in initialization expressions or assignments, but not necessarily in array declarations or as labels in C
switch
or Fortranselect
/case
statements. This implies named constants to be link-time but not necessarily compile-time constants.
I was trying to write a little generic C++ routine that translates the data type of a variable into the corresponding
MPI_Datatype
value. Something of this sort (the specific details don't actually matter, but you get the idea):The problem is that at least some compilers don't allow making the variable (and the query functions)
constexpr
because, as I found out to my surprise, the definitions of things such asMPI_CHAR
are not in factconstexpr
essions. I get compiler errors likeor
Upon digging around, here's what I could find: We have
and then
and
and
and
So, if I see this right, then this all together yields
So at least that second error I show above is caused by the
void*
cast. I really can't quite make heads and tails of the first compiler's error message.Obviously, none of my desire to be able to do things as
constexpr
is backed by the MPI standard, which has nothing to say about the issue. But as a matter of quality of implementation, it would nonetheless be nice if one could do this. I believe that that wouldn't actually be all that hard: The two casts here really just extract a pointer to thedt
sub-object of theompi_predefined_datatype_t
structure, so the following alternative definition ofMPI_UNSIGNED
should in principle suffice:I suspect that I'm missing major considerations here, though :-)