Open jeffhammond opened 1 year ago
Unfortunately, the above proposal appears to involve UB in C if we use magic values for built-ins, because casting (u)intptr_t
to and from pointers is only defined for valid pointers (see below).
As best I can tell, this means that we must cannot have compile-time constants or magic integer values for built-in handles if they also behave like C pointers.
Implementations can still choose between a fully integer handle design like MPICH and a pointer design like Open-MPI, but there is no "best of both" situation.
7.20.1.4 Integer types capable of holding object pointers
The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
intptr_t
The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
uintptr_t
These types are optional.
It's sad we can't get compile-time type checking without making MPICH-using code hold invalid pointers. If we had overcommit and 64-bit addresses, then we could just allocate/map 4 GB of memory and never access it (basically free on 64-bit systems) to make all MPICH-style handles valid pointers. Are there architectures of interest for which (a) holding an invalid pointer is actually problematic and (b) the mapping strategy to make the pointers valid is problematic? A desperate pitch, but I have to ask.
To get compile time constants, we'd need to mmap FIXED, which isn't particularly portable.
The proposed design is problematic on 32b platforms and likely has issues with an OS like Blue Gene CNK.
I haven't figured out the major downside of link-time constants...
I wonder: why can't we encode compile-time constant values in pointers and check for them (or the known range of them) inside the MPI library? They will never be dereferenced but replaced with the actual pointer upon entry into the library.
I wonder: why can't we encode compile-time constant values in pointers and check for them (or the known range of them) inside the MPI library? They will never be dereferenced but replaced with the actual pointer upon entry into the library.
Because it's UB in C and C++, as best I can tell.
In practice, there are unlikely to be any values that are universal anyways. People who try such things have problems, e.g.
Here's what I'm reading in C11 draft N1570 §6.6 right now:
More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following: — an arithmetic constant expression, — a null pointer constant, — an address constant, or — an address constant for a complete object type plus or minus an integer constant expression.
...
An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it shall be created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly by the use of an expression of array or function type. The array-subscript [] and member-access . and -> operators, the address & and indirection * unary operators, and pointer casts may be used in the creation of an address constant, but the value of an object shall not be accessed by use of these operators.
If we want an address constant, it has to be a pointer to something. It cannot be any random integer.
You can have a normal mapping and define pointer_handle = base + integer_handle
. Where base
can be a private pointer to struct { char c; }
for internal arithmetic in char-sized units.
I suspect this mapping approach is DoA due to 32-bit address spaces and environments without overcommit, though maybe creating the out of bounds pointer value is not a real problem there.
There might be other ways to encode with modest memory requirements.
I'm not sure I understand the problem. Does it means MAP_FAILED
is wrong in mmap
?
On success, mmap() returns a pointer to the mapped area. On error, the value
MAP_FAILED (that is, (void *) -1) is returned, and errno is set to indicate the
cause of the error.
Yeah, same issue with MPI_IN_PLACE
, at least as defined by Open MPI and MPICH. Note that mmap
is POSIX, not ISO C, and this is another interaction:
Existing implementations of mmap() return the value -1 when unsuccessful. Since the
casting of this value to type void * cannot be guaranteed by the ISO C standard to
be distinct from a successful value, this volume of POSIX.1‐2017 defines the symbol
MAP_FAILED, which a conforming implementation does not return as the result of a
successful call.
It's also common for malloc(0)
to return a unique non-NULL pointer that will SEGV if you dereference it.
I recall Bill commenting (in the context of PETSc multi-dim array pointer tricks) that the ISO C restriction only matters for systems with segmented memory. The standard doesn't have advice for implementers regarding MPI_IN_PLACE
. An implementation could stay ISO C compliant by by making it a link-time constant pointing to a constant dummy object, but it seems none do.
Ok today I learn something new thanks! 🤯
So, as you pointed out, and if I’m not mistaking: it means it is already impossible to implement both MPI_BOTTOM
and MPI_IN_PLACE
.
If MPI already ignores, and if POSIX also deemed it ‘ok’… I’m not sure there are many handles needing multiple non standard values (mostly null handles which can be NULL) because otherwise it could be pointer to something at runtime as given by constructors.
Non exhaustively out of my head:
Maybe the MPI_STATUS_*
but they are all roughly synonyms to NULL (if you ignore error checking).
And the comm : NULL, WORLD and SELF. Sessions already solved the construction of these. But yes, can we leave with 0, 1, 2 (or whatever) in these pointer values.
The group: GROUP_NULL and GROUP_EMPTY (no constructor for empty).
Other enums are passed as integers otherwise. (Plans to typedef ?)
And… all the datatypes which could be an array unfolded by base indexing with a macro as stated earlier in the issue. Sorry missed the largest part !
MPI_BOTTOM and MPI_IN_PLACE are not specified to be compile-time constants. They are sentinels and explicitly prohibited as initializers.
Offsets relative to bottom are probably the most sketchy aspect of MPI from a C language compliance perspective, although we added AINT arithmetic macros in MPI-3 to address this.
I recommend everybody read 2.5.4 slowly and carefully a few times, since it addresses this stuff. Learning from implementations is not a great way to understand this topic.
Implementations will have to change, particularly macros that manipulate handles.
If we provide the MPI ABI interface in mpi_abi.h
instead of mpi.h
then implementations don't need to change.
OpenMPI can make mpi_abi.h
just a link to mpi.h
, and MPICH can provide stubs in mpi_abi.h
that convert from/to what the APIs in mpi.h
do.
My hope is that we don't have to do that. Let's see how it goes. Maybe we'll unite the clans around a single ABI top-to-bottom.
I have made a small script to run code against the various implementations with Spack https://github.com/besnardjb/mpiscan, and up to where I could go back, nobody ever cared about this ISO C constraint when looking at MPI_BOTTOM
and MPI_IN_PLACE
(see data.md). Therefore, and I say it in a very candid manner, isn't it over-specification?
No, it is not over-specification. The standard must not violate ISO C. Implementations may be able to get away with it, because they support a subset of all platforms that exist.
Recently, Open-MPI wrote some code that included UB and the results were not good. It had not been a problem before but GCC 11 (?) decided to deliver the nasal demons previously promised to those who create UB. We need a future-proof design where C compilers enforce every letter of the standard and punish every instance of UB.
If you can find a proper C language lawyer who says that storing invalid non-NULL addresses to pointers is legal and not UB, we can have a different discussion, but so far I'm not convinced it is safe.
Interesting observation: Open MPI implements MPI_UNWEIGHTED
, MPI_WEIGHTS_EMPTY
, and MPI_IN_PLACE
as non-NULL pointer constants and many other predefined handles (MPI_COMM_WORLD
, MPI_REQUEST_NULL
, etc) as extern variables.
MPICH implements MPI_IN_PLACE
as non-NULL pointer constant but MPI_UNWEIGHTED
and MPI_WEIGHTS_EMPTY
as extern pointer variables (and most other predefined handles as integer constants).
So both implementations seem to run afoul of the C standard. I agree with @jeffhammond that the MPI API should not roll the UB dice and simply hope for the best. Just because compilers haven't found a way to screw us over this yet doesn't mean somebody smart won't come up with a way in the future.
I see and understand your point: from a standard point of view, looking for the safe harbor of other standards is the smart way to go. However, it is clear that if the compiler enforced this, the user would probably run into something approaching busybox in a very constrained env. I am just trying to be halfway between a desktop Linux and a Mars rover.
Also, what if we did not translate the intptr_t
with magics to void *
-- only valid addresses ?
On my arch intptr_t
is just a sufficiently large integer type for addresses.
#include <stdio.h>
#include <stdint.h>
#define SPECIAL 0x32
int main(int argc, char** argv)
{
intptr_t myptr = SPECIAL;
int special = 0;
void * ptr = NULL;
switch(myptr)
{
case SPECIAL:
special = 1;
break;
default:
ptr = (void*)myptr;
}
printf("SPECIAL %d PTR %p\n", special, ptr);
return 0;
}
((int *) 2)
could easily be replaced with link-time constants that are valid pointers (to private byte-sized objects).~https://github.com/mpiwg-abi/abi-issues/wiki/Valid-Pointers-and-related#converting-a-pointer-to-integer-or-integer-to-pointer captures https://wiki.sei.cmu.edu/confluence/display/c/INT36-C.+Converting+a+pointer+to+integer+or+integer+to+pointer, which is pretty clear that we cannot store magic values in pointers "while strictly conforming to ISO C."~ This is fine.
The good news is that this pattern is only IB, not UB, so in theory we could insist that everyone who implements MPI assume their platform IB allows this.
@gonzalobg points out that - assuming we go with incomplete struct pointers - we need the "private" names to be standardized, because e.g. Rust language bindings require this, and the name can be made visible in C++ using e.g. std::remove_ptr_t<decltype(MPI_Comm);
.
Thus, the MPI ABI spec needs to say:
struct MPI_ABI_Comm; // declaration required by ABI spec
typedef MPI_ABI_Comm* MPI_Comm; // definition requires by ABI spec
instead of
typedef implementation-defined MPI_Comm;
Sure, that's no problem.
@jeffhammond wrote on Dec 14, 2022
MPI_BOTTOM and MPI_IN_PLACE are not specified to be compile-time constants. They are sentinels and explicitly prohibited as initializers.
As far as I understand in 2.5'4, in C, all MPI constants can be used as initializers, i.e., must be at least link-time constants. These two constants need not to be compile-time constants. Correct?
Yes, I was confused. One can initialize a C static global to MPI_BOTTOM
and thus it cannot be a constant that is defined during MPI_INIT
.
In any case, for the standard ABI, everything is a compile-time constant, and many of the constants can have the same value in both C and Fortran.
Where can I find the ABI discussion on Fortran handles in the three MPI Fortran support methods?
Handle types in Fortran are fixed in the current API. All we can do is ensure predefined constants can fit and thus have no conversion overhead.
I'm planning to propose new modules that support the full ABI to see if the forum has an appetite for these.
I haven't had time to write down all the Fortran details but I have them in my head.
@jeffhammond wrote
I'm planning to propose new modules that support the full ABI to see if the forum has an appetite for these.
Please have in mind that a major design feature of mpi_f08 is, that the handle values in the Integer in the mpi_f08 handle structure are identical to those in the both old Fortran support methods. This feature and the availability of the new handle types in the old mpi module and mpif.h allow, that all Fartran support methods can be used together in an MPI program without any conversion problems.
One can initialize a C static global
A note here because we recently had to deal with it in rsmpi and I'm not sure this semantic is widely recognized. The MPICH definition of MPI_UNWEIGTED
as a link-time constant cannot be used to initialize an object with static
storage duration:
extern int * const MPI_UNWEIGHTED MPICH_API_PUBLIC;
Meanwhile, OMPI uses a macro defined to a pointer constant:
#define MPI_UNWEIGHTED ((int *) 2) /* unweighted graph */
So we can't access a symbol from OMPI and can't use the MPICH value as an initializer, so we had to add this function to get the value from Rust FFI. I expect other language bindings are in the same boat.
int *RSMPI_UNWEIGHTED() {
return MPI_UNWEIGHTED;
}
The function could be replaced by a mutable global, but then we'd need to have a language binding initializer. Of course we could cope with either one of these semantics without needing to turn the constant into a function so this isn't a constraint on the ABI project.
Jed: yes that came up when doing Mukautuva. It's possible to satisfy static global initialization with link time because OMPI does for eg datatypes. But the ABI will endure compile time plus symbols in SO even though not from header.
Rolf: MPI_VAL is INTEGER in F08 handles. Can't change that without breaking something. For ABI, need it to be kind=c_intptr_t to eliminate all conversions. Hence, new module. Squyres and I have been discussing for a long time.
The bigger question is if we make MPI "F90" module for ABI with handle that is INTEGER(kind=c_intptr_t). Probably not but it might get adopted in codes that use -i8, even though that's unsafe.
Indeed, though there's a difference between MPICH MPI_UNWEIGHTED
, which is a const
object (produces a symbol in the library), and OMPI MPI_DOUBLE
, which is a macro that expands to the address of a thing in the library.
For Fortran, what is the downside of using kind=c_intptr_t
?
@jeffhammond wrote
Rolf: MPI_VAL is INTEGER in F08 handles. Can't change that without breaking something. For ABI, need it to be kind=c_intptr_t to eliminate all conversions. Hence, new module. Squyres and I have been discussing for a long time.
There is already no conversion between mpi module (and mpif.h) and the mpi_f08 module. This was decided that one can easily mix subroutines written with the mpi module (and mif.h) and new subroutines written with mpi_f08.
A) If such applications should switch to a new mpi_f08-ABI module, this only works if the integer values of this new module still fit into the INTEGER handles of the mpi module. If the conversion between C and Fortran does not change the values then there is only the remaining integer 4 byte to 8 byte conversion, which is done by the language for call by value in C. Care is only needed for jandle output arguments and array of handles arguments. Then I do not see the need of a new mpi_f08-ABI module.
B) If you propose to allow different (generated!) handle values for C handles and the INTEGER handles in the MPI module, and you want to use the C handle values in your new mpi_f08-ABI module, then the major design goal of the new Fortran module “no handle value conversion between old and new Fortan support method“ is completely lost.
In which direction do you want to go? A or B?
It made a lot of sense to provide compatibility between old and F08 Fortran support in MPI 3.0 in 2012. How many people are using it? What I observe is, MPI F08 is only used by modern Fortran codes that do not care about the old way and thus don't need conversions.
However, we do not actually lose backwards compatibility here. If we have a new set of F08 handles that are exactly compatible with C, then anyone wishing to convert to the old Fortran handles can use f2c and c2f routines originally supported in C.
For Fortran, what is the downside of using
kind=c_intptr_t
?
It is incompatible with the definition we have right now, where MPI_VAL
is INTEGER
, which is usually equivalent to int32_t
.
We currently support code like this:
INTEGER :: f77comm
f77comm = MPI_COMM_WORLD % MPI_VAL
call OLD_MPIF_DOT_H_LIBRARY(f77comm, ...)
This code breaks if MPI_VAL
isn't INTEGER
.
If people write new code in Fortran with mpi_f08, and use older existing code in the same application, e.g. some parallelized libraries like PETSc, that use the mpi module, and nowhere using C, then the ABI proposal does not work.
Question: does OpenMPI uses for generated handles, e.g. request handles, address pointers pointing to a real structure in memory, or are these generated handles integer numbers converted into a pointer?
Do you have an example of an existing MPI F08 code that depends on backwards compatibility of MPI_VAL? That is important to know before digging trenches.
I'm.teaching MPI-3.0 and higher to all Fortran programmers in my MPI course for many years, see the pdf file of https://www.hlrs.de/training/self-study-materials/mpi-course-material
Please look at course chapter 5, and especially slide 148. As usual with such courses, the participants use features they leatnt, but they do not report back to their trainers, what they do. This course material is used by several trainers in Germany, Austria, Switzerland, and some other EU countries. The slides exavtly reflect the goals that we had with mpi_f08.
Question: does OpenMPI uses for generated handles, e.g. request handles, address pointers pointing to a real structure in memory, or are these generated handles integer numbers converted into a pointer?
https://github.com/open-mpi/ompi/blob/main/ompi/mpi/c/comm_f2c.c and any other code that references opal_pointer_array_get_item
shows the conversion from Fortran to C. The C to Fortran conversion is trivial because the C object holds the Fortran handle value.
The comm_c2f.c more directly gave the answer: yes, all OpenMPI handles are real C pointers pointing to a structure. This includes MPI_COMM_WORLD.
1) How does this fit to your goal that in C, MPI_COMM_WORLD should be a fixed compile-time constant?
2) If should be something like ((MPI_Comm) 2), is it then possible that all handles are converted int values? (provided the C standard is extended to allow this)
3) If you want to provide a new module mpi_f08-ABI, what do you plan with existing mpi.h, mpif.h, and mpi and mpi_f08 modules? What should coexist in an ABI-conform MPI installation?
I'm sorry that you have not attended any of the discussions of the ABI design process, because 1 and 2 have been answered already. The proposed standard ABI is neither the MPICH nor OMPI ABI, and implementations of it may require handle translation. MPICH has implemented this and proven it efficient (zero overhead in 8-byte message rate benchmarks in shared memory). There are three implementations of ABI translation on top of MPI libraries, including https://github.com/jeffhammond/mukautuva, which show how this can be done externally. The currently working proposal for handle constants can be obtained from https://github.com/mpiwg-abi/specification-text-draft/blob/main/print-handle-constants.py.
Rather than trying to paint me into a corner on all matters related to Fortran, why don't you propose what makes sense to you, and we can see if it's compatible with the current direction for Fortran, which is rather fluid.
Note that, since we deprecated mpif.h
, it is no longer relevant to MPI Forum business.
@jeffhammond I do not want to put you or somebody else in a corner. I do not understand enough and therefore I try to ask specific questions that may help you and me to solve the Fortran ABI problems. I'm on vacation and retired and will surely not have the time to propose a solution. You told me that the existing doc does not show the Fortran ABI solution. The discussion looks like that future MPI implementations include
All correct?
If yes then why should aFortran user move to the new ABI module?
If no, what is wrong in my analysis?
Fortran is work in progress. I wrote a bunch of text today. Probably won't propose new module.
There are two types of changes to consider for Fortran and ABI.
First, we can and should align the constants with the C ABI, which eliminates a lot of translation overhead. This requires that we have a second version of the headers/modules, with the same name but in a different path. Users or the compiler wrapper script mpifort_abi
will select the path for the ABI module and associated library.
Second, if we want to align the entire Fortran ABI to C, then we must make breaking changes from an API perspective, by changing the type of handles. This is easier in MPI_F08.mod
because it's only MPI_VAL
, although we could consider doing something with MPI.mod
as well, although it would be so intrusive, I'm not sure it's worthwhile.
If you look at https://github.com/mpiwg-abi/mpi-standard/blob/abi/chap-abi/abi.tex, you'll see that I am in the process of describing the first set of changes, but not the second.
If we align the constants to the C ABI, the F2C translation overhead of handles is approximately zero for the common cases. The major overhead that remains will be request translation, and it's possible that this can be optimized in different ways than other objects. Even if it's not, bulk request translation will hit in cache.
@jeffhammond
First, we can ...
If all three Fortran support methods are supported by mpifort-abi, then it sounds really perfect. Remaining question: I expect you must define a fixed value for the special compile-time constants MPI_ASYNC_PROTECTS_NONBLOCKING and ... . Do I expect correctly that you specify them being .TRUE. for the mpi and mpi_f08 module, and .FALSE. for the deprecated mpif.h? Or do you want to provide two ABIs, one for modern Fortran 2018 (or TS29113) compilers with .TRUE. for both modules and another one with .FALSE. for old compilers?
MPI_ASYNC_PROTECTS_NONBLOCKING
and MPI_SUBARRAYS_SUPPORTED
need to become attributes on MPI_COMM_WORLD
to be strictly portable. If we want to be safe, we set them to .false.
, but the better option is to just require them to be true, since we are close enough to having sufficient Fortran 2018 compiler support in the HPC ecosystem to do this.
MPI_SUBARRAYS_SUPPORTED
is always false in mpif.h
and MPI.mod
, is it not? How would subarray support work with <type> BUFFER(*)
? There is no requirement for the compiler to pass a CFI_cdesc_t
there.
Similarly, MPI_ASYNC_PROTECTS_NONBLOCKING
is not relevant to mpif.h
and MPI.mod
because there is no ASYNCHRONOUS
attribute on the buffers in those declarations. However, the text is somewhat ambiguous and seems to allow the possibility of implementations having ASYNCHRONOUS declarations in mpif.h
, although I'm not aware of anyone doing this.
The LOGICAL compile-time constant MPI_SUBARRAYS_SUPPORTED is set to .TRUE. if all buffer choice arguments are defined in explicit interfaces with assumed-type and assumed- rank [47]; otherwise it is set to .FALSE.. The LOGICAL compile-time constant MPI_ASYNC_PROTECTS_NONBLOCKING is set to .TRUE. if the ASYNCHRONOUS attribute was added to the choice buffer arguments of all nonblocking interfaces and the underlying Fortran compiler supports the ASYNCHRONOUS attribute for MPI communication (as part of TS 29113, which has been superceded by Fortran 2018), otherwise it is set to .FALSE.. These constants exist for each Fortran support method, but not in the C header file. The values may be different for each Fortran support method. All other constants and the integer values of handles must be the same for each Fortran support method.
https://github.com/mpiwg-abi/mpi-standard/tree/fortran-attributes adds the above to attributes on world so they can be queried accurately at runtime, regardless of how we decide to deal with the compile-time constants in the ABI.
Problem
The type of the opaque handles -- Comm, Datatype, File, Group, Info, Message, Op, Request, Window -- must be standardized to have an ABI.
MPICH currently uses
int
. Open-MPI uses pointers. There are advantages to both.Unfortunately, because we are required to support equality comparison in C, which lacks operator overloading, we cannot do something elegant like
struct
orunion
(which would allow compilers to do type-checking).Proposal
Option 1
Standardizing
intptr_t
is compatible with both existing designs, although it is an ABI change for MPICH (on 64-bit platforms, at least).intptr_t
would allow Open-MPI to continue to use pointers internally since, by definition,intptr_t
can be cast to/from valid pointers. Since it is also an integer, MPICH can continue to use the tricks it has for encoding special values, and just ignore the additional zeroes.Option 2
Handles that are incomplete struct pointers (ISPs) allow for type checking. Encoding compile-time constants for predefined handles relies on implementation-defined behavior (IB) but should be viable.
The translation from ISP to MPICH handles not as simple as with Option 1. If one implements a compatibility layer, it could maintain a vector of integers and the pointers could pointer to these, assuming casting an ISP to
int*
is always legal. Encoding theint
handles directly in the ISP as a value relies on IB.Changes to the Text
Impact on Implementations
Implementations will have to change, particularly macros that manipulate handles.
The biggest impact is that this breaks the MPICH ABI, but that is a necessary evil in this process.
Impact on Users
Breaking existing MPICH ABI but have standard ABI.
References and Pull Requests
https://github.com/jeffhammond/blog/blob/main/MPI_Needs_ABI_Part_4.md (although this proposes the invalid
union
/struct
design)