jeffhammond commented 2 years ago

Problem

I created a standalone implementation of the MPI F08 module to learn some things about writing language interfaces to MPI.

I worked around a lot of problems, like hard-coding MPI_Status to the implementation ABI, which is easy enough to do, although it would not be necessary with a standard ABI.

I can mostly deal with compile-time constants, using my own values and doing on-the-fly conversions back to the underlying implementation's C values.

The place where I cannot deal with this is in user-defined reductions. When I have my own reduction, it takes a datatype argument based on the Fortran type definition in my module. When I pass MPI_INTEGER in Fortran, it uses my compile-time constant (-1003), which is converted to the C value (MPICH's in this case is 1275069467 in decimal), because that is what I have to pass to the C call to MPI_Allreduce. The latter is what my user-defined reduction sees, which is not useful since my Fortran has no idea what the MPICH op % MPI_VAL means.
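
To make the two-way mapping concrete, here is a minimal sketch in C of the kind of translation table described above. Only the MPI_INTEGER pair (-1003 and 1275069467) comes from this issue; the names `handle_pair`, `f08_to_c_datatype`, and `c_to_f08_datatype` are hypothetical, and treating the handle as a plain `int` assumes an MPICH-style integer handle.

```c
#include <stddef.h>

/* Hypothetical translation table between the wrapper's own compile-time
 * constants and the underlying implementation's C handle values.
 * Only the MPI_INTEGER pair is taken from the issue text; a real table
 * would list every predefined datatype. */
typedef struct {
    int f08_value; /* constant exposed by the standalone Fortran module */
    int c_value;   /* integer handle used by the C library (MPICH-style) */
} handle_pair;

static const handle_pair datatype_table[] = {
    { -1003, 1275069467 }, /* MPI_INTEGER */
};

static const size_t datatype_count =
    sizeof datatype_table / sizeof datatype_table[0];

/* Forward conversion, applied on every call into C.
 * Assumption: values not in the table are handles created by the
 * implementation itself (user-defined types) and pass through unchanged. */
int f08_to_c_datatype(int f08_value)
{
    for (size_t i = 0; i < datatype_count; ++i) {
        if (datatype_table[i].f08_value == f08_value) {
            return datatype_table[i].c_value;
        }
    }
    return f08_value;
}

/* Reverse conversion, which is what a user-defined reduction would need
 * in order to see the constant the Fortran code originally passed. */
int c_to_f08_datatype(int c_value)
{
    for (size_t i = 0; i < datatype_count; ++i) {
        if (datatype_table[i].c_value == c_value) {
            return datatype_table[i].f08_value;
        }
    }
    return c_value;
}
```

The forward direction is easy because the wrapper controls every call into C. The reverse direction is the hard part, because the implementation calls the user's reduction directly with its own handle value.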

Because MPICH uses integers, I could hack this and do the back-conversion secretly in my F->C layer, only for user-defined reductions with built-in datatypes, although this won't work if MPICH checks the handle for validity. It also doesn't work with Open MPI.
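
A rough sketch of what that hack could look like is below: a C callback with the same argument shape as `MPI_User_function` that rewrites the datatype before forwarding to the user's routine. Everything here is illustrative. The typedefs stand in for the real MPI types so the sketch compiles without `mpi.h`, the names `register_user_fn`, `reduction_trampoline`, and `example_int_sum` are hypothetical, only one reduction can be registered at a time, and the whole approach assumes integer handles, so it carries the limitations already noted above.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles without mpi.h. With MPICH the C
 * datatype handle is an int; the Fortran side stores an INTEGER. */
typedef int c_datatype;
typedef int f08_datatype;

/* Assumed shape of the user's reduction once it is callable from C. */
typedef void (*f08_user_fn)(void *invec, void *inoutvec, int *len,
                            f08_datatype *datatype);

static f08_user_fn registered_fn = NULL;    /* single slot, for brevity */
static f08_datatype last_seen_datatype = 0; /* recorded by the example below */

void register_user_fn(f08_user_fn fn)
{
    registered_fn = fn;
}

/* Map the implementation's value back to the wrapper's constant.
 * Only MPI_INTEGER is handled; other values pass through unchanged. */
static f08_datatype back_convert(c_datatype c_value)
{
    return (c_value == 1275069467) ? -1003 : c_value;
}

/* This is the function that would be handed to the C library in place of
 * the user's routine. It converts the handle, then forwards the call. */
void reduction_trampoline(void *invec, void *inoutvec, int *len,
                          c_datatype *datatype)
{
    f08_datatype converted = back_convert(*datatype);
    if (registered_fn != NULL) {
        registered_fn(invec, inoutvec, len, &converted);
    }
}

/* Example user reduction: element-wise integer sum that also records
 * which datatype value it was given. */
void example_int_sum(void *invec, void *inoutvec, int *len,
                     f08_datatype *datatype)
{
    const int *in = invec;
    int *inout = inoutvec;
    last_seen_datatype = *datatype;
    for (int i = 0; i < *len; ++i) {
        inout[i] += in[i];
    }
}
```

After `register_user_fn(example_int_sum)`, calling `reduction_trampoline` with a datatype of 1275069467 hands the user routine -1003, which is the value the Fortran code originally passed.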

I can avoid this problem by doing things such as reimplementing all the collectives from scratch in Fortran when user-defined reductions are involved, but that's unreasonable. Another workaround would be to generate a user-defined datatype on-the-fly when the user passes a built-in type with a user-defined reduction, which is probably what I'm going to do, but this should not be necessary.

Update: I can't do the on-the-fly type swap trick because then user code won't get the type they passed at the command-line, so if they have logic to check that (which is common), it's going to break.

Proposal

This is one of many reasons we should have an ABI. The MPI F08 module was designed to be implementable as a standalone component, but we have failed to make that possible. As a result, our ecosystem requires one to compile an entire MPI implementation from scratch just to get a Fortran module. That is an awful user experience, because many platforms support multiple Fortran compilers and building any MPI implementation from source is not quick (https://github.com/open-mpi/ompi/issues/2056).

Changes to the Text

Impact on Implementations

Impact on Users

References and Pull Requests

Wee-Free-Scot commented 2 weeks ago

This part of the ABI goes beyond a pure ABI spec and adds new functions for converting handles.

@jeffhammond We had a question about those handle conversion procedures.

The C part of the ABI defines handles as incomplete struct pointers, for example, (struct MPI_ABI_Comm*), which makes them pointer-sized.

The conversion procedures require serialisation into int, which is not usually pointer-sized.

Shouldn't these procedures use MPI_Aint instead?

This type is well-defined in Fortran and in C, so it should serve all the purposes of the conversion, and it doesn't require a 64-bit to 32-bit conversion in either direction.

jeffhammond commented 2 weeks ago

That does not work because of the definition of handles in Fortran, which are INTEGER, either raw or inside of a type.

Wee-Free-Scot commented 2 weeks ago

So, at some point, either inside these new functions or somewhere else, MPI will need to assign a 32-bit integer (possibly by hashing the 64-bit pointer) and recover the 64-bit pointer (possibly by looking up the 32-bit integer in a hash table)?

This is already needed (for implementations that have 64-bit handle types in C), so it is not a new requirement. Is that the line of argument?
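
For illustration, the assign-and-recover scheme in the question above could be sketched in C as follows. This is not taken from any implementation: it uses a fixed-size array with a linear scan instead of a hash table to stay short, and the names `handle_to_int`, `int_to_handle`, `release_handle`, and `MAX_HANDLES` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLES 1024 /* arbitrary capacity for this sketch */

/* Slot i holds the pointer that integer i stands for; NULL marks a free
 * slot. A real implementation would likely use a hash table instead. */
static void *handle_table[MAX_HANDLES];

/* Return a small integer for a pointer-sized handle. The same pointer
 * always maps to the same integer while it stays registered.
 * Returns -1 when the table is full. */
int handle_to_int(void *ptr)
{
    assert(ptr != NULL);
    int free_slot = -1;
    for (int i = 0; i < MAX_HANDLES; ++i) {
        if (handle_table[i] == ptr) {
            return i; /* already registered */
        }
        if (handle_table[i] == NULL && free_slot < 0) {
            free_slot = i; /* remember the first free slot */
        }
    }
    if (free_slot >= 0) {
        handle_table[free_slot] = ptr;
    }
    return free_slot;
}

/* Recover the pointer from the integer; NULL if the integer is unknown. */
void *int_to_handle(int value)
{
    if (value < 0 || value >= MAX_HANDLES) {
        return NULL;
    }
    return handle_table[value];
}

/* Free the slot when the underlying object is destroyed. */
void release_handle(int value)
{
    if (value >= 0 && value < MAX_HANDLES) {
        handle_table[value] = NULL;
    }
}
```

The integer fits in a default Fortran INTEGER regardless of pointer width, at the cost of keeping the table consistent whenever a handle is created or freed.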