Open jeffhammond opened 1 year ago
For info, this reminds me of one place where the standard mentions what is, for me, a "calling" convention (in fact it seems to call it linkage); we may look at it too:
19.1.17 Problems with Code Movement and Register Optimization
Nonblocking Operations
If a variable is local to a Fortran subroutine (i.e., not in a module or a COMMON block), the compiler will assume that it cannot be modified by a called subroutine unless it is an actual argument of the call. In the most common linkage convention, the subroutine is expected to save and restore certain registers. Thus, the optimizer will assume that a register which held a valid copy of such a variable before the call will still hold a valid copy on return.
This same section also explains why all the `_begin` and `_end` calls in MPI-IO pass a buffer argument. I do not think it is still the case today -- Fortran is amazing. I feel it is something that should go away with Fortran 77; it looks like leftovers to me. What is interesting is that, to my understanding, these functions were created as a mitigation for the calling convention:
This register optimization/code movement problem for nonblocking operations does not occur with MPI parallel file I/O split collective operations, because in the MPI_XXX_BEGIN and MPI_XXX_END calls, the same buffer has to be provided as an actual argument.
Fortran `ASYNCHRONOUS` solves this, but it only does what we need in Fortran 2018 (although I know of no compiler that doesn't do what we need in practice as of Fortran 2008).
We are working on deprecating `mpif.h` now. Sadly, I don't think we will ever get rid of `mpi.mod`.
My goal is to leverage the C ABI to write a very nice set of new Fortran bindings that are free from the burden of standardization and can fix all sorts of things like this.
This is excellent 🔥. Yes, standardizing two languages (in fact, looking at Fortran itself, it is more than one) is a great source of complexity...
One of the goals in the F08 bindings was to make it possible to write them in a (mostly) implementation-agnostic way on top of the C bindings, but that didn't happen, although the situation is a lot better than with F90.
The standalone F08 experiment has been quite useful in identifying ABI issues...
Do we want to consider symbol names/visibility as part of calling conventions or break it into its own separate issue?
Please create a separate issue so we know to address it. However, we might be able to solve both at the same time.
We should merely state that the MPI library must support the calling convention of the system C compiler on the platform.
I think that we should specify the ABI as a C header, say that "it is C", and that's it.
Every platform already has a C ABI specification that specifies how to interface with C on that platform. This covers way more than just the calling convention.
It's not the job of the MPI spec to specify how C (or Fortran) programs interface with each other on particular platforms. It is the platform's job to do that, because there are many C and Fortran programs that are not MPI programs and need to interface with each other anyway.
We should merely state that the MPI library must support the calling convention of the system C compiler on the platform.
I think that we should specify the ABI as a C header, say that "it is C", and that's it.
Yes, this is my intent, although I want to add "...as if compiled with the system default C compiler and runtime library," since that addresses the situation with glibc not being the only C RTL on Linux. Alpine uses MUSL.
I don't know to what extent https://wiki.musl-libc.org/functional-differences-from-glibc.html will impact MPI implementations, but we need to be cautious about assuming too much.
Every platform already has a C ABI specification that specifies how to interface with C on that platform. This covers way more than just the calling convention.
Windows apparently does not have a default calling convention. MPI does not assume any operating system and has been designed throughout its history to support a wide range of operating systems, including ones that are quite strange.
Yes, this is my intent, although I want to add "...as if compiled with the system default C compiler and runtime library," since that addresses the situation with glibc not being the only C RTL on Linux. Alpine uses MUSL.
Both MUSL and glibc follow the same platform ABI (e.g. the x86_64 psABI on x86_64 Linux) and one can have a binary that uses MUSL and calls into a library that uses glibc, for example, passing values like an `int` back and forth without issues: both agree on the layout and calling convention of an `int`.
However, since these are two different C standard libraries, what one cannot do is, e.g., allocate memory with `malloc` on the MUSL side and free it with `free` on the glibc side, since these are two separate allocators; and while both MUSL and glibc provide a pthreads mutex, these mutexes have different ABIs, since the platform does not specify an ABI for them, so one can't try to share a mutex across both parts of a binary, etc.
Is there an MPI API for which the ABIs specified by the platforms do not suffice? For example, an MPI API where the MPI user passes the library a pointer to a mutex that the application obtained through a non-MPI API, or an API where the application passes the MPI API a pointer to memory that the application allocated but MPI is expected to free, or vice versa?
If not, and the application and the MPI API only pass values specified by the platforms' ABIs, then saying more than "it is C" would probably not be necessary.
Windows apparently does not have a default calling convention.
Windows has multiple default calling conventions for C, depending on the Windows target used (e.g. x86_64-pc-windows-msvc vs x86_64-pc-windows-mingw). These targets do not interoperate with each other and are therefore in practice treated as different platforms, but each of these targets has a stable C ABI that allows all C software on that target to interoperate.
Yes, this is my intent, although I want to add "...as if compiled with the system default C compiler and runtime library," since that addresses the situation with glibc not being the only C RTL on Linux. Alpine uses MUSL.
Both MUSL and glibc follow the same platform ABI (e.g. the x86_64 psABI on x86_64 Linux) and one can have a binary that uses MUSL and calls into a library that uses glibc, for example, passing values like an `int` back and forth without issues: both agree on the layout and calling convention of an `int`.
However, since these are two different C standard libraries, what one cannot do is, e.g., allocate memory with `malloc` on the MUSL side and free it with `free` on the glibc side, since these are two separate allocators; and while both MUSL and glibc provide a pthreads mutex, these mutexes have different ABIs, since the platform does not specify an ABI for them, so one can't try to share a mutex across both parts of a binary, etc.
Is there an MPI API for which the ABIs specified by the platforms do not suffice? For example, an MPI API where the MPI user passes the library a pointer to a mutex that the application obtained through a non-MPI API, or an API where the application passes the MPI API a pointer to memory that the application allocated but MPI is expected to free, or vice versa?
I don't think there is. MPI is very conservative about what it assumes from the system. Technically, we don't assume all processes can do language-standard I/O (see §9.1.2), and, if nothing else, our Fortran support does not assume C memory management exists.
If not, and the application and the MPI API only pass values specified by the platforms ABIs, then saying more than "it is C" would probably not be necessary.
You are probably right, but we should discuss this in detail. I am hoping that our friends from Red Hat and Canonical can provide some expert guidance on platform ABI assumptions.
Problem
ABI includes calling conventions, which are not standardized.
Proposal
We should not go too deep on this. We should merely state that the MPI library must support the calling convention of the system C compiler on the platform. This is going to be trivial in most cases, and is already widely accepted, since it is necessary in many contexts.
Changes to the Text
Describe the issue, and mention that at least one major architecture (x86) has multiple conventions:
We can cite the ARM calling conventions that are standardized:
Impact on Implementations
None. There is no change to existing practice here.
Impact on Users
None, unless they are using C toolchains that are not compatible with the one used to build MPI.
References and Pull Requests
https://en.wikipedia.org/wiki/X86_calling_conventions