Open jdinan opened 6 years ago
Comment from discussion on 9/20/2018: This also raises a question about interoperability between MPI and C library routines that operate on C standard types (e.g. printf, scanf, etc.). Being able to specify format specifiers indicates that there is a correspondence between the MPI type and a C standard type.
@mahermanns and @dholmes-epcc-ed-ac-uk Thanks for volunteering to further discussion on this ticket.
That is the right question (Bill, Sept 2018)
Should MPI replace MPI_COUNT with size_t in all C API definitions, and with whatever native Fortran datatype is natural for the intended usage in each situation in all Fortran API definitions?
Should MPI replace MPI_AINT with ptrdiff_t in all C API definitions, and with whatever native Fortran datatype is natural for the intended usage in each situation (which may not exist in all versions of Fortran!) in all Fortran API definitions?
The consequences of this counter-proposal are that no such format specifiers are needed, and the arithmetic operators MPI_AINT_ADD and MPI_AINT_DIFF are no longer needed, and the Big MPI proposal is no longer needed (as currently specified), and \
MPI_Aint to ptrdiff_t would be more accurate. But otherwise +1.
Thanks @bosilca - I knew that such a type must exist but could not think of the type name at the time I wrote the comment.
@jdinan Is it really that hard? We know from MPI-3.1 Section 2.5.8 that MPI_Count must be signed, because it must be minimally capable of encoding any value that may be stored in a variable of type int, so one should only need to verify that off_t and ptrdiff_t are the same size and then use %zd or PRId64.
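As an illustration of the "verify, then use a fixed-width specifier" idea, here is a minimal C sketch. It is a variant of what the comment describes: it checks the width and signedness of the count type itself rather than comparing off_t and ptrdiff_t. The type demo_count_t and the function format_count are hypothetical stand-ins so the sketch compiles without mpi.h; a real program would use MPI_Count.

```c
#include <assert.h>    /* static_assert (C11) */
#include <inttypes.h>  /* PRId64 */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for MPI_Count; a real program would include <mpi.h>. */
typedef long long demo_count_t;

/* Fail the build, rather than misprint, if either assumption is wrong. */
static_assert(sizeof(demo_count_t) == sizeof(int64_t),
              "demo_count_t is not 64 bits wide; PRId64 would be wrong");
static_assert((demo_count_t)-1 < 0, "demo_count_t must be signed");

/* Format a count into buf; returns what snprintf returns. */
int format_count(char *buf, size_t len, demo_count_t c)
{
    /* The cast makes the argument type match PRId64 exactly. */
    return snprintf(buf, len, "%" PRId64, (int64_t)c);
}
```

The static assertions turn a silent mismatch into a compile error, which is roughly the burden the proposal would move from each user into mpi.h.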
In any case, I fail to see any utility in truncating words in MPI_PRI_COUNT and MPI_SCN_COUNT. Just use MPI_PRINT_COUNT and MPI_SCAN_COUNT. The result is significantly more readable and adds only 3 bytes to the size of mpi.h.
@jeffhammond given that these format specifiers only apply to the printf and scanf functions (with variants, such as vsprintf?) then we should probably include that extra F to make it 20% clearer: MPI_PRINTF_COUNT MPI_SCANF_COUNT
Dumb question: will these ever be different to each other? Do we need two/both?
What is the Fortran equivalent? The "I" descriptor seems old, i.e. F77 era.
@dholmes-epcc-ed-ac-uk Fortran does not standardize a preprocessor so it doesn't really matter.
These should follow the convention used in inttypes.h for print and scan format specifiers. These can be used in any of the functions in the printf and scanf family (see the link above for info on the inttypes header).
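A minimal C sketch of what the inttypes.h convention could look like for a count type. The names demo_count_t, DEMO_PRI_COUNT, DEMO_SCN_COUNT, parse_count and describe_count are hypothetical; under the proposal, mpi.h itself would supply MPI_PRI_COUNT and MPI_SCN_COUNT with whatever values match its MPI_Count. As in inttypes.h, the macros omit the leading '%'. On whether the two macros could ever differ: inttypes.h keeps separate PRI and SCN families because printf arguments undergo default promotions while scanf needs the exact pointed-to type, so for narrow types the two strings are not the same.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins: a real mpi.h would define MPI_Count and, under this
 * proposal, the print and scan macros to match it. */
typedef long long demo_count_t;
#define DEMO_PRI_COUNT "lld"   /* like PRId64: no leading '%' */
#define DEMO_SCN_COUNT "lld"   /* like SCNd64 */

/* Parse a count from text; returns 1 on success, 0 otherwise. */
int parse_count(const char *text, demo_count_t *out)
{
    assert(text != NULL && out != NULL);
    return sscanf(text, "%" DEMO_SCN_COUNT, out) == 1;
}

/* Write "count=<value>" into buf; returns the snprintf result. */
int describe_count(char *buf, size_t len, demo_count_t c)
{
    return snprintf(buf, len, "count=%" DEMO_PRI_COUNT, c);
}
```

User code written this way never names the underlying C type, so it would keep working if an implementation changed the typedef.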
@jeffhammond Yes, it really is this hard if you want portability. In C, the standard integer type binary format is implementation defined, but the fixed width integer types must be two's complement. It is therefore possible to have two different signed integer representations and a user will not know which one should be used with MPI_Count.
We can't use C size_t and ptrdiff_t because of heterogeneity support and language interoperability.
@jdinan This should be fixed in C20/C++20.
We could also just preemptively stipulate that the MPI standard requires two's complement integers, because there is literally no system outside of Unisys that supports anything else, and then only in the context of FPGA emulation of legacy code that can't be migrated to x86_64 (see aforementioned documents for details).
@jeffhammond FYI if you want the latest version of a paper, use the wg21.link/p0907 link; it automatically resolves to the most recent submitted version. P0907 is on R3 now. Also it's been forwarded to Core, but I'm not sure of current status for C++20.
I think using size_t and ptrdiff_t in the API is a different discussion.
I think as MPI introduces the typedef, it should also be MPI defining the format specifier (apart from how difficult it is or whether it is possible at all).
Using the PRI abbreviation would follow the principle of least astonishment. However, as we are deviating from the original naming anyway (with the second underscore and all uppercase), it may indeed be better to expand the names to MPI_PRINT_COUNT and MPI_SCAN_COUNT (I am also not a friend of abbreviating variable names unnecessarily). Then again, naming them MPI_PRI_COUNT and MPI_SCN_COUNT may set them apart enough from other MPI constants to foster intuitive recognition.
@jdinan has this problem gone away? (I know that the answer has to be "no" because no changes have been made to address it, but no-one has commented on this issue since 2018, so it obviously isn't particularly pressing.)
Is there still interest in doing something about this for the mpi-4.0 release? If so, the clock is ticking rapidly.
@dholmes-epcc-ed-ac-uk No, this hasn't been fixed. This issue could be a good first proposal for any Forum members that are looking to get their feet wet introducing a new proposal to the MPI Forum.
Just as reference, MPICH has provided these (in mpi.h) for some time.
/* FIXME: The following two definition are not defined by MPI and must not be
included in the mpi.h file, as the MPI namespace is reserved to the MPI
standard */
#define MPI_AINT_FMT_DEC_SPEC "%ld"
#define MPI_AINT_FMT_HEX_SPEC "%lx"
Note the actual specifiers are filled in by configure.
I’m going to propose moving this to MPI 5.0. There’s more discussion to be had here. If someone objects and thinks we’ll be ready to read this soon, leave a comment and we can discuss bringing it back into MPI 4.1.
To folks that have asked, no the problem has not gone away since users don't know to which integral C type a given MPI integer type maps. If another Forum member has cycles to pick up this issue (should be a relatively easy one), please feel free to do so.
Can one just default to %llu and promote it, if it's not 64b?
If you used the same approach with scanf, it would be difficult to detect whether the value is truncated.
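The promote-then-print idea and the scanf concern can be sketched in C as follows. This assumes the count type is signed and no wider than long long, so it uses %lld rather than %llu. The type demo_count_t is a hypothetical stand-in that is deliberately narrow, so that truncation is possible, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical narrow count type, chosen so truncation can actually happen. */
typedef int demo_count_t;
#define DEMO_COUNT_MAX INT_MAX
#define DEMO_COUNT_MIN INT_MIN

/* Printing: widening a signed value to long long preserves it. */
int print_promoted(char *buf, size_t len, demo_count_t c)
{
    return snprintf(buf, len, "%lld", (long long)c);
}

/* Scanning: read into the wide type, then check the range before narrowing.
 * Returns 1 on success, 0 on parse failure, -1 if the value does not fit. */
int scan_checked(const char *text, demo_count_t *out)
{
    long long wide;
    assert(text != NULL && out != NULL);
    if (sscanf(text, "%lld", &wide) != 1)
        return 0;
    if (wide < DEMO_COUNT_MIN || wide > DEMO_COUNT_MAX)
        return -1;
    *out = (demo_count_t)wide;
    return 1;
}
```

The explicit range check is the extra step scanf needs: without the wide temporary there is no portable way to notice that the input did not fit.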
One could also determine the size of an integer using sizeof and whether it is signed using this.
With C++, it seems straightforward to deduce the printf formats from typeid. See below.
One can write something similar with the GNU C extension typeof, which is expected to be in C23. I assume there is a way to do it with _Generic as well, but I haven't tried.
As for scanf, I would expect binary file I/O to need to store the type information if one cannot assume 64-bit values.
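#include <cstdio>       // printf
#include <string>
#include <type_traits>  // std::is_signed
#include <typeinfo>
#include <mpi.h>

// Caution: this relies on the implementation-defined string returned by
// std::type_info::name(). With the Itanium C++ ABI (GCC, Clang), long is
// mangled as "l" and long long as "x"; other ABIs differ.
int main(void)
{
    MPI_Count c = 5;
    MPI_Aint a = 6;
    MPI_Offset o = 7;
    // If MPI_Count is long long, this builds "%x", which is not a correct
    // conversion for that type.
    std::string ff{"C=%"+std::string{typeid(MPI_Count).name()}+"\n"};
    std::printf(ff.c_str(),c);
    // If MPI_Aint is long, this builds "%ld" (or "%lu" for an unsigned type).
    std::string gg{"A=%"+std::string{typeid(MPI_Aint).name()}+( std::is_signed<MPI_Aint>() ? "d" : "u")+"\n"};
    std::printf(gg.c_str(),a);
    std::string hh{"O=%"+std::string{typeid(MPI_Offset).name()}+"\n"};
    std::printf(hh.c_str(),o);
    return 0;
}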
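For the _Generic route mentioned above, a minimal C11 sketch might look like this. The macro FMT_OF, the type demo_aint_t and the function format_aint are hypothetical names used only for illustration; the list of types is an assumption and covers only the standard integer types from int upward.

```c
#include <assert.h>
#include <stdio.h>

/* Select a complete printf format string from the static type of x.
 * There is no default branch, so an unlisted type is a compile-time error. */
#define FMT_OF(x) _Generic((x),       \
    int:                "%d",         \
    long:               "%ld",        \
    long long:          "%lld",       \
    unsigned:           "%u",         \
    unsigned long:      "%lu",        \
    unsigned long long: "%llu")

/* Hypothetical stand-in for MPI_Aint; a real program would include <mpi.h>. */
typedef long demo_aint_t;

/* Format a value without naming its underlying C type at the call site. */
int format_aint(char *buf, size_t len, demo_aint_t a)
{
    assert(buf != NULL);
    return snprintf(buf, len, FMT_OF(a), a);
}
```

One limitation of this sketch: the result of _Generic is an expression, not a string literal token, so it cannot be concatenated with adjacent literals the way the inttypes.h macros can. It has to be the whole format string or be assembled at run time.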
@jeffhammond wrote:
With C++, it seems straightforward to deduce the printf formats from typeid. See below.
In C++23, I would use std::print, and in C++20, I would use std::format. These solve the problem without you needing to know the printf format specifier. If you need to support earlier C++ versions, you can use the {fmt} library. (C++20 and C++23 standardized these parts of the {fmt} library.)
If I had to use printf, Jeff's typeid-based approach works, but please note that the result of std::type_info::name() is mangled and not standard. GCC offers a demangling function ( https://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html ); other compilers probably also do that.
Can one just default to %llu and promote it, if it's not 64b?
If it's actually a pointer, reinterpret_cast<ptrdiff_t>(p) would get you a signed integer, in which case I would use t instead of ll.
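For reference, the t length modifier pairs with ptrdiff_t in standard C, as in this minimal sketch. The function name format_distance is hypothetical, and both pointers are assumed to point into the same array.

```c
#include <assert.h>
#include <stddef.h>  /* ptrdiff_t */
#include <stdio.h>

/* Format the distance between two pointers into the same array. */
int format_distance(char *buf, size_t len, const char *from, const char *to)
{
    assert(buf != NULL);
    ptrdiff_t d = to - from;              /* pointer subtraction yields ptrdiff_t */
    return snprintf(buf, len, "%td", d);  /* 't' is the ptrdiff_t length modifier */
}
```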
Please don't use intmax_t (see e.g., https://thephd.dev/intmax_t-hell-c++-c ).
We might end up standardizing the C type of these types for the ABI, so maybe it won't be so bad in the future.
I withdraw my prior objections to this proposal. We should do this, and it is especially important for MPI_Count, because it is likely going to be the wider of intptr_t and int64_t and thus it's going to be annoying for users to printf these.
Problem
It's not easy to perform I/O on MPI_Count, e.g. with printf or scanf.
Proposal
Similar to inttypes.h, add MPI_PRI_COUNT and MPI_SCN_COUNT format specifiers to mpi.h.
Changes to the Text
TBD
Impact on Implementations
Should be limited to header files.
Impact on Users
Users don't need to figure out the format specifier based on the size and signedness of the type.
References