jdinan commented 6 years ago

Problem

It's not easy to perform formatted I/O on MPI_Count values, e.g. with printf or scanf, because the standard does not say which C integer type MPI_Count corresponds to.

Proposal

Similar to inttypes.h, add MPI_PRI_COUNT and MPI_SCN_COUNT format specifiers to mpi.h.

Changes to the Text

TBD

Impact on Implementations

Should be limited to header files.

Impact on Users

Users don't need to figure out the format specifier based on the size and signedness of the type.

References

inttypes.h

jdinan commented 6 years ago

Comment from discussion on 9/20/2018: This also raises a question about interoperability between MPI and C library routines that operate on C standard types (e.g. printf, scanf, etc.). Defining format specifiers implies that there is a correspondence between the MPI type and a C standard type.

jdinan commented 6 years ago

@mahermanns and @dholmes-epcc-ed-ac-uk Thanks for volunteering to further discussion on this ticket.

dholmes-epcc-ed-ac-uk commented 6 years ago

That is the right question (Bill, Sept 2018)

Should MPI replace MPI_Count with size_t in all C API definitions, and with whatever native Fortran datatype is natural for the intended usage in each situation in all Fortran API definitions?

Should MPI replace MPI_Aint with ptrdiff_t in all C API definitions, and with whatever native Fortran datatype is natural for the intended usage in each situation (which may not exist in all versions of Fortran!) in all Fortran API definitions?

The consequences of this counter-proposal are that no such format specifiers are needed, the arithmetic functions MPI_AINT_ADD and MPI_AINT_DIFF are no longer needed, and the Big MPI proposal (as currently specified) is no longer needed.

bosilca commented 6 years ago

MPI_Aint to ptrdiff_t would be more accurate. But otherwise +1.

dholmes-epcc-ed-ac-uk commented 6 years ago

Thanks @bosilca - I knew that such a type must exist but could not think of the type name at the time I wrote the comment.

jeffhammond commented 6 years ago

@jdinan Is it really that hard? We know from MPI-3.1 Section 2.5.8 that MPI_Count must be signed, because

it must be minimally capable of encoding any value that may be stored in a variable of type int

so one should only need to verify that off_t and ptrdiff_t are the same size and then use %zd or PRId64.

In any case, I fail to see any utility in truncating words in MPI_PRI_COUNT and MPI_SCN_COUNT. Just use MPI_PRINT_COUNT and MPI_SCAN_COUNT. The result is significantly more readable and adds only 3 bytes to the size of mpi.h.

dholmes-epcc-ed-ac-uk commented 6 years ago

@jeffhammond given that these format specifiers only apply to the printf and scanf functions (and their variants, such as vsprintf?), we should probably include that extra F to make it 20% clearer: MPI_PRINTF_COUNT and MPI_SCANF_COUNT.

Dumb question: will these ever be different to each other? Do we need two/both?

What is the Fortran equivalent? The "I" descriptor seems old, i.e. F77 era.

jeffhammond commented 6 years ago

@dholmes-epcc-ed-ac-uk Fortran does not standardize a preprocessor so it doesn't really matter.

jdinan commented 6 years ago

These should follow the convention used in inttypes.h for print and scan format specifiers. These can be used in any of the functions in the printf and scanf family (see the link above for info on the inttypes header).

@jeffhammond Yes, it really is this hard if you want portability. In C, the representation of the standard signed integer types is implementation-defined, but the exact-width integer types (intN_t) must be two's complement. It is therefore possible to have two different signed integer representations, and a user will not know which one should be used with MPI_Count.

jdinan commented 6 years ago

We can't use C size_t and ptrdiff_t because of heterogeneity support and language interoperability.

jeffhammond commented 6 years ago

@jdinan This should be fixed in C2x/C++20.

We could also just preemptively stipulate that the MPI standard requires two's complement integers, because literally no systems outside of Unisys support anything else, and even there only in the context of FPGA emulation of legacy code that can't be migrated to x86_64 (see aforementioned documents for details).

mhoemmen commented 6 years ago

@jeffhammond FYI if you want the latest version of a paper, use the wg21.link/p0907 link; it automatically resolves to the most recent submitted version. P0907 is on R3 now. Also it's been forwarded to Core, but I'm not sure of current status for C++20.

mahermanns commented 6 years ago

I think using size_t and ptrdiff_t in the API is a different discussion.

I think as MPI introduces the typedef, it should also be MPI defining the format specifier (apart from how difficult it is or whether it is possible at all).

Using the PRI abbreviation would follow the principle of least astonishment. However, as we are deviating from the original naming anyway (with the second underscore and all uppercase), it may indeed be better to expand the names to MPI_PRINT_COUNT and MPI_SCAN_COUNT (I am also not a friend of abbreviating variable names unnecessarily). Then again, naming them MPI_PRI_COUNT and MPI_SCN_COUNT may set them apart enough from other MPI constants to foster intuitive recognition.

dholmes-epcc-ed-ac-uk commented 4 years ago

@jdinan has this problem gone away? (I know that the answer has to be "no" because no changes have been made to address it, but no one has commented on this issue since 2018, so it obviously isn't particularly pressing.)

Is there still interest in doing something about this for the MPI-4.0 release? If so, the clock is ticking rapidly.

jdinan commented 3 years ago

@dholmes-epcc-ed-ac-uk No, this hasn't been fixed. This issue could be a good first proposal for any Forum members that are looking to get their feet wet introducing a new proposal to the MPI Forum.

raffenet commented 3 years ago

Just as reference, MPICH has provided these (in mpi.h) for some time.

/* FIXME: The following two definition are not defined by MPI and must not be
   included in the mpi.h file, as the MPI namespace is reserved to the MPI
   standard */
#define MPI_AINT_FMT_DEC_SPEC "%ld"
#define MPI_AINT_FMT_HEX_SPEC "%lx"

Note the actual specifiers are filled in by configure.

wesbland commented 2 years ago

I’m going to propose moving this to MPI 5.0. There’s more discussion to be had here. If someone objects and thinks we’ll be ready to read this soon, leave a comment and we can discuss bringing it back into MPI 4.1.

jdinan commented 2 years ago

To folks that have asked: no, the problem has not gone away, since users don't know which integral C type a given MPI integer type maps to. If another Forum member has cycles to pick up this issue (it should be a relatively easy one), please feel free to do so.

jeffhammond commented 2 years ago

Can one just default to %llu and promote it, if it's not 64b?

jdinan commented 2 years ago

If you used the same approach with scanf, it would be difficult to detect whether the value is truncated.

jeffhammond commented 1 year ago

One could also determine the size of an integer using sizeof and whether it is signed using this.

With C++, it seems straightforward to deduce the printf formats from typeid. See below.

One can write something similar with the GNU C extension typeof, which is expected to be in C23. I assume there is a way to do it with _Generic as well, but I haven't tried.

As for scanf, I would expect binary file I/O to need to store the type information if one cannot assume 64-bit values.

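#include <cstdio>
#include <string>
#include <type_traits>
#include <typeinfo>

#include <mpi.h>

// Caveat: std::type_info::name() is implementation-defined. With GCC and
// Clang (Itanium C++ ABI) it yields the mangled name, e.g. "l" for long,
// which happens to match the printf length modifier. It does not work for
// long long (mangled as "x") or for compilers that return other names.
int main(void)
{
    MPI_Count  c = 5;
    MPI_Aint   a = 6;
    MPI_Offset o = 7;

    // Length modifier from the type name, conversion from the signedness.
    std::string ff{"C=%" + std::string{typeid(MPI_Count).name()} + (std::is_signed<MPI_Count>() ? "d" : "u") + "\n"};
    std::printf(ff.c_str(), c);

    std::string gg{"A=%" + std::string{typeid(MPI_Aint).name()} + (std::is_signed<MPI_Aint>() ? "d" : "u") + "\n"};
    std::printf(gg.c_str(), a);

    std::string hh{"O=%" + std::string{typeid(MPI_Offset).name()} + (std::is_signed<MPI_Offset>() ? "d" : "u") + "\n"};
    std::printf(hh.c_str(), o);

    return 0;
}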

mhoemmen commented 1 year ago

@jeffhammond wrote:

With C++, it seems straightforward to deduce the printf formats from typeid. See below.

In C++23, I would use std::print, and in C++20, I would use std::format. These Solve the Problem without you needing to know the printf format specifier. If you need to support earlier C++ versions, you can use the {fmt} library. (C++20 and C++23 standardized these parts of the {fmt} library.)

If I had to use printf, Jeff's typeid-based approach works, but please note that the result of std::type_info::name() is mangled and not standard. GCC offers a demangling function ( https://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html ); other compilers probably also do that.

Can one just default to %llu and promote it, if it's not 64b?

If it's actually a pointer, reinterpret_cast<ptrdiff_t>(p) would get you a signed integer, in which case I would use t instead of ll.

Please don't use intmax_t (see e.g., https://thephd.dev/intmax_t-hell-c++-c ).

jeffhammond commented 1 year ago

We might end up standardizing the C type of these types for the ABI, so maybe it won't be so bad in the future.

jeffhammond commented 1 year ago

I withdraw my prior objections to this proposal. We should do this, and it is especially important for MPI_Count, because it is likely going to be the wider of intptr_t and int64_t and thus it's going to be annoying for users to printf these.

mpi-forum / mpi-issues

Format specifiers for MPI types #107