Reductions on REAL*16's are not working

ompiteam commented 10 years ago

Per a thread on the user's list:

http://www.open-mpi.org/community/lists/users/2008/10/7081.php

Something odd is going on with REAL*16 reductions. George and I have discovered:

The MPI_REAL16 datatype is mapping itself onto the MPI_LONG_DOUBLE datatype (which is fine)
So it's actually invoking ompi_mpi_op_sum_long_double() to do the reduction (instead of ompi_mpi_op_sum_fortran_real16()), but this should be OK because ompi_fortran_real16_t is just a typedef for long double. So these functions are identical (which does mean that we have a few useless op routines).
What it ''looks'' like is that the DDT engine is copying over the last element as the first step of the reduction, but then the "+=" operator is not actually adding in the other values.
Ditto for other operators (e.g., MAX). It's like the operators are not having any effect.

Very puzzling.

ompiteam commented 10 years ago

Imported from trac issue 1603. Created by jsquyres on 2008-10-28T16:16:56, last modified: 2011-06-18T21:12:17

ompiteam commented 10 years ago

Trac comment by jsquyres on 2008-10-28 16:58:20:

Wow, this is a fun one. George and I discovered that icc is representing "long double" in 80 bits, but ifort represents "REAL*16" in all 128 bits. Currently, OMPI's op routines are all implemented in C, and therefore it's just doing the Wrong Thing for Fortran REAL*16.

George and I are discussing a few options for fixing this -- it's debatable as to whether this will get into v1.3.0. It's very unlikely that we'll fix v1.2.
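As a rough way to see which representation a given C compiler uses for long double, one can look at LDBL_MANT_DIG from <float.h>. The sketch below is illustrative only and is not part of Open MPI; the enum and both function names are made up for this note, and it assumes a radix-2 floating-point implementation.

```c
#include <float.h>

/* Hypothetical labels for the long double layouts discussed in this ticket. */
enum ld_format {
    LD_BINARY64,        /* long double is the same as double       */
    LD_X87_EXTENDED,    /* 80-bit extended, 64-bit significand     */
    LD_DOUBLE_DOUBLE,   /* pair of doubles (e.g. some PowerPC ABIs) */
    LD_BINARY128,       /* true quad precision, 113-bit significand */
    LD_UNKNOWN
};

/* Classify a format by its significand width in bits (LDBL_MANT_DIG). */
enum ld_format classify_long_double(int mant_dig)
{
    switch (mant_dig) {
    case 53:  return LD_BINARY64;
    case 64:  return LD_X87_EXTENDED;
    case 106: return LD_DOUBLE_DOUBLE;
    case 113: return LD_BINARY128;
    default:  return LD_UNKNOWN;
    }
}

/* What the compiler building this file actually uses. */
enum ld_format native_long_double_format(void)
{
    return classify_long_double(LDBL_MANT_DIG);
}
```

Only the LD_BINARY128 case could share a bit layout with a 128-bit REAL*16; in the x87 case sizeof(long double) may still be 16 because of padding, so size alone says nothing about the representation.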

ompiteam commented 10 years ago

Trac comment by bosilca on 2008-10-28 17:37:39:

I went a little bit further. Apparently the REAL16 type supported by the Intel Fortran compiler uses a different data representation than long double (both use the standard ''IEEE754'', but the size of the significand is larger and the exponent is not on the same bits). Based on some web resources (such as [http://www.cactuscode.org/pipermail/developers/2006-April/004658.html CactusCode]) it appears that the 128-bit REAL16 is software emulated on most platforms as there is no hardware support for this type. This [of course] incurs a significant performance penalty.

I can hardly imagine adding such a feature in Open MPI. Yes, it gives better precision, but it greatly affects performance. Moreover, based on discussions with the math guys from here:

If you need more than 64 bits for floating-point computations, your problem is ill-conditioned, and you had better fix the conditioning and go with a standard and faster type such as double (i.e. REAL*8).

What I propose is to fix this at the '''configure''' level. In Fortran we can use the function precision(TYPE) to figure out the precision of the Fortran data representation. The standard precision for long double is 10. If we get anything bigger than this (as an example, ifort reports 16 for REAL16), we completely disable the support for REAL16. Easy, fast, and can even be back-ported to 1.2.

ompiteam commented 10 years ago

Trac comment by jsquyres on 2008-10-28 18:02:05:

Wouldn't it be better to find out if there's a mismatch between long double and REAL*16, rather than checking against an arbitrary value (10)? If there's a mismatch, disable support for REAL*16.

E.g., we could do some simple MPI_SUM-like operation in a configure test and see if we come up with the right answer. If we don't, disable REAL*16.
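A minimal sketch of the C half of such a configure test, assuming a Fortran driver (not shown) fills an array of REAL*16 values, calls this routine, and compares the returned sum with the sum it computed itself. The name sum_as_long_double and its argument list are hypothetical; a real test would also have to match the Fortran compiler's symbol naming convention.

```c
/* Sum `*count` values, treating the memory as C long double.
 * Arguments are passed by reference, as a Fortran caller would pass them. */
void sum_as_long_double(const long double *values, const int *count,
                        long double *result)
{
    long double total = 0.0L;
    for (int i = 0; i < *count; ++i) {
        total += values[i];
    }
    *result = total;
}
```

If the Fortran side gets back a sum that differs from its own, the two representations do not match and REAL*16 support would be disabled.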

ompiteam commented 10 years ago

Trac comment by jsquyres on 2008-10-29 10:22:14:

George and I talked more off-ticket. Some notes:

Intel does have an option for making C and Fortran types the same. Use the "_Quad" type in C and use the following compiler flags: -Qoption,cpp,--extended_float_types.
Pathscale doesn't support REAL*16
GNU doesn't support REAL*16
PGI supports REAL*16, but their C long double bit representation is also different from their Fortran REAL*16 representation. I didn't look further to see if there's a C compiler type/option to make them the same (like Intel). Presumably, there is.
There are three options:
1. Disable all support for MPI_REAL16
2. Leave message passing support for MPI_REAL16 but disable reduction support
3. Fix reduction support for MPI_REAL16
It seems like the "fix" option is going to take a while, so we're going to defer it for now. This ticket will stay open until the issue is resolved one way or another.
The workaround for now seems to be to disable MPI_REAL16 support for reductions, and advise application authors to either use more limited precision (because the performance of REAL16 is going to be bad anyway) or use user-defined Fortran MPI operators to do their own reductions on MPI_REAL16 types.

ompiteam commented 10 years ago

Trac comment by tdd on 2008-10-29 14:46:48:

I've sent Sun's compiler people the question about the Real*16 vs long double support and I got the following reply:

On SPARC they are the same. On x86, we are adding support for REAL*16. Once that is done, the layout will be different. C's long double is actually an 80 bit number, whereas Fortran's REAL*16 is a 128 bit number.

So, with the above, it might be nice to have MPI_REAL16 reductions be configurable.

ompiteam commented 10 years ago

Trac comment by jsquyres on 2008-10-29 14:52:41:

I've made an hg tree to do this work:

http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/real16/

Three things need to be done:

Make MPI_REAL16 and MPI_COMPLEX32 no longer be aliases; they need to be their own types (so that MPI_Op can key off them as a different case than MPI_LONG_DOUBLE)
Add a configure test/option to enable/disable reductions for REAL16 and COMPLEX32
Adjust the MPI_Op predefined operations for REAL16 and COMPLEX32 to be enabled or disabled

George: can you do item 1?

I can do item 2 and item 3.

ompiteam commented 10 years ago

Trac comment by tdd on 2008-10-30 08:38:25:

Upon George's request I've asked our compiler group how 128-bit mathematics is supported and what C type corresponds to Real*16.

The answer I got was that on both SPARC and x86, 128-bit mathematics is supported via software. As to the C type question: for x86 there is none that corresponds to Real*16, but as one can expect (and as mentioned in my previous message), on SPARC platforms long double does correspond to Real*16.

ompiteam commented 10 years ago

Trac comment by jsquyres on 2008-10-30 16:27:34:

HG tree now has configure support and op.c is fixed. George -- the macro you want is OMPI_REAL16_MATCHES_C. It'll always be 0 or 1.

Throwing this ticket to George for completion...

ompiteam commented 10 years ago

Trac comment by bosilca on 2008-11-06 17:19:47:

This is my worst nightmare. Allowing '''REAL16''' and '''COMPLEX32''' to be used for pt2pt messaging is still kind of tricky. Having such a datatype will allow the user to use it for creating other datatypes, and to use them in file operations and such.

Therefore, adding support for such types everywhere except the op is a difficult topic. There are two other functions that need to be modified: mpi/c/type_create_f90_complex.c and mpi/c/type_create_f90_real.c. And there I don't have any clue what the ''number of digits'' and the ''max_10_exp'' are for such obscure F90 types.

How do we stop the user from using such a type in creating other types? How do we convert such a type from one architecture to another in a heterogeneous case?

To be honest, this type is so ugly that the only thing I wish is to completely drop support for it. Users will still be able to use it by creating their own type based on a number of MPI_BYTE. With such an approach, no op will be allowed unless the user defines it, no conversion is possible, and so on.

ompiteam commented 10 years ago

Trac comment by jsquyres on 2008-11-06 20:29:45:

I talked with George a bit more extensively about this on the phone and I've become convinced that he's right. What we should do for 1.3.0 is only enable MPI_REAL16 support on platforms where long double === REAL*16 (i.e., both size and bit representation), such as SPARC.

If we enable MPI_REAL16 where the C and Fortran bit representations don't match, problems will occur with the following:

Reductions (obviously)
MPI_PACK_EXTERNAL
Heterogeneity, in terms of point-to-point and MPI I/O operations
...and probably some others

George will update the HG tree to do the Right Thing in the DDT engine when sizeof(long double) == sizeof(REAL*16) ''and'' OMPI_REAL16_MATCHES_C is set to 1. Then we'll be good to go for 1.3.0.

In future versions, we can investigate using special C compiler types to match the representation of REAL*16 (like _Quad in the intel compiler).

ompiteam commented 10 years ago

Trac comment by jsquyres on 2008-11-07 15:37:23:

(In [19948]) Refs https://svn.open-mpi.org/trac/ompi/ticket/1603:

Add OMPI_F77_CHECK_REAL16_C_EQUV to test whether REAL*16 is bit equivalent to long double. AC_DEFINE OMPI_REAL16_MATCHES_C with result (0 or 1).
Update ompi_info to only show real16 support if OMPI_REAL16_MATCHES_C is 1.
Update DDT to only support REAL16 and COMPLEX32 if 1==OMPI_REAL16_MATCHES_C.
MPI Op function pointer tables will have NULL for the REAL16 and COMPLEX32 entries if 0==OMPI_REAL16_MATCHES_C.
Slightly cleaned up OMPI_F77_GET_ALIGNMENT and OMPI_F77_CHECK m4 tests (use OMPI_VAR_SCOPE_PUSH/POP).
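
The effect of the NULL table entries can be pictured with a small sketch. Everything here (REAL16_MATCHES_C, sum_table, reduce_sum, the type enum) is a hypothetical stand-in, not Open MPI's actual op code; it only shows the pattern of leaving the REAL16 slot empty when the configure result is 0.

```c
#include <stddef.h>

/* Hypothetical stand-in for the configure result (0 or 1). */
#ifndef REAL16_MATCHES_C
#define REAL16_MATCHES_C 0
#endif

typedef void (*sum_fn_t)(const void *in, void *inout, int count);

/* Element-wise inout[i] += in[i] on C long doubles. */
static void sum_long_double(const void *in, void *inout, int count)
{
    const long double *a = in;
    long double *b = inout;
    for (int i = 0; i < count; ++i) {
        b[i] += a[i];
    }
}

enum { TYPE_LONG_DOUBLE, TYPE_REAL16, TYPE_MAX };

/* One slot per datatype; REAL16 gets a function only if it matches C. */
static const sum_fn_t sum_table[TYPE_MAX] = {
    [TYPE_LONG_DOUBLE] = sum_long_double,
#if REAL16_MATCHES_C
    [TYPE_REAL16] = sum_long_double,
#else
    [TYPE_REAL16] = NULL,
#endif
};

/* Returns 0 on success, -1 if the type has no reduction function. */
static int reduce_sum(int type, const void *in, void *inout, int count)
{
    if (type < 0 || type >= TYPE_MAX || NULL == sum_table[type]) {
        return -1;
    }
    sum_table[type](in, inout, count);
    return 0;
}
```

With the default of 0, a reduction on TYPE_REAL16 is refused up front instead of silently producing wrong sums.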

ompiteam commented 10 years ago

Trac comment by jsquyres on 2008-11-07 15:38:17:

We've done what we were going to do for v1.3.0; a more permanent fix will have to come later. Moving to v1.3.1...

ompiteam commented 10 years ago

Trac comment by timattox on 2008-11-08 11:58:01:

(In [19958]) Closes https://svn.open-mpi.org/trac/ompi/ticket/1652: Disable REAL*16 support on unsupported platforms

Submitted by jsquyres, Reviewed by tdd, RM-approved by bosilca

r19948: Refs https://svn.open-mpi.org/trac/ompi/ticket/1603:

Add OMPI_F77_CHECK_REAL16_C_EQUV to test whether REAL*16 is bit equivalent to long double. AC_DEFINE OMPI_REAL16_MATCHES_C with result (0 or 1).
Update ompi_info to only show real16 support if OMPI_REAL16_MATCHES_C is 1.
Update DDT to only support REAL16 and COMPLEX32 if 1==OMPI_REAL16_MATCHES_C.
MPI Op function pointer tables will have NULL for the REAL16 and COMPLEX32 entries if 0==OMPI_REAL16_MATCHES_C.
Slightly cleaned up OMPI_F77_GET_ALIGNMENT and OMPI_F77_CHECK m4 tests (use OMPI_VAR_SCOPE_PUSH/POP).

r19955: Add note about MPI_REAL16 support.

ompiteam commented 10 years ago

Trac comment by jsquyres on 2009-01-17 09:54:58:

I tried to add support this morning in the hg tree for _Quad, but the configure test is ultimately failing because I can't get a Fortran REAL*16 1.1 to equal a C _Quad 1.1.

Reference URL: http://software.intel.com/en-us/forums/intel-c-compiler/topic/56359/

Here's the guts of the test, which you can compile independently:

conftest_f.f

        program bogus
        REAL*16 :: foo, bar
        foo = 1.1
        bar = 1.1
        if (foo .eq. bar) then
              print *, "fortran equal"
        end if
        call c(foo)
        end program bogus

conftest_c.c

#include <stdio.h>
#include <stdlib.h>

void print(const char *name, _Quad *q)
{
    int i;
    char *p = (char*) q;
    printf("%s: ", name);
    for (i = 0; i < sizeof(_Quad); ++i) {
        printf("%4d ", (char) p[i]);
    }
    printf("\n");
}

void c_backend(_Quad *a) {
    _Quad b = 1.1q;
    _Quad c = 1.1;
    print("a", a);
    print("b", &b);
    print("c", &c);
}

void C(_Quad *a) { c_backend(a); }
void c(_Quad *a) { c_backend(a); }
void c_(_Quad *a) { c_backend(a); }

Compile it with:

shell$ ifort conftest_f.f -c && icc conftest_c.c -c -Qoption,cpp,--extended_float_types && ifort conftest_f.o conftest_c.o -o foo

The output I get clearly shows that the bit pattern for 1.1q in C is different than a 1.1 for Fortran:

 fortran equal
a:    0    0    0    0    0    0    0    0    0    0    0 -102 -103   25   -1   63
b: -102 -103 -103 -103 -103 -103 -103 -103 -103 -103 -103 -103 -103   25   -1   63
c:    0    0    0    0    0    0    0  -96 -103 -103 -103 -103 -103   25   -1   63

I know there are things about roundoff error to worry about with floating point comparisons. But really -- if I can assign 1.1 to two different REAL*16's in Fortran and they compare equal, but I can't assign a 1.1q to a C _Quad and compare it to a Fortran REAL*16, that somewhat defeats the point of _Quad, doesn't it?

These tests conducted with RHEL4U4 on x86_64 with icc (ICC) 10.1 20071116.

Does anyone have any ideas here?
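
One possible reading of the dump, offered as a sketch rather than a confirmed diagnosis: in Fortran, a literal 1.1 with no kind suffix is a default (single-precision) real, so foo would hold single-precision 1.1 widened to REAL*16, while the C `_Quad c = 1.1` widens a double. The helper below (hypothetical name; assumes IEEE 754 double and the little-endian byte order of x86_64) widens a double into the IEEE binary128 layout so the bytes can be compared against the a and c rows above.

```c
#include <stdint.h>
#include <string.h>

/* Widen a normal, finite IEEE binary64 value to IEEE binary128 and store it
 * as 16 little-endian bytes. Zeros, subnormals, infinities and NaNs are not
 * handled; this is only meant for values like 1.1. */
static void widen_double_to_binary128(double value, unsigned char out[16])
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);

    uint64_t sign     = bits >> 63;
    uint64_t exponent = ((bits >> 52) & 0x7FF) - 1023 + 16383; /* rebias */
    uint64_t fraction = bits & 0xFFFFFFFFFFFFFULL;             /* 52 bits */

    /* High word: sign (1 bit), exponent (15 bits), top 48 fraction bits. */
    uint64_t hi = (sign << 63) | (exponent << 48) | (fraction >> 4);
    /* Low word: the remaining 4 fraction bits, then zero padding. */
    uint64_t lo = fraction << 60;

    for (int i = 0; i < 8; ++i) {
        out[i]     = (unsigned char)(lo >> (8 * i));
        out[8 + i] = (unsigned char)(hi >> (8 * i));
    }
}
```

Widening (double)1.1f gives eleven zero bytes followed by 0x9A 0x99 0x19 0xFF 0x3F, which printed as signed chars is the a row; widening the double 1.1 gives the c row. If that reading is right, the Fortran side is comparing a single-precision constant against a quad-precision one, and a Fortran constant written with a REAL*16 kind suffix would be the thing to compare against b.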

ompiteam commented 10 years ago

Trac comment by jsquyres on 2009-01-27 11:37:56:

Let's put this to Future until someone cares about it. :-(

George asked me to set him as "owner" so that he'll see this ticket on his list.

ompiteam commented 10 years ago

Trac comment by jsquyres on 2009-02-10 20:04:59:

(In [20513]) Refs https://svn.open-mpi.org/trac/ompi/ticket/1603

Significantly improve the check to see if REAL*16 === the back-end C type (i.e., not just in size, but also in representation)
Add a check to see if Intel compiler's _Quad type === REAL*16
Ensure that on the Sun SPARC with the Sun compilers, we get long double === REAL*16

ompiteam commented 10 years ago

Trac comment by jsquyres on 2009-06-21 09:17:14:

Per http://www.open-mpi.org/community/lists/devel/2009/06/6290.php, there's new life in this ticket. See hg tree here:

http://bitbucket.org/jsquyres/fortran-real16/

ompiteam commented 10 years ago

Trac comment by stevengj on 2011-06-18 21:12:17:

gcc now (as of 4.6.0) has decent support for a true 16-byte (128-bit) quad-precision type, via __float128 on x86, x86_64, and ia64. This is different from long double on x86 (which is 80-bit extended precision, regardless of storage size). gfortran also supports this via REAL*16.

It would be good to support this type in Open MPI, where it obviously should map to MPI_REAL16.

(Contrary to another poster, there are legitimate uses of such high precision, the most common being to check the roundoff error of double-precision calculations, when debugging numerical accuracy, by comparing to quad-precision. The need is widespread enough to have pushed gcc into supporting it.)

This is in general different from "long double", which on x86 typically maps to 80-bit extended precision, regardless of sizeof(long double), which x86 supports in hardware. __float128, in contrast, is a true quad-precision type and is implemented in software by gcc (~ 100x slower than double-precision calculations in my tests). MPI_REAL16 should definitely not be a synonym for MPI_LONG_DOUBLE except on platforms where "long double" is a genuine quad-precision type.

bosilca commented 5 years ago

Support for reductions on REAL16 is now in, as long as the type has a matching C type.

open-mpi / ompi

Reductions on REAL*16's are not working #63