test-ringarith and test-ffarith fail when compiled with clang and -O2

ClementPernet commented 5 years ago

Since #109 was merged, the testsuite is now compiled with -O2. This leads to 2 failures in the testsuite when compiling with clang: test-ringarith and test-ffarith

jgdumas commented 5 years ago

modular-integral implementation of add and addin are

r = a + b;
return (r>p || r<a ? r -= p : r);

Now, when Element is int32_t and a and b are close to 2^31-1, a+b overflows, thus causing r to be r - 2^32, which is catched by the test.

Now if (r-2^32)-p underflows, then return gives (correctly !!!) 2^32+(r-2^32)-p=r-p as expected.

It seems that clang++ -O2 sometimes fails to catch the overflow and leaves r to e.g. a negative value.

This hard to see. Indeed when adding some traces, clang++ gives again the correct value and a mwe does not reproduce it ...

jgdumas commented 5 years ago

The problem shows in test-ringarith.C for F.addin and sometimes for F.add when used with ModularZULL = Modular<int32_t, uint64_t> and modulus ZULLmax, i.e., 2^(N-1) - 1, here 2^31-1=2147483647

cyrilbouvier commented 5 years ago

I found a MWE (see below) that always fails with clang++-6.0 with -O2 -Wall -g -DNDEBUG -UGIVARO_DEBUG -UDEBUG and works with clang++-6.0 with -O0 -Wall -g -UNDEBUG -DGIVARO_DEBUG -DDEBUG and g++ (8.3.0) in both cases.

Edit: actually, g++ has the same behavior as clang++ and fails with -O2 and succeeds with -O0

The example shows that the problem is not only in addin but in the sequence axmy, addin.

#include <iostream>
#include "givaro/modular.h"

using namespace Givaro;

int
main (int argc, char *argv[])
{
    using Ring = Modular<int32_t, uint64_t>;

    Ring R(Ring::maxCardinality());
    std::cout << "Modular<int32_t, uint64_t>" << " with m=" << R.cardinality()
                                              << std::endl;

    typename Ring::Element x, y, z, r1, r2;
    R.init (x, 7);
    R.init (y, 2147483618);
    R.init (z, 2147483625);

    std::cout << "x = " << x << std::endl;
    std::cout << "y = " << y << std::endl;
    std::cout << "z = " << z << std::endl;

    R.init (r1);
    R.init (r2);

    R.init (r1, 2147483466); /* r1 = x*y - z */
    R.addin (r1, z);

    R.axmy (r2, x, y, z); /* r2 = x*y - z */
    std::cout << "axmy = " << r2 << " # = x*y-z" << std::endl;
    R.addin (r2, z); // r2 += z

    std::cout << "r1 = " << r1 << std::endl;
    std::cout << "r2 = " << r2 << std::endl;

    if (!R.areEqual (r1, r2))
    {
        std::cout << "failure" << std::endl;
        return 1;
    }
    else
    {
        std::cout << "ok" << std::endl;
        return 0;
    }
}

/* -*- mode: C++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
// vim:sts=4:sw=4:ts=4:et:sr:cino=>s,f0,{0,g0,(0,\:0,t0,+0,=s

jgdumas commented 5 years ago

Well I tried to make a MWE without givaro, by extracting the add code directly with int32_t etc., and there the difference disappears. The difference disappears also if I add some printing traces within Givaro ...

cyrilbouvier commented 5 years ago

I tried to find a smaller example without calls to Givaro's functions. Here what I have for now:

#include <iostream>
#include "givaro/modular.h"

using namespace Givaro;

#define common(r) static_cast<int32_t>((static_cast<uint64_t>(a)*static_cast<uint64_t>(b) + m-Caster<uint64_t>(c)) % m); \
    r += c;  \
    r = (r >= Caster<int32_t>(mp) || r < c) ? r - static_cast<int32_t>(mp) : r;

static inline void
foo (int32_t &r, int32_t a, int32_t b, int32_t c, uint64_t m, uint32_t mp)
{
    r = common(r);
}

int
main (int argc, char *argv[])
{
    int32_t a = 7;
    int32_t b = 2147483618;
    int32_t c = 2147483625;
    uint64_t m = 2147483647;
    uint32_t mp = 2147483647;

    std::cout << "a = " << a << std::endl;
    std::cout << "b = " << b << std::endl;
    std::cout << "c = " << c << std::endl;

    /* r1 */
    int32_t r1 = common(r1);
    std::cout << "r1 = " << r1 << std::endl;

    /* r2 */
    int32_t r2;
    foo (r2, a, b, c, m, mp);
    std::cout << "r2 = " << r2 << std::endl;

    if (r1 != r2)
    {
        std::cout << "failure" << std::endl;
        return 1;
    }
    else
    {
        std::cout << "ok" << std::endl;
        return 0;
    }
}

r1 and r2 are computed with the same code (from the macro common), with the difference that r1 is computed directly and r2 via a function call (static inline function).

with clang++-6.0 -O2 -Wall -g -o bug bug.C: r1 is correct but not r2 (r2 = -205)
with clang++-6.0 -O0 -Wall -g -o bug bug.C: r1 and r2 are correct

This is weird, but it gets wierder (in the -O2 case):

If I replace only the first Caster with a static_cast, r1 and r2 are both correct
If I replace only the second Caster with a static_cast, r1 and r2 are both wrong (equal to -205)

[ with -O0, this is always correct]

cyrilbouvier commented 5 years ago

After hours trying to understand this bug, I suspected a bug in the optimisation step of clang++-6.0

If I compile the code of my previous message without any printing and that I use objdump to analyse the produced code, I obtain:

for clang++-6.0, assembly code that do only return 1. Assembly code is obtained with

clang++-6.0 -O2 -Wall -c -o bug.clang.O2.o bug.C
objdump -drwC -S bug.clang.O2.o > bug.clang.O2.objdump

for g++, assembly code that do only return 0.

g++ -O2 -Wall -c -o bug.g++.O2.o bug.C
objdump -drwC -S bug.g++.O2.o > bug.g++.O2.objdump

In both case, the compiler optimises everything out as every operand are known at compile time. But clang++-6.0 and g++ does not agree in the output.

P1K commented 5 years ago

Hi Cyril,

If I'm not wrong, over/underflows with signed int is undefined behaviour, so the compilers would have the "right" to optimise a test to a constant if it detects the overflow. I didn't look at this bug in detail, but I have the impression that this is what's happening here? If that's indeed the problem, a solution would be to cast to unsigned to do the overflows, where in this case the behaviour is well defined (as we did here #98)?

P1K commented 5 years ago

(For instance, redefining common as following does seem to resolve the issue with clang:

#define common(r) static_cast<int32_t>((static_cast<uint64_t>(a)*static_cast<uint64_t>(b) + m-Caster<uint64_t>(c)) % m); \
    uint32_t rr,cc; \
    rr = static_cast<uint32_t>(r); \
    cc = static_cast<uint32_t>(c);  \
    rr += cc;  \
    r = static_cast<int32_t>(rr); \
    r = (r >= Caster<int32_t>(mp) || r < c) ? r - static_cast<int32_t>(mp) : r;

)

cyrilbouvier commented 5 years ago

@P1K I think your assessment is correct: as over/underflows with signed int is undefined behaviour, it is a ill-formed program and compilers can optimise it as they want.

So I started reading section 4 of the C++11 norm (on standard conversions) and I am not sure that r = static_cast<int32_t>(rr); is correct:

static_cast performs an "implicit conversion sequence" (according to cppreference)
section 4.7.3 of the C++ norm (on implicit conversion for base types): "if the destination type is signed, the value does not change if the source integer can be represented in the destination type. Otherwise the result is implementation-defined"

So I think all the computations should be done with unsigned type, with one conversion to signed at the end. The question now is how to do this in givaro. We have three available type:

Element: can be signed or unsigned
Compute: always unsigned but can be twice as big as Element so maybe not the best choice for performance reason
Residue: always unsigned and the same size as Element, but was not intented to perform computation, only to hold return values of some function.

P1K commented 5 years ago

Okay. I don't know if static_cast is the way to go, and indeed making all computations on unsigned types would seem better. But I have no idea about how to do that properly in givaro.

P1K commented 5 years ago

(For instance, redefining common as following does seem to resolve the issue with clang:

#define common(r) static_cast<int32_t>((static_cast<uint64_t>(a)*static_cast<uint64_t>(b) + m-Caster<uint64_t>(c)) % m); \
    uint32_t rr,cc; \
    rr = static_cast<uint32_t>(r); \
    cc = static_cast<uint32_t>(c);  \
    rr += cc;  \
    r = static_cast<int32_t>(rr); \
    r = (r >= Caster<int32_t>(mp) || r < c) ? r - static_cast<int32_t>(mp) : r;

)

(Note that a cast/unsigned computation should probably be done for r - static_cast<int32_t>(mp) too.)

jgdumas commented 5 years ago

Branch clangbug has a first draft of a fix for this. Need to be simplified though.

jgdumas commented 5 years ago

solved in PR #111

linbox-team / givaro

test-ringarith and test-ffarith fail when compiled with clang and -O2 #110