float_to_short_test fails on PowerPC running OpenBSD

janstary commented 5 years ago

This is current git on OpenBSD 6.5/macppc (meaning powerpc). Running 'make check' fails with the following:

tests/float_short_test

   float_to_short_test .............................
   Line 64 : out [1] == -1

*** Error 1 in /home/hans/src/libsamplerate (Makefile:1877 'check')

erikd commented 5 years ago

Do you mean floats being wrongly rounded to ints?

Yes, I think so.

Is this a known issue with the PowerPC architecture?

More likely to be a compiler bug,

janstary commented 5 years ago

I don't get it: the compiler is not the one rounding floats to ints. Do you mean the compiler issues wrong instructions for the powerpc arch? Anyway, the error happens with each of gcc 4.2.1, gcc 4.9.4 and clang 7.0.1. Are these known to generate float rounding errors?

erikd commented 5 years ago

If it is happening with more than one compiler, then I think it is unlikely to be the compiler.

At this point I have no idea what it is.

Flamefire commented 5 years ago

Tried to validate this but failed on our ppc system and gcc 4.8.2, gcc 8.2.0. All works fine, tests pass.

Are you using configure or CMake? Can you attach your config.h (after configuring)?

janstary commented 5 years ago

I use configure. After configuring, src/config.h is as follows:

/* src/config.h.  Generated from config.h.in by configure.  */
/* src/config.h.in.  Generated from configure.ac by autoheader.  */

/* Define if building universal (internal helper macro) */
/* #undef AC_APPLE_UNIVERSAL_BUILD */

/* Host processor clips on negative float to int conversion. */
#define CPU_CLIPS_NEGATIVE 1

/* Host processor clips on positive float to int conversion. */
#define CPU_CLIPS_POSITIVE 0

/* Host processor is big endian. */
#define CPU_IS_BIG_ENDIAN 1

/* Host processor is little endian. */
#define CPU_IS_LITTLE_ENDIAN 0

/* Define to 1 if you have the `alarm' function. */
#define HAVE_ALARM 1

/* Set to 1 if you have alsa */
/* #undef HAVE_ALSA */

/* Define to 1 if you have the `calloc' function. */
#define HAVE_CALLOC 1

/* Define to 1 if you have the `ceil' function. */
#define HAVE_CEIL 1

/* Define to 1 if you have the <dlfcn.h> header file. */
#define HAVE_DLFCN_H 1

/* Set to 1 if you have fftw3 */
#define HAVE_FFTW3 1

/* Define to 1 if you have the `floor' function. */
#define HAVE_FLOOR 1

/* Define to 1 if you have the `fmod' function. */
#define HAVE_FMOD 1

/* Define to 1 if you have the `free' function. */
#define HAVE_FREE 1

/* Define to 1 if you have the <inttypes.h> header file. */
#define HAVE_INTTYPES_H 1

/* Define to 1 if you have the `lrint' function. */
#define HAVE_LRINT 1

/* Define to 1 if you have the `lrintf' function. */
#define HAVE_LRINTF 1

/* Define to 1 if you have the `malloc' function. */
#define HAVE_MALLOC 1

/* Define to 1 if you have the `memcpy' function. */
#define HAVE_MEMCPY 1

/* Define to 1 if you have the `memmove' function. */
#define HAVE_MEMMOVE 1

/* Define to 1 if you have the <memory.h> header file. */
#define HAVE_MEMORY_H 1

/* Define if you have signal SIGALRM. */
#define HAVE_SIGALRM 1

/* Define to 1 if you have the `signal' function. */
#define HAVE_SIGNAL 1

/* Set to 1 if you have libsndfile */
#define HAVE_SNDFILE 1

/* Define to 1 if you have the <stdint.h> header file. */
#define HAVE_STDINT_H 1

/* Define to 1 if you have the <stdlib.h> header file. */
#define HAVE_STDLIB_H 1

/* Define to 1 if you have the <strings.h> header file. */
#define HAVE_STRINGS_H 1

/* Define to 1 if you have the <string.h> header file. */
#define HAVE_STRING_H 1

/* Define to 1 if you have the <sys/stat.h> header file. */
#define HAVE_SYS_STAT_H 1

/* Define to 1 if you have the <sys/times.h> header file. */
#define HAVE_SYS_TIMES_H 1

/* Define to 1 if you have the <sys/types.h> header file. */
#define HAVE_SYS_TYPES_H 1

/* Define to 1 if you have the <unistd.h> header file. */
#define HAVE_UNISTD_H 1

/* Define to the sub-directory in which libtool stores uninstalled libraries.
   */
#define LT_OBJDIR ".libs/"

/* Set to 1 if compiling for Win32 */
#define OS_IS_WIN32 0

/* Name of package */
#define PACKAGE "libsamplerate"

/* Define to the address where bug reports for this package should be sent. */
#define PACKAGE_BUGREPORT "erikd@mega-nerd.com"

/* Define to the full name of this package. */
#define PACKAGE_NAME "libsamplerate"

/* Define to the full name and version of this package. */
#define PACKAGE_STRING "libsamplerate 0.1.9"

/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "libsamplerate"

/* Define to the home page for this package. */
#define PACKAGE_URL "http://www.mega-nerd.com/libsamplerate/"

/* Define to the version of this package. */
#define PACKAGE_VERSION "0.1.9"

/* The size of `double', as computed by sizeof. */
#define SIZEOF_DOUBLE 8

/* The size of `float', as computed by sizeof. */
#define SIZEOF_FLOAT 4

/* The size of `int', as computed by sizeof. */
#define SIZEOF_INT 4

/* The size of `long', as computed by sizeof. */
#define SIZEOF_LONG 4

/* Define to 1 if you have the ANSI C header files. */
#define STDC_HEADERS 1

/* Version number of package */
#define VERSION "0.1.9"

/* Define WORDS_BIGENDIAN to 1 if your processor stores words with the most
   significant byte first (like Motorola and SPARC, unlike Intel). */
#if defined AC_APPLE_UNIVERSAL_BUILD
# if defined __BIG_ENDIAN__
#  define WORDS_BIGENDIAN 1
# endif
#else
# ifndef WORDS_BIGENDIAN
/* #  undef WORDS_BIGENDIAN */
# endif
#endif

Do you see any difference to your config?

Flamefire commented 5 years ago

Yes: #define CPU_CLIPS_NEGATIVE 1 is zero for me. The whole clipping code is sketchy anyway and I'd remove it. The C standard defines this "clipping" as undefined behavior so I wouldn't rely on it.

Additionally I found another form of undefined behavior which is described here: https://stackoverflow.com/a/4009922/1930508 https://github.com/erikd/libsamplerate/blob/bf33ede91fac13a4df3e46ccb2422f6361c05512/src/samplerate.c#L513 shifts a (possibly) negative number which is implementation defined:

6.5.7/5 [...] If E1 has a signed type and a negative value, the resulting value is implementation- defined.

Similar problem for e.g fp_to_int and the inverse problem for int_to_fp if the value is big.

@janstary Change the define manually to #define CPU_CLIPS_NEGATIVE 0 after configure and try again.

Wait. I've just seen, that it fails on the positive tests. What I got:

It fails on out[1] = 1
The input isfpos[1] = 0.99
in src_float_to_short_array it is scaled_value = 2126008811,52
hence lrint(scaled_value) = 2126008812 (still positive)
2126008812 >> 16 = 32440 which is a valid short (still positive)
No clipping occurs...

Could you debug this code and verify this to see WHERE the values are different? I'd suggest to add an intermediate int for lrint for debugging.

You may also try this (which is how I'd do it):

void
src_float_to_short_array (const float *in, short *out, int len)
{   double scaled_value ;

    while (len)
    {   len -- ;
        scaled_value = in [len] * 32768 ;
        if (scaled_value >= 32767.f)
            out [len] = 32767 ;
        else if (scaled_value <= -32768.f)
            out [len] = -32768 ;
                else
            out [len] = (short) (lrint (scaled_value)) ;
    } ;

}

janstary commented 5 years ago

With #define CPU_CLIPS_NEGATIVE 0 (and recompiling), the error stays the same.

Anyway, on this machine, this is what lrint() gives:

        32766.000000    32766
        32766.400000    32766
        32766.600000    32767
        32767.000000    32767
        32767.400000    32767
        32767.600000    -32768
        32768.000000    -32768
        32768.400000    -32768
        32768.600000    -32767

Doesn't that mean that #define CPU_CLIPS_NEGATIVE 1 is actually right on this machine?

I added some lame printf()s to the current src_float_to_short_array(), which gives:

in: 1.000000
scaled: 2147483648.000000
overflow

in: 0.990000
scaled: 2126008832.000000
lrint: -512
out: -1

in: 0.950000
scaled: 2040109440.000000
lrint: 2040109440
out: 31129

   Line 64 : out [1] == -1

Firstly, why does the test only fail after the 0.95 input?

Notice that my scaled values are integral (while being double). For example, I see different values on an input of 0.99 then you see: it scales to 2126008832.000000 for me, but scales to 2126008811.52 for you. There is no libm involved in scaled_value = in [len] * (8.0 * 0x10000000), so float arithmetic itself is different on my machine.

However, 2126008832 is still smaller than 2^31, so getting -512 as a lrint() of that cannot be right either.

For reference, here is the dmesg of the machine.

With the modified src_float_to_short_array() you propose, the float_to_short_test passes (it fails later in float_to_int_test, probably same thing), the values being as follows:

in: 1.010000
scaled: 33095.679688
overflow

in: 1.000000
scaled: 32768.000000
overflow

in: 0.990000
scaled: 32440.320312
lrint: 32440
out: 32440

in: 0.950000
scaled: 31129.599609
lrint: 31130
out: 31130

I believe your modification is an improvement at any rate. (But why check for the overflow/underflow of the scaled value when we can already check for in[len] >= 1.0?)

Flamefire commented 5 years ago

With #define CPU_CLIPS_NEGATIVE 0 (and recompiling), the error stays the same.

Ok, expected as this path should not be chosen for this error

Anyway, on this machine, this is what lrint() gives:

Can I see your code? Is it like this:

double f = 32767.6;
long i = lrint(f); // Using long as the type
printf("%f = %ld\n, f, i); // using %ld

If yes, something is very wrong here.lrint is supposed to format into a long which according to SIZEOF_LONG of your output is 32Bit. So lrint(32767.6) == 32768 (or 32767), no clipping should occur. Can you reverify the code, sizeof(long) == 4 and LONG_MAX=2^31-1?

Doesn't that mean that #define CPU_CLIPS_NEGATIVE 1 is actually right on this machine?

No it means either your test code or the compiler/libm implementation is buggy or one of assumptions above does not hold.

Firstly, why does the test only fail after the 0.95 input?

Not sure where you put those printfs but if they are in the loop of src_float_to_short_array the outputs should be reversed. And it does fail on the 2nd output value. Maybe put your code (both the function and the test) into a gist to verify.

Notice that my scaled values are integral (while being double). For example, I see different values on an input of 0.99 then you see:

No that's fine. My values were exact values (not outputs) calculated with higher accuracy.

However, 2126008832 is still smaller than 2^31, so getting -512 as a lrint() of that cannot be right either.

Not at all. The -512 doesn't make any sense at all. Can you post your code and check above sample for correct use of long and %ld?

But why check for the overflow/underflow of the scaled value when we can already check for in[len] >= 1.0?

Floating point is tricky... Valid floats are [-1, 1] (IIRC), which need to be mapped to [-32768, 32767] so I multiply with 32768. Obviously 1. * 32768 > 32767so that is already out of range. So the highest allowed value is 32767/32768 so one could check for that. Just be careful because lrint uses current rounding mode which could be (simplified) round up or down. So you need to ensure that a value just above the (very exact) representation of 32767/32768 cannot occur otherwise the following can happen: floatValue * 32768 = 32767.00000001 which lrint then rounds up to 32768 and boom! I think 32767/32768 is an exact value in float IEEE754 (smallish nominator, power-of-2 denominator) but I don't want to take any chances.

janstary commented 5 years ago

Ah, I was printing the lrint() results as "%d", not "%ld". With "%ld", it's

32766.000000 is 32766
32766.400000 is 32766
32766.600000 is 32767
32767.000000 is 32767
32767.400000 is 32767
32767.600000 is 32768
32768.000000 is 32768
32768.400000 is 32768
32768.600000 is 32769

which seems right.

My values were exact values (not outputs) calculated with higher accuracy.

OK, I was worried even multiplication is wrong here.

However,

2126008832.000000 is -512

with

void
r(double d)
{
        long l;
        l = lrint(d);
        printf("%f is %ld\n", d, l);
}

which makes no sense indeed.

janstary commented 5 years ago

I started a https://github.com/janstary/libsamplerate/tree/float branch that includes (a tweaked version of) your modification. That makes the float-to-short test pass for me (but >= 1.0 is not > 32767/32768 as you say).

janstary commented 5 years ago

Can you please test the following on your PPC machine?

#include <stdio.h>
#include <math.h>

void
r(double d)
{
        long l;
        l = lrint(d);
        printf("%f is %ld\n", d, l);
}

int
main()
{
        r(32766.0);
        r(32766.4);
        r(32766.6);
        r(32767.0);
        r(32767.4);
        r(32767.6);
        r(32768.0);
        r(32768.4);
        r(32768.6);

        r(2147483646.0);
        r(2147483646.4);
        r(2147483646.6);
        r(2147483647.0);
        r(2147483647.4);
        r(2147483647.6);
        r(2147483648.0);
        r(2147483648.4);
        r(2147483648.6);

        return 0;
}

On my OpenBSD ppc.stare.cz 6.5 GENERIC#0 macppc, it says

32766.000000 is 32766
32766.400000 is 32766
32766.600000 is 32767
32767.000000 is 32767
32767.400000 is 32767
32767.600000 is 32768
32768.000000 is 32768
32768.400000 is 32768
32768.600000 is 32769
2147483646.000000 is -2
2147483646.400000 is -2
2147483646.600000 is -1
2147483647.000000 is -1
2147483647.400000 is -1
2147483647.600000 is -2147483648
2147483648.000000 is 2147483647
2147483648.400000 is 2147483647
2147483648.600000 is 2147483647

Flamefire commented 5 years ago

My long type is 64 bit so LONG_MAX is 9223372036854775807 and hence output is:

32766.000000 is 32766
32766.400000 is 32766
32766.600000 is 32767
32767.000000 is 32767
32767.400000 is 32767
32767.600000 is 32768
32768.000000 is 32768
32768.400000 is 32768
32768.600000 is 32769
2147483646.000000 is 2147483646
2147483646.400000 is 2147483646
2147483646.600000 is 2147483647
2147483647.000000 is 2147483647
2147483647.400000 is 2147483647
2147483647.600000 is 2147483648
2147483648.000000 is 2147483648
2147483648.400000 is 2147483648
2147483648.600000 is 2147483649

Can you show your LONG_MAX, INT_MAX and check the used declaration for lrint? Also a good test would be lrint(LONG_MAX) == LONG_MAX and lrint(LONG_MIN) == LONG_MIN. If that fails (and your output shows that it does), file a bug with your distros libm

For this project I'd suggest to use my alternative implementation and remove the clipping code. On all systems I configured this no clipping was supported which is probably due to long being 64bit, so clipping happens much later

janstary commented 5 years ago

Thank you - that seems to be what's broken in my system:

-2147483648.000000 is -2147483648
 2147483647.000000 is -1
-2147483648.000000 is -2147483648
 2147483647.000000 is -1

That's lrint() of INT_MIN, INT_MAX, LONG_MIN, LONG_MAX, using the code above.

Flamefire commented 5 years ago

Amazing bug :+1: Out of curiosity can you try 1073741822 to 1073741824 (2^30 and one above/below) If I'm right then the bug should happen in the range [2^30, 2^31)

Proposed fix/workaround for SRC: https://github.com/erikd/libsamplerate/issues/65#issuecomment-516058420 @erikd

janstary commented 5 years ago

The lrint() and lround() of these:

1073741822.000000 is -2 is -2
1073741824.000000 is 1073741824 is 1073741824

libsndfile / libsamplerate

float_to_short_test fails on PowerPC running OpenBSD #65