Strange warnings from numpy/matplotlib when sage is built with clang

kiwifb commented 7 years ago

Seen with clang+OS X and FreeBSD+clang.

sage -t --long src/doc/en/prep/Calculus.rst  # 1 doctest failed
sage -t --long src/sage/plot/graphics.py  # 1 doctest failed
sage -t --long src/sage/plot/plot.py  # 1 doctest failed
sage -t --long src/sage/rings/polynomial/polynomial_real_mpfr_dense.pyx  # 1 doctest failed
sage -t --long src/sage/structure/coerce.pyx  # 1 doctest failed

All these doctests fail because an unexpected warning is emitted:

File "src/sage/rings/polynomial/polynomial_real_mpfr_dense.pyx", line 21, in sage.rings.polynomial.polynomial_real_mpfr_dense
Failed example:
    numpy.float32('1.5') * x
Expected:
    1.50000000000000*x
Got:
    doctest:warning
      File "/usr/home/dima/Sage/sage/src/bin/sage-runtests", line 89, in <module>
        err = DC.run()
      File "/usr/home/dima/Sage/sage/local/lib/python2.7/site-packages/sage/doctest/control.py", line 1134, in run
        self.run_doctests()
      File "/usr/home/dima/Sage/sage/local/lib/python2.7/site-packages/sage/doctest/control.py", line 858, in run_doctests
        self.dispatcher.dispatch()
      File "/usr/home/dima/Sage/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1705, in dispatch
        self.parallel_dispatch()
      File "/usr/home/dima/Sage/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1595, in parallel_dispatch
        w.start()  # This might take some time
      File "/usr/home/dima/Sage/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1871, in start
        super(DocTestWorker, self).start()
      File "/usr/home/dima/Sage/sage/local/lib/python2.7/multiprocessing/process.py", line 130, in start
        self._popen = Popen(self)
      File "/usr/home/dima/Sage/sage/local/lib/python2.7/multiprocessing/forking.py", line 126, in __init__
        code = process_obj._bootstrap()
      File "/usr/home/dima/Sage/sage/local/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/usr/home/dima/Sage/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1844, in run
        task(self.options, self.outtmpfile, msgpipe, self.result_queue)
      File "/usr/home/dima/Sage/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 2137, in __call__
        runner.run(test)
      File "/usr/home/dima/Sage/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 641, in run
        return self._run(test, compileflags, out)
      File "/usr/home/dima/Sage/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 503, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/usr/home/dima/Sage/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 866, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.rings.polynomial.polynomial_real_mpfr_dense[7]>", line 1, in <module>
        numpy.float32('1.5') * x
    :
    RuntimeWarning: invalid value encountered in multiply
    1.50000000000000*x

More specifically, the warning is emitted by the call

sage: x=polygen(RR)
sage: numpy.float32('1.5') * x

This is seen on FreeBSD+clang, OS X+clang and Linux+clang.

Similarly, the warning is emitted in

sage: numpy.float64(5)>e

or with >= instead of >, or with pi instead of e. Note that pi.n() and e.n() are of type RR, so again this points in the direction of mpfr.

Depends on #22582

Upstream: Fixed upstream, but not in a stable release.

CC: @dimpase @jhpalmieri

Component: porting

Author: François Bissey, Dima Pasechnik, Paul Zimmermann

Branch/Commit: fd29778

Reviewer: John Palmieri, Dima Pasechnik

Issue created by migration from https://trac.sagemath.org/ticket/22799

dimpase commented 7 years ago

Dependencies: #22895

dimpase commented 7 years ago

comment:50

numpy upgrade is on #22582.

dimpase commented 7 years ago

comment:51

Let's agree on who would work on debugging; say, in src/sage/rings/polynomial/polynomial_real_mpfr_dense.pyx one would probably need to set an FPU trap to find exactly what triggers the numpy warning in np.float32('1.5')*polygen(RR) (it's very likely to be a call to mpfr, but which one?)

Or perhaps there should be a way to set this up globally, for all the cython modules.

(I'd be too busy in the coming week).

dimpase commented 7 years ago

Changed dependencies from #22895 to #22582

dimpase commented 7 years ago

comment:53

Similarly, the warning is emitted in

sage: numpy.float64(5)>e

or with >= instead of >, or with pi instead of e. Note that pi.n() and e.n() are of type RR, so again this points in the direction of mpfr.

dimpase commented 7 years ago

Description changed:

--- 
+++ 
@@ -58,4 +58,11 @@
 sage: x=polygen(RR)
 sage: numpy.float32('1.5') * x

-seen on freeBSD+clang, OS X+clang and linux+clang.
+seen on freeBSD+clang, OS X+clang and linux+clang.
+
+Similarly, the warning is emitted in
+
+
+sage: numpy.float64(5)>e
+
+or >= instead of >, or pi instead of e. Note that pi.n() and e.n() are of type RR, so again it points at the direction on mpfr.

dimpase commented 7 years ago

comment:54

In spirit, it's pretty much the same as the polygen(RR) problem. Let b=numpy.float64(5). Then b.__gt__(e) prints the same warning; under the hood it apparently calls mpfr, which raises an FP flag, which then gets picked up by the warning printer.

Note that if I first call np.seterr(invalid='ignore') then no warnings are printed.
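The opposite of silencing can also help when hunting for the trigger. In its default mode numpy reports floating-point errors through Python's warnings machinery as a RuntimeWarning, so the standard warnings filters can escalate the warning to an exception and give a traceback at the point where it is reported. Below is a minimal sketch using only the standard library; `noisy_multiply` and `run_strict` are hypothetical names, and `noisy_multiply` is only a stand-in for the numpy call, not numpy itself.

```python
import warnings


def noisy_multiply(a, b):
    # Hypothetical stand-in for the numpy operation: it returns the product
    # but also emits the same warning category that numpy uses.
    warnings.warn("invalid value encountered in multiply", RuntimeWarning)
    return a * b


def run_strict(func, *args):
    """Call func(*args) with RuntimeWarning escalated to an exception."""
    with warnings.catch_warnings():
        # Inside this block every RuntimeWarning is raised instead of printed.
        warnings.simplefilter("error", RuntimeWarning)
        try:
            return ("ok", func(*args))
        except RuntimeWarning as exc:
            return ("warning", str(exc))


result = run_strict(noisy_multiply, 1.5, 2.0)
print(result)
```

This only shows where the warning is reported, which is not necessarily where the FP flag was raised.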

zimmermann6 commented 7 years ago

comment:55

it could help to configure MPFR with --enable-logging (see details in doc/README.dev in the MPFR source repository):

For example, just define MPFR_LOG_ALL, run your program, and view `mpfr.log`.

kiwifb commented 7 years ago

comment:56

This leads me to two observations. I accidentally rebuilt mpfr with gcc on my linux box (MPFR_CONFIGURE="--enable-logging" ./sage -f mpfr) and ran all the tests successfully. So mpfr+clang seems to be the real cause of the trouble. Once I remembered that on my linux machine I had to set CC and CXX, I encountered another problem trying to compile with logging:

libtool: compile:  clang -DMPFR_USE_LOGGING=1 -DTIME_WITH_SYS_TIME=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_LOCALE_H=1 -DHAVE_WCHAR_H=1 -DHAVE_STDARG=1 -DHAVE_SYS_TIME_H=1 -DHAVE_STRUCT_LCONV_DECIMAL_POINT=1 -DHAVE_STRUCT_LCONV_THOUSANDS_SEP=1 -DHAVE_ALLOCA_H=1 -DHAVE_STDINT_H=1 -DHAVE_VA_COPY=1 -DHAVE_SETLOCALE=1 -DHAVE_GETTIMEOFDAY=1 -DHAVE_LONG_LONG=1 -DHAVE_INTMAX_T=1 -DMPFR_HAVE_INTMAX_MAX=1 -DMPFR_HAVE_FESETROUND=1 -DHAVE_DENORMS=1 -DHAVE_SIGNEDZ=1 -DHAVE_ROUND=1 -DHAVE_TRUNC=1 -DHAVE_FLOOR=1 -DHAVE_CEIL=1 -DHAVE_NEARBYINT=1 -DHAVE_LDOUBLE_IEEE_EXT_LITTLE=1 -DHAVE_CLOCK_GETTIME=1 -DLT_OBJDIR=\".libs/\" -DHAVE_ATTRIBUTE_MODE=1 -DHAVE___GMPN_ROOTREM=1 -I. -I/home/fbissey/sandbox/git-fork/sage-clang/local/include -Wall -Wmissing-prototypes -Wpointer-arith -m64 -O2 -march=corei7-avx -mtune=corei7-avx -g -MT add.lo -MD -MP -MF .deps/add.Tpo -c add.c  -fPIC -DPIC -o .libs/add.o
add.c:28:3: error: illegal storage class on function
  MPFR_LOG_FUNC
  ^
./mpfr-impl.h:1716:3: note: expanded from macro 'MPFR_LOG_FUNC'
  auto void __mpfr_log_cleanup (int *time);                             \
  ^
add.c:28:3: error: function definition is not allowed here
./mpfr-impl.h:1717:39: note: expanded from macro 'MPFR_LOG_FUNC'
  void __mpfr_log_cleanup (int *time) {                                 \
                                      ^
2 errors generated.

So we'll have to fix logging with clang before we can investigate with this tool. I will try on OS X shortly.

kiwifb commented 7 years ago

comment:57

Hmm, an equivalent error on OS X:

libtool: compile:  gcc -DMPFR_USE_LOGGING=1 -DTIME_WITH_SYS_TIME=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_LOCALE_H=1 -DHAVE_WCHAR_H=1 -DHAVE_STDARG=1 -DHAVE_SYS_TIME_H=1 -DHAVE_STRUCT_LCONV_DECIMAL_POINT=1 -DHAVE_STRUCT_LCONV_THOUSANDS_SEP=1 -DHAVE_ALLOCA_H=1 -DHAVE_STDINT_H=1 -DHAVE_VA_COPY=1 -DHAVE_SETLOCALE=1 -DHAVE_GETTIMEOFDAY=1 -DHAVE_LONG_LONG=1 -DHAVE_INTMAX_T=1 -DMPFR_HAVE_INTMAX_MAX=1 -DMPFR_HAVE_FESETROUND=1 -DHAVE_DENORMS=1 -DHAVE_SIGNEDZ=1 -DHAVE_ROUND=1 -DHAVE_TRUNC=1 -DHAVE_FLOOR=1 -DHAVE_CEIL=1 -DHAVE_NEARBYINT=1 -DHAVE_LDOUBLE_IEEE_EXT_LITTLE=1 -DHAVE_CLOCK_GETTIME=1 -DLT_OBJDIR=\".libs/\" -DHAVE_ATTRIBUTE_MODE=1 -DHAVE___GMPN_ROOTREM=1 -I. -I/Users/fbissey/build/sage-clang/local/include -Wall -Wmissing-prototypes -Wpointer-arith -m64 -O2 -march=corei7-avx -mtune=corei7-avx -g -MT exceptions.lo -MD -MP -MF .deps/exceptions.Tpo -c exceptions.c  -fno-common -DPIC -o .libs/exceptions.o
In file included from exceptions.c:23:
./mpfr-impl.h:1557:4: error: "Logging not supported (needs gcc >= 3.0 and GNU C Library >= 2.0)."
#  error "Logging not supported (needs gcc >= 3.0 and GNU C Library >= 2.0)."
   ^
1 error generated.

dimpase commented 7 years ago

comment:58

Unless you already tried this, I'd try re-running autoconf.

kiwifb commented 7 years ago

comment:59

Replying to @dimpase:

Unless you already tried this, I'd try re-running autoconf.

You mean recreating mpfr's configure by running autoreconf or something else altogether?

dimpase commented 7 years ago

comment:60

/* The following test on glibc is there mainly for Darwin (Mac OS X), to
   obtain a better error message. The real test should have been a test
   concerning nested functions in gcc, which are disabled by default on
   Darwin; but it is not possible to do that without a configure test. */
# if defined (__cplusplus) || !(__MPFR_GNUC(3,0) && __MPFR_GLIBC(2,0))
#  error "Logging not supported (needs gcc >= 3.0 and GNU C Library >= 2.0)."

seems to say that logging needs nested functions. And clang does not do them, as we know... I'd say this is an MPFR bug, no?

kiwifb commented 7 years ago

comment:61

Replying to @dimpase:

/* The following test on glibc is there mainly for Darwin (Mac OS X), to
   obtain a better error message. The real test should have been a test
   concerning nested functions in gcc, which are disabled by default on
   Darwin; but it is not possible to do that without a configure test. */
# if defined (__cplusplus) || !(__MPFR_GNUC(3,0) && __MPFR_GLIBC(2,0))
#  error "Logging not supported (needs gcc >= 3.0 and GNU C Library >= 2.0)."

seems to say that logging needs nested functions. And clang does not do them, as we know... I'd say this is an MPFR bug, no?

Yes. Whether to call it a bug can be argued, but by using a GNU extension it fails to adhere to the standard, which you do at your own peril and at the cost of portability.

dimpase commented 7 years ago

comment:62

One can of course do a log on gcc and hope that it's identical to what one would get on clang...

zimmermann6 commented 7 years ago

comment:63

I'd say this is an MPFR bug, no?

it is in fact a clang bug:

https://bugs.llvm.org//show_bug.cgi?id=6378

"Clang doesn't support the GNU nested function extension, sorry. We have no plans to implement it."

dimpase commented 7 years ago

comment:64

Replying to @zimmermann6:

I'd say this is an MPFR bug, no?

it is in fact a clang bug:

https://bugs.llvm.org//show_bug.cgi?id=6378

"Clang doesn't support the GNU nested function extension, sorry. We have no plans to implement it."

Well, as MPFR makes no claims to adhere to a C standard, you may indeed consider it a feature :-)

dimpase commented 7 years ago

comment:65

For what it's worth, this is the mpfr.log I see on linux/gcc, after stripping the initialisation part and running np.float64(5).__gt__(e); it looks like mpfr is computing exp(1.0) to a certain precision.

> mpfr_exp:IN  x[53]=1 rnd=3
> mpfr_const_log2_internal:IN  rnd_mode=0
> mpfr_const_log2_internal:ZIV 1st prec=42
> mpfr_div:IN  u[42]=2.2496e+21 v[42]=3.24549e+21 rnd=0
> mpfr_div:TIM 0ms
> mpfr_div:OUT q[42]=0.693147 inexact=-1
> mpfr_const_log2_internal:TIM 0ms
> mpfr_const_log2_internal:OUT x[32]=0.693147 inex=1
> mpfr_mul:IN  b[32]=0.693147 c[64]=4.61169e+18 rnd=2
> mpfr_mul:TIM 0ms
> mpfr_mul:OUT a[32]=3.19658e+18 inexact=1
> mpfr_sub_ui:IN  x[64]=-4.61169e+18 u=2 rnd=0
> mpfr_sub:IN  b[64]=-4.61169e+18 c[64]=2 rnd=0
> mpfr_sub:TIM 0ms
> mpfr_sub:OUT a[64]=-4.61169e+18
> mpfr_sub_ui:TIM 0ms
> mpfr_sub_ui:OUT y[64]=-4.61169e+18 inexact=0
> mpfr_mul:IN  b[32]=0.693147 c[64]=-4.61169e+18 rnd=3
> mpfr_mul:TIM 0ms
> mpfr_mul:OUT a[32]=-3.19658e+18 inexact=-1
> mpfr_exp_2:IN  x[53]=1 rnd=3
> mpfr_const_log2_internal:IN  rnd_mode=0
> mpfr_const_log2_internal:ZIV 1st prec=74
> mpfr_div:IN  u[74]=2.17458e+40 v[74]=3.13725e+40 rnd=0
> mpfr_div:TIM 0ms
> mpfr_div:OUT q[74]=0.693147 inexact=-1
> mpfr_const_log2_internal:TIM 0ms
> mpfr_const_log2_internal:OUT x[64]=0.693147 inex=1
> mpfr_div:IN  u[53]=1 v[64]=0.693147 rnd=0
> mpfr_div:TIM 0ms
> mpfr_div:OUT q[64]=1.4427 inexact=1
> mpfr_exp_2.114: d(x)=1.000000000000000000000000000000e+00 n=1
> mpfr_exp_2:ZIV 1st prec=78
> mpfr_exp_2.152: n=1 K=5 l=11 q=78 error_r=2
> mpfr_const_log2_internal:IN  rnd_mode=0
> mpfr_const_log2_internal:ZIV 1st prec=90
> mpfr_div:IN  u[90]=8.48887e+52 v[90]=1.22469e+53 rnd=0
> mpfr_div:TIM 0ms
> mpfr_div:OUT q[90]=0.693147 inexact=-1
> mpfr_const_log2_internal:TIM 0ms
> mpfr_const_log2_internal:OUT x[80]=0.693147 inex=1
> mpfr_exp_2.169:x[53]=1
> mpfr_exp_2.170:r[80]=0.693147
> mpfr_sub:IN  b[53]=1 c[80]=0.693147 rnd=2
> mpfr_sub:TIM 0ms
> mpfr_sub:OUT a[80]=0.306853
> mpfr_exp_2.189:r[78]=0.306853
> mpfr_div_2ui:IN  x[78]=0.306853 n=5 rnd=2
> mpfr_div_2ui:TIM 0ms
> mpfr_div_2ui:OUT y[78]=0.00958915 inexact=0
> mpfr_exp_2.202: l=270 q=78 (K+l)*q^2=1.673e+06
> mpfr_exp_2.219: before mult. by 2^n:
> mpfr_exp_2.220:s[80]=1.35914
> mpfr_exp_2.221: err=5 bits
> mpfr_mul_2si:IN  x[80]=1.35914 n=1 rnd=3
> mpfr_mul_2si:TIM 0ms
> mpfr_mul_2si:OUT y[53]=2.71828 inexact=-1
> mpfr_exp_2:TIM 3ms
> mpfr_exp_2:OUT y[53]=2.71828 inexact=-1
> mpfr_exp:TIM 3ms
> mpfr_exp:OUT y[53]=2.71828 inexact=-1
> mpfr_exp:IN  x[53]=1 rnd=2
> mpfr_mul:IN  b[32]=0.693147 c[64]=4.61169e+18 rnd=2
> mpfr_mul:TIM 0ms
> mpfr_mul:OUT a[32]=3.19658e+18 inexact=1
> mpfr_sub_ui:IN  x[64]=-4.61169e+18 u=2 rnd=0
> mpfr_sub:IN  b[64]=-4.61169e+18 c[64]=2 rnd=0
> mpfr_sub:TIM 0ms
> mpfr_sub:OUT a[64]=-4.61169e+18
> mpfr_sub_ui:TIM 0ms
> mpfr_sub_ui:OUT y[64]=-4.61169e+18 inexact=0
> mpfr_mul:IN  b[32]=0.693147 c[64]=-4.61169e+18 rnd=3
> mpfr_mul:TIM 0ms
> mpfr_mul:OUT a[32]=-3.19658e+18 inexact=-1
> mpfr_exp_2:IN  x[53]=1 rnd=2
> mpfr_div:IN  u[53]=1 v[64]=0.693147 rnd=0
> mpfr_div:TIM 0ms
> mpfr_div:OUT q[64]=1.4427 inexact=1
> mpfr_exp_2.114: d(x)=1.000000000000000000000000000000e+00 n=1
> mpfr_exp_2:ZIV 1st prec=78
> mpfr_exp_2.152: n=1 K=5 l=11 q=78 error_r=2
> mpfr_exp_2.169:x[53]=1
> mpfr_exp_2.170:r[80]=0.693147
> mpfr_sub:IN  b[53]=1 c[80]=0.693147 rnd=2
> mpfr_sub:TIM 0ms
> mpfr_sub:OUT a[80]=0.306853
> mpfr_exp_2.189:r[78]=0.306853
> mpfr_div_2ui:IN  x[78]=0.306853 n=5 rnd=2
> mpfr_div_2ui:TIM 0ms
> mpfr_div_2ui:OUT y[78]=0.00958915 inexact=0
> mpfr_exp_2.202: l=270 q=78 (K+l)*q^2=1.673e+06
> mpfr_exp_2.219: before mult. by 2^n:
> mpfr_exp_2.220:s[80]=1.35914
> mpfr_exp_2.221: err=5 bits
> mpfr_mul_2si:IN  x[80]=1.35914 n=1 rnd=2
> mpfr_mul_2si:TIM 0ms
> mpfr_mul_2si:OUT y[53]=2.71828 inexact=1
> mpfr_exp_2:TIM 0ms
> mpfr_exp_2:OUT y[53]=2.71828 inexact=1
> mpfr_exp:TIM 0ms
> mpfr_exp:OUT y[53]=2.71828 inexact=1
> mpfr_mul:IN  b[64]=52 c[77]=0.25 rnd=2
> mpfr_mul:TIM 0ms
> mpfr_mul:OUT a[64]=13 inexact=0
> mpfr_mul:IN  b[64]=52 c[77]=0.25 rnd=2
> mpfr_mul:TIM 0ms
> mpfr_mul:OUT a[64]=13 inexact=0
> mpfr_add:IN  b[53]=-5 c[53]=2.71828 rnd=3
> mpfr_add:TIM 0ms
> mpfr_add:OUT a[53]=-2.28172
> mpfr_add:IN  b[53]=-5 c[53]=2.71828 rnd=2
> mpfr_add:TIM 0ms
> mpfr_add:OUT a[53]=-2.28172
> mpfr_mul:IN  b[64]=52 c[77]=0.25 rnd=2
> mpfr_mul:TIM 0ms
> mpfr_mul:OUT a[64]=13 inexact=0
> mpfr_mul:IN  b[64]=52 c[77]=0.25 rnd=2
> mpfr_mul:TIM 0ms
> mpfr_mul:OUT a[64]=13 inexact=0
> mpfr_exp_2: Ziv failed 0.00% (0 bad cases / 2 calls)
> mpfr_const_log2_internal: Ziv failed 0.00% (0 bad cases / 3 calls)

zimmermann6 commented 7 years ago

comment:66

indeed, MPFR is first computing exp(1) with rounding towards -infinity, then again exp(1) with rounding towards +infinity.

Then it computes twice 52*0.25 with rounding towards +infinity (I wonder why the same value is computed twice).

Then it adds -5 and 2.71828 with rounding towards -infinity and +infinity, I guess this is to compare intervals for 5 and exp(1).

Again it computes twice 52*0.25 with the same rounding, I don't know why.

kiwifb commented 7 years ago

comment:67

Replying to @dimpase:

Replying to @zimmermann6:

I'd say this is an MPFR bug, no?

it is in fact a clang bug:

https://bugs.llvm.org//show_bug.cgi?id=6378

"Clang doesn't support the GNU nested function extension, sorry. We have no plans to implement it."

Well, as MPFR makes no claims to adhere to a C standard, you may indeed consider it a feature :-)

As a person who has worked, and is still working, on rather exotic systems and who does porting, I consider it sad. It has been an extension for rather a long time; if it were considered a useful or desirable feature, it would be in the standard or planned for the next standard. That feature doesn't seem to have a big uptake either: so far we have had only one package in sage that absolutely required porting.

Fortunately you don't use it in the functional part of mpfr, but it is annoying.

My opinion is that while you make no claim to be standard compliant, you should aim towards it and acknowledge this as an issue; it does not have to be a show-stopper bug.

zimmermann6 commented 7 years ago

comment:68

My opinion is that while you make no claim to be standard compliant you should aim towards it and admit it as an issue, it does not have to be a show stopper bug.

feel free to report to the clang developers. It makes no sense to implement in MPFR a feature that should be implemented by the compiler.

Back to the issue, does np.float64(5).__gt__(e) give the warning with clang?

Paul

zimmermann6 commented 7 years ago

comment:69

anyway, to trace the function mpfr_exp for example, you can apply the following patch (against the development version, but it should apply to 3.1.5 as well):

--- src/exp.c   (revision 11456)
+++ src/exp.c   (working copy)
@@ -42,10 +42,15 @@
   int inexact;
   MPFR_SAVE_EXPO_DECL (expo);

+#if 0
   MPFR_LOG_FUNC
     (("x[%Pu]=%.*Rg rnd=%d", mpfr_get_prec (x), mpfr_log_prec, x, rnd_mode),
      ("y[%Pu]=%.*Rg inexact=%d",
       mpfr_get_prec (y), mpfr_log_prec, y, inexact));
+#else
+  mpfr_printf ("x[%Pu]=%.*Rg rnd=%d\n", mpfr_get_prec (x), 6, x, rnd_mode);
+  fflush (stdout);
+#endif

   if (MPFR_UNLIKELY( MPFR_IS_SINGULAR(x) ))
     {
@@ -185,5 +190,8 @@
         }
     }

+  mpfr_printf ("y[%Pu]=%.*Rg inexact=%d\n", mpfr_get_prec (y), 6, y, inexact);
+  fflush (stdout);
+
   return mpfr_check_range (y, inexact, rnd_mode);
 }

This will make it possible to see whether the warning occurs inside the mpfr_exp call (you might want to replace mpfr_printf(...) by mpfr_fprintf (stderr, ...) if the warning is printed to stderr, and change fflush(stdout) into fflush(stderr)).

jhpalmieri commented 7 years ago

comment:70

Replying to @zimmermann6:

Back to the issue, does np.float64(5).__gt__(e) give the warning with clang?

Yes, at least on OS X:

sage: np.float16(5).__gt__(e)
/Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage-8.0.beta5/src/bin/sage-ipython:1: RuntimeWarning: invalid value encountered in greater
  #!/usr/bin/env python
True
sage: np.float32(5).__gt__(e)
/Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage-8.0.beta5/src/bin/sage-ipython:1: RuntimeWarning: invalid value encountered in greater
  #!/usr/bin/env python
True
sage: np.float64(5).__gt__(e)
/Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage-8.0.beta5/src/bin/sage-ipython:1: RuntimeWarning: invalid value encountered in greater
  #!/usr/bin/env python
True
sage: np.float128(5).__gt__(e)
/Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage-8.0.beta5/src/bin/sage-ipython:1: RuntimeWarning: invalid value encountered in greater
  #!/usr/bin/env python
True

kiwifb commented 7 years ago

comment:71

Replying to @jhpalmieri:

Replying to @zimmermann6:

Back to the issue, does np.float64(5).__gt__(e) give the warning with clang?

Yes, at least on OS X:

sage: np.float16(5).__gt__(e)
/Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage-8.0.beta5/src/bin/sage-ipython:1: RuntimeWarning: invalid value encountered in greater
  #!/usr/bin/env python
True
sage: np.float32(5).__gt__(e)
/Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage-8.0.beta5/src/bin/sage-ipython:1: RuntimeWarning: invalid value encountered in greater
  #!/usr/bin/env python
True
sage: np.float64(5).__gt__(e)
/Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage-8.0.beta5/src/bin/sage-ipython:1: RuntimeWarning: invalid value encountered in greater
  #!/usr/bin/env python
True
sage: np.float128(5).__gt__(e)
/Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage-8.0.beta5/src/bin/sage-ipython:1: RuntimeWarning: invalid value encountered in greater
  #!/usr/bin/env python
True

Same on linux.

jhpalmieri commented 7 years ago

comment:72

With the patch from comment:69:

sage: import numpy as np
sage: np.float128(5).__gt__(e)
x[53]=1 rnd=3
y[53]=2.71828 inexact=-1
x[53]=1 rnd=2
y[53]=2.71828 inexact=1
/Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage-8.0.beta5/src/bin/sage-ipython:1: RuntimeWarning: invalid value encountered in greater
  #!/usr/bin/env python
True

jhpalmieri commented 7 years ago

comment:73

Any suggestions for other changes along the lines of comment:69 to help track down the problem?

dimpase commented 7 years ago

comment:74

I suppose anything short of actually setting up a watch on FPU bits will not help much. (sorry for slow response - I'm in single-parenting mode for a week :-))

zimmermann6 commented 7 years ago

comment:75

From comment:72 it seems the warning occurs after the two calls to mpfr_exp. Here is another patch to see whether it occurs in mpfr_mul or mpfr_add:

Index: src/add.c
===================================================================
--- src/add.c   (revision 11456)
+++ src/add.c   (working copy)
@@ -25,11 +25,16 @@
 MPFR_HOT_FUNCTION_ATTR int
 mpfr_add (mpfr_ptr a, mpfr_srcptr b, mpfr_srcptr c, mpfr_rnd_t rnd_mode)
 {
+#if 0
   MPFR_LOG_FUNC
     (("b[%Pu]=%.*Rg c[%Pu]=%.*Rg rnd=%d",
       mpfr_get_prec (b), mpfr_log_prec, b,
       mpfr_get_prec (c), mpfr_log_prec, c, rnd_mode),
      ("a[%Pu]=%.*Rg", mpfr_get_prec (a), mpfr_log_prec, a));
+#else
+  printf ("enter mpfr_add\n");
+  fflush (stdout);
+#endif

   if (MPFR_ARE_SINGULAR_OR_UBF (b, c))
     {
@@ -100,23 +105,28 @@
   MPFR_ASSERTD (MPFR_IS_PURE_FP (b));
   MPFR_ASSERTD (MPFR_IS_PURE_FP (c));

+  int ret;
   if (MPFR_UNLIKELY(MPFR_SIGN(b) != MPFR_SIGN(c)))
     { /* signs differ, it is a subtraction */
       if (MPFR_LIKELY(MPFR_PREC(a) == MPFR_PREC(b)
                       && MPFR_PREC(b) == MPFR_PREC(c)))
-        return mpfr_sub1sp(a, b, c, rnd_mode);
+        ret = mpfr_sub1sp(a, b, c, rnd_mode);
       else
-        return mpfr_sub1(a, b, c, rnd_mode);
+        ret = mpfr_sub1(a, b, c, rnd_mode);
     }
   else
     { /* signs are equal, it's an addition */
       if (MPFR_LIKELY(MPFR_PREC(a) == MPFR_PREC(b)
                       && MPFR_PREC(b) == MPFR_PREC(c)))
-        return mpfr_add1sp(a, b, c, rnd_mode);
+        ret = mpfr_add1sp(a, b, c, rnd_mode);
       else
         if (MPFR_GET_EXP(b) < MPFR_GET_EXP(c))
-          return mpfr_add1(a, c, b, rnd_mode);
+          ret = mpfr_add1(a, c, b, rnd_mode);
         else
-          return mpfr_add1(a, b, c, rnd_mode);
+          ret = mpfr_add1(a, b, c, rnd_mode);
     }
+
+  printf ("exit mpfr_add\n");
+  fflush (stdout);
+  return ret;
 }
Index: src/mul.c
===================================================================
--- src/mul.c   (revision 11456)
+++ src/mul.c   (working copy)
@@ -688,6 +688,7 @@
   mp_size_t bn, cn, tn, k, threshold;
   MPFR_TMP_DECL (marker);

+#if 0
   MPFR_LOG_FUNC
     (("b[%Pu]=%.*Rg c[%Pu]=%.*Rg rnd=%d",
       mpfr_get_prec (b), mpfr_log_prec, b,
@@ -694,6 +695,10 @@
       mpfr_get_prec (c), mpfr_log_prec, c, rnd_mode),
      ("a[%Pu]=%.*Rg inexact=%d",
       mpfr_get_prec (a), mpfr_log_prec, a, inexact));
+#else
+  printf ("enter mpfr_mul\n");
+  fflush (stdout);
+#endif

   /* deal with special cases */
   if (MPFR_ARE_SINGULAR (b, c))
@@ -1030,5 +1035,7 @@
         rnd_mode = MPFR_RNDZ;
       return mpfr_underflow (a, rnd_mode, sign);
     }
+  printf ("exit mpfr_mul\n");
+  fflush (stdout);
   MPFR_RET (inexact);
 }

Side question: what routine of MPFR (if any) does np.float128(5).__gt__ call?

dimpase commented 7 years ago

comment:76

Replying to @zimmermann6:

From comment:72 it seems the warning occurs after the two calls to mpfr_exp.

This is correct; no interrupt mechanism is set up that would make sure the warning is printed immediately. The warning is printed after numpy finishes computing the value of np.float128(5).__gt__(e), before it returns the result.

Side question: what routine of MPFR (if any) does np.float128(5).__gt__ call?

Numpy people told us some details here. They say that behind the curtains it will try calling something like e.__lt__, assuming e is the argument (and all this happens within a compiled module written in (generated) C, making it hard to debug easily).

So we have two Python-based computer algebra systems not talking to each other too well...
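The general Python mechanism referred to here is the reflected comparison: when the left operand's `__gt__` returns `NotImplemented`, the interpreter retries with the right operand's `__lt__`. The sketch below uses two hypothetical classes, `Left` and `Right`, as stand-ins for the numpy scalar and the Sage element. Whether numpy follows exactly this path internally is not established in this thread, so treat it as an illustration of the fallback only.

```python
calls = []


class Left:
    """Plays the role of the numpy scalar: it does not know the other type."""

    def __gt__(self, other):
        calls.append("Left.__gt__")
        return NotImplemented


class Right:
    """Plays the role of the Sage element: it answers the reflected call."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        calls.append("Right.__lt__")
        # Pretend that `other` represents the number 5.
        return self.value < 5


# `Left() > Right(...)` first tries Left.__gt__, then falls back to Right.__lt__.
result = Left() > Right(2.718)
print(result, calls)
```

With this fallback the actual comparison runs in the right operand's code, which in our case would be Sage and hence MPFR.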

zimmermann6 commented 7 years ago

comment:77

my guess is the following:

(1) first intervals of MPFR values are computed that enclose 5 and exp(1)

(2) then those intervals are converted into the np.float128 type

(3) then the comparison is performed

I guess the warning occurs because a NaN was generated in step (2). It might be inside the mpfr_get_float128 function. Here is another patch to check:

Index: src/get_float128.c
===================================================================
--- src/get_float128.c  (revision 11456)
+++ src/get_float128.c  (working copy)
@@ -30,8 +30,15 @@
 mpfr_get_float128 (mpfr_srcptr x, mpfr_rnd_t rnd_mode)
 {

+  printf ("enter mpfr_get_float128\n");
+  fflush (stdout);
+
   if (MPFR_UNLIKELY (MPFR_IS_SINGULAR (x)))
-    return (__float128) mpfr_get_d (x, rnd_mode);
+    {
+      printf ("exit mpfr_get_float128: MPFR_IS_SINGULAR(x)\n");
+      fflush (stdout);
+      return (__float128) mpfr_get_d (x, rnd_mode);
+    }
   else /* now x is a normal non-zero number */
     {
       __float128 r; /* result */
@@ -97,6 +104,8 @@
         }
       if (sign < 0)
         r = -r;
+      printf ("exit mpfr_get_float128: normal case\n");
+      fflush (stdout);
       return r;
     }
 }

jhpalmieri commented 7 years ago

comment:78

I don't see get_float128.c in version 3.1.5. I've tried to use similar patches in get_ld.c, get_d64.c, and get_float.c, but none of the relevant functions print anything when I run np.float128(5).__gt__(e) (or the same with np.float64, etc.).

dimpase commented 7 years ago

comment:79

Replying to @zimmermann6:

my guess is the following:

(1) first intervals of MPFR values are computed that enclose 5 and exp(1)

(2) then those intervals are converted into the np.float128 type

IMHO it is different (and the complete(?) trace in comment 65 above does not show such a conversion): numpy has no way to convert MPFR numbers into numpy numbers without asking Sage to do this. And it does not even know that Sage can do it. So it all happens at the level of Python data: numpy knows that after getting "not implemented" from np.float128(5).__gt__(e) it may try e.__lt__(np.float128(5)). And the latter invokes the comparison in Sage, done with MPFR numbers.

(3) then the comparison is performed

Isn't the actual comparison performed on MPFR numbers rather than on numpy numbers? As above:

> mpfr_add:IN  b[53]=-5 c[53]=2.71828 rnd=3
> mpfr_add:TIM 0ms
> mpfr_add:OUT a[53]=-2.28172
> mpfr_add:IN  b[53]=-5 c[53]=2.71828 rnd=2
> mpfr_add:TIM 0ms

zimmermann6 commented 7 years ago

comment:80

Isn't the actual comparison performed on MPFR numbers rather than on numpy numbers?

yes it might be, since after the two mpfr_add calls we should get an interval [u,v] where exp(1)-5 lies. Then I guess Sage should check whether u > 0 or v < 0. But this is possible via several MPFR functions (which are not logged through --enable-logging). It could be mpfr_cmp_ui (u, 0), or mpfr_cmp (u, zero) since zero is predefined, or mpfr_sgn(u). One should add logging in those functions to see which one is called.
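The interval check described above can be sketched as follows. This is only an illustration of the idea: it uses exact rationals from the standard library instead of MPFR's directed rounding, and the names `e_enclosure` and `sign_of_interval` are hypothetical, not Sage or MPFR functions.

```python
from fractions import Fraction
from math import factorial


def e_enclosure(n=20):
    """Return rationals (lo, hi) with lo < e < hi, from the series for exp(1)."""
    lo = sum(Fraction(1, factorial(k)) for k in range(n + 1))
    # The tail sum_{k>n} 1/k! is positive and smaller than 2/(n+1)!.
    hi = lo + Fraction(2, factorial(n + 1))
    return lo, hi


def sign_of_interval(u, v):
    """Return +1 if [u, v] is entirely positive, -1 if entirely negative, else None."""
    if u > 0:
        return 1
    if v < 0:
        return -1
    return None  # the interval contains zero: undecided at this precision


e_lo, e_hi = e_enclosure()
u, v = e_lo - 5, e_hi - 5  # enclosure [u, v] of exp(1) - 5
print(sign_of_interval(u, v))
```

Here the whole interval lies below zero, so exp(1) < 5 is decided without ever comparing the two numbers directly.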

jhpalmieri commented 7 years ago

comment:81

As far as I can tell, the last thing called (or at least the last thing called in which I've added logging) before numpy reports an error is mpfr_cmp3.

zimmermann6 commented 7 years ago

comment:82

As far as I can tell, the last thing called (or at least the last thing called in which I've added logging) before numpy reports an error is mpfr_cmp3.

please could you test with the following patch (against mpfr-3.1.5)?

--- cmp.c       2016-09-27 09:58:15.000000000 +0200
+++ /tmp/cmp.c  2017-05-12 08:32:54.914688069 +0200
@@ -35,6 +35,11 @@
   mp_size_t bn, cn;
   mp_limb_t *bp, *cp;

+  printf ("enter mpfr_cmp3\n");
+  printf ("b="); mpfr_dump (b);
+  printf ("c="); mpfr_dump (c);
+  printf ("s=%d\n", s);
+
   s = MPFR_MULT_SIGN( s , MPFR_SIGN(c) );

   if (MPFR_ARE_SINGULAR(b, c))
@@ -42,34 +47,59 @@
       if (MPFR_IS_NAN (b) || MPFR_IS_NAN (c))
         {
           MPFR_SET_ERANGE ();
+          printf ("exit mpfr_cmp3: NaN case\n");
           return 0;
         }
       else if (MPFR_IS_INF(b))
         {
           if (MPFR_IS_INF(c) && s == MPFR_SIGN(b) )
-            return 0;
+            {
+              printf ("exit mpfr_cmp3: Inf1 case\n");
+              return 0;
+            }
           else
-            return MPFR_SIGN(b);
+            {
+              printf ("exit mpfr_cmp3: Inf2 case\n");
+              return MPFR_SIGN(b);
+            }
         }
       else if (MPFR_IS_INF(c))
-        return -s;
+        {
+          printf ("exit mpfr_cmp3: Inf3 case\n");
+          return -s;
+        }
       else if (MPFR_IS_ZERO(b))
-        return MPFR_IS_ZERO(c) ? 0 : -s;
+        {
+          printf ("exit mpfr_cmp3: zero1 case\n");
+          return MPFR_IS_ZERO(c) ? 0 : -s;
+        }
       else /* necessarily c=0 */
-        return MPFR_SIGN(b);
+        {
+          return MPFR_SIGN(b);
+          printf ("exit mpfr_cmp3: zero2 case\n");
+        }
     }
   /* b and c are real numbers */
   if (s != MPFR_SIGN(b))
-    return MPFR_SIGN(b);
+    {
+      printf ("exit mpfr_cmp3: s != MPFR_SIGN(b)\n");
+      return MPFR_SIGN(b);
+    }

   /* now signs are equal */

   be = MPFR_GET_EXP (b);
   ce = MPFR_GET_EXP (c);
   if (be > ce)
-    return s;
+    {
+      printf ("exit mpfr_cmp3: be > ce\n");
+      return s;
+    }
   if (be < ce)
-    return -s;
+    {
+      printf ("exit mpfr_cmp3: be < ce\n");
+      return -s;
+    }

   /* both signs and exponents are equal */

@@ -82,18 +112,31 @@
   for ( ; bn >= 0 && cn >= 0; bn--, cn--)
     {
       if (bp[bn] > cp[cn])
-        return s;
+        {
+          printf ("exit mpfr_cmp3: bp[bn] > cp[cn]\n");
+          return s;
+        }
       if (bp[bn] < cp[cn])
-        return -s;
+        {
+          printf ("exit mpfr_cmp3: bp[bn] < cp[cn]\n");
+          return -s;
+        }
     }
   for ( ; bn >= 0; bn--)
     if (bp[bn])
-      return s;
+      {
+        printf ("exit mpfr_cmp3: bp[bn] > 0\n");
+        return s;
+      }
   for ( ; cn >= 0; cn--)
     if (cp[cn])
-      return -s;
+      {
+        printf ("exit mpfr_cmp3: cp[cn] > 0\n");
+        return -s;
+      }

-   return 0;
+  printf ("exit mpfr_cmp3: equal case\n");
+  return 0;
 }

 #undef mpfr_cmp

jhpalmieri commented 7 years ago

comment:83

I get this:

enter mpfr_cmp3
b=-0
c=0.10011011111100001010100010110001010001010111011010010E1
s=1
exit mpfr_cmp3: zero1 case
/Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage-8.0.beta4/src/bin/sage-ipython:1: RuntimeWarning: invalid value encountered in less
  #!/usr/bin/env python

zimmermann6 commented 7 years ago

comment:84

I see no "less" comparison in the mpfr_cmp3 branch corresponding to the zero1 case. The "invalid value encountered in less" warning might correspond to a comparison with NaN, but neither b nor c is NaN here.

dimpase commented 7 years ago

comment:85

What is the Python call that triggers this?

It used to be np.float64(5).__gt__(e), which gave an "invalid value encountered in greater" warning. Does the same call with the patched MPFR give an "invalid value encountered in less" warning?

jhpalmieri commented 7 years ago

comment:86

I get warnings with both __gt__ and __lt__; the one in comment:83 was from __lt__. When I do np.float64('1.5').__gt__(e), the various logging messages end in this:

enter mpfr_cmp3
b=0.10011011111100001010100010110001010001010111011010100E1
c=0
s=1
enter mpfr_cmp3
b=-0
c=0.10011011111100001010100010110001010001010111011010010E1
s=1
exit mpfr_cmp3: zero1 case
/Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage-8.0.beta4/src/bin/sage-ipython:1: RuntimeWarning: invalid value encountered in greater
  #!/usr/bin/env python

dimpase commented 7 years ago

comment:87

Replying to @zimmermann6:

I see no "less" comparison in the mpfr_cmp3 branch corresponding to the zero1 case. The "invalid value encountered in less" warning might correspond to a comparison with NaN, but neither b nor c is NaN here.

Is there a "greater" comparison? That is the one that would correspond to __lt__ in the original call, as the numpy people tell us.

zimmermann6 commented 7 years ago

comment:88

is there a "greater" comparison?

There is no "greater" comparison either. The only comparisons are between exponents and between words of the significand, and no such comparison occurs when one of the operands is zero. In any case, there is no double-precision NaN in that function.

dimpase commented 7 years ago

comment:89

Here is a somewhat less involved way to trigger this, without going through the numpy evaluation loop:

import numpy as np
from ctypes import cdll
from ctypes.util import find_library
libm = cdll.LoadLibrary(find_library('m'))
print libm.fetestexcept(int(0x01)) # checks if FE_INVALID is set
bool(e.__lt__(np.float32('1.5')))
print libm.fetestexcept(int(0x01))

Running this on Linux/gcc produces

0
False
0

while on FreeBSD/clang I get

0
False
1

After finding this, I decided to check whether merely importing numpy does something to the FPU flags on FreeBSD, and in fact it does! Namely, the output of

from ctypes import cdll
from ctypes.util import find_library
libm = cdll.LoadLibrary(find_library('m'))
print libm.fetestexcept(int(0x01))
import numpy
print libm.fetestexcept(int(0x01))

is

0
0

on Linux/gcc, and

1
0

on FreeBSD/clang! And in fact one can see that the FE_INVALID bit is flipped by

bool(e.__lt__(float('1.5')))

just as well:

sage: import numpy 
....: from ctypes import cdll
....: from ctypes.util import find_library
....: libm = cdll.LoadLibrary(find_library('m'))
....: print libm.fetestexcept(int(0x01))
....: bool(e.__lt__(float('1.5')))
....: print libm.fetestexcept(int(0x01))
....: 
0
False
1

on FreeBSD/clang (but the last 1 becomes 0 on Linux/gcc).
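
The checks above can be wrapped in a small helper. This is only a minimal sketch, written in Python 3 syntax (the sessions above are Python 2). It assumes a libm that exports fetestexcept and feclearexcept, and a platform where FE_INVALID is 0x01, as on x86. The name sets_fe_invalid is hypothetical and not part of Sage or numpy.

```python
import ctypes
from ctypes.util import find_library

# Assumption: FE_INVALID is 0x01 on this platform (true on x86 and x86-64;
# check <fenv.h> on other architectures).
FE_INVALID = 0x01

# If find_library returns None, CDLL(None) falls back to the symbols already
# loaded into the running process on POSIX systems.
libm = ctypes.CDLL(find_library('m'))
for _f in (libm.fetestexcept, libm.feclearexcept):
    _f.argtypes = [ctypes.c_int]
    _f.restype = ctypes.c_int


def sets_fe_invalid(func):
    """Return True if calling func() leaves FE_INVALID set (hypothetical helper)."""
    libm.feclearexcept(FE_INVALID)   # start from a clean flag
    func()
    return bool(libm.fetestexcept(FE_INVALID))
```

In a Sage session, sets_fe_invalid(lambda: bool(e.__lt__(float('1.5')))) would then be expected to return True on the FreeBSD/clang build and False on Linux/gcc, if it behaves like the sessions above. Clearing the flag first makes the result independent of whatever an earlier import left behind.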

dimpase commented 7 years ago

comment:90

That is, we also need to debug Sage to find the place that sets FE_INVALID!

$ ./sage --python
Python 2.7.13 (default, May 14 2017, 23:48:25) 
[GCC 4.2.1 Compatible Clang 4.0.0 ] on freebsd11
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes import cdll
>>> from ctypes.util import find_library
>>> libm = cdll.LoadLibrary(find_library('m'))
>>> print libm.fetestexcept(int(0x01))
0
>>> from sage.all import *
>>> print libm.fetestexcept(int(0x01))
1

jhpalmieri commented 7 years ago

comment:91

I added a bunch of print ("TAG: {}".format(libm.fetestexcept(int(0x01)))) statements with various tags. The result changes from 0 to 1 in the file sage/libs/pynac/pynac.pyx, at the line init_pynac_I(). Within that function, it changes at the line

  K = QuadraticField(-1, 'I', embedding=CC.gen(), latex_name='i')

Quadratic fields are constructed using UniqueFactory, and the result changes from 0 to 1 in the try/except block

        cache_key = key
        print ("020: {}".format(libm.fetestexcept(int(0x01))))
        try:
            try:
                return self._cache[version, cache_key]
            except TypeError: # key is unhashable
                print ("030: {}".format(libm.fetestexcept(int(0x01))))
                cache_key = _cache_key(cache_key)
                return self._cache[version, cache_key]
        except KeyError:
            print ("040: {}".format(libm.fetestexcept(int(0x01))))
            pass

in the get_object method for UniqueFactory in sage/structure/factory.pyx. With the print statements as indicated, I see

020: 0
040: 1
050: 1

So is it something to do with the cache? The cache is defined by

self._cache = sage.misc.weak_dict.WeakValueDictionary()

So is something going on with Sage's weak dictionaries?

There is also the possibility that I'm misinterpreting everything and the problem is somewhere else completely.

dimpase commented 7 years ago

comment:92

I don't see the need for the nested try/except blocks in that fragment of the code; is it just a leftover?

jhpalmieri commented 7 years ago

comment:93

I suppose it should be

try:
    blah
except TypeError:
    blah
except KeyError:
    pass

?

jhpalmieri commented 7 years ago

comment:94

Unless the last except is also there to catch a KeyError in the call to return self._cache[version, cache_key] within the first except clause.
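
The difference between the two forms can be checked in isolation. Below is a minimal sketch in which a hypothetical lookup function stands in for the cache access (it is not the actual UniqueFactory code). An exception raised inside an except clause is not caught by a sibling except clause of the same try, so only the nested form swallows a KeyError coming from the second lookup.

```python
def lookup(cache, key):
    # Hypothetical stand-in for self._cache[version, cache_key]
    return cache[key]


def nested(cache, key):
    try:
        try:
            return lookup(cache, key)
        except TypeError:                  # key is unhashable
            return lookup(cache, tuple(key))
    except KeyError:                       # also covers the retry above
        pass
    return None


def flat(cache, key):
    try:
        return lookup(cache, key)
    except TypeError:                      # key is unhashable
        return lookup(cache, tuple(key))   # a KeyError here propagates
    except KeyError:
        pass
    return None
```

With an empty cache and an unhashable key such as [1, 2], nested returns None while flat lets the KeyError escape, so flattening the block would change behaviour in exactly the case described in comment:94.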

zimmermann6 commented 7 years ago

comment:95

What happens if you remove the except KeyError part?

dimpase commented 7 years ago

comment:96

Probably one should try building the fpectl Python module (it is not in Sage's Python) and use it to locate where the flags are raised during the import.

zimmermann6 commented 7 years ago

comment:97

Replying to @dimpase:

Probably one should try building the fpectl Python module (it is not in Sage's Python) and use it to locate where the flags are raised during the import.

good idea. I'm curious to see where a NaN is generated.
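
Short of building fpectl, the manual tagging from comment:91 could be automated with a line tracer. The following is only a sketch and not fpectl itself. It uses Python 3 syntax and assumes a libm exporting fetestexcept and feclearexcept, with FE_INVALID equal to 0x01 as on x86. The name find_fe_invalid is hypothetical. Note that sys.settrace only sees Python frames, so for compiled (Cython) code it can at best point to the Python line that called into it.

```python
import ctypes
import sys
from ctypes.util import find_library

FE_INVALID = 0x01  # assumption: value from <fenv.h> on x86 / x86-64

libm = ctypes.CDLL(find_library('m'))
for _f in (libm.fetestexcept, libm.feclearexcept):
    _f.argtypes = [ctypes.c_int]
    _f.restype = ctypes.c_int


def find_fe_invalid(func):
    """Run func() under a tracer (hypothetical helper).

    Return (filename, lineno) of the last Python line that started before
    FE_INVALID was first observed set, or None if no such line was recorded.
    """
    state = {'last': None, 'hit': None}

    def tracer(frame, event, arg):
        if state['hit'] is None:
            if libm.fetestexcept(FE_INVALID):
                # The flag was set while the previously recorded line ran.
                state['hit'] = state['last']
            elif event == 'line':
                state['last'] = (frame.f_code.co_filename, frame.f_lineno)
        return tracer

    libm.feclearexcept(FE_INVALID)   # start from a clean flag
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)
    # Catch a flag raised by the very last line executed.
    if state['hit'] is None and libm.fetestexcept(FE_INVALID):
        state['hit'] = state['last']
    return state['hit']
```

Tracing every line is slow, so for something as large as the sage.all import it may be more practical to narrow the region by hand first and trace only the suspicious call.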

Strange warnings from numpy/matplotlib when sage is built with clang #22799