sagemath / sage

Main repository of SageMath
https://www.sagemath.org

ubuntu-groovy-standard, debian-bullseye-standard: numerics-related sage testsuite errors #31621

Open mkoeppe opened 3 years ago

mkoeppe commented 3 years ago

From https://groups.google.com/g/sage-release/c/6WjKQt_e_B8/m/dpx1qILOCwAJ (for 9.3.rc2):

9.3.beta8> {debian-bullseye,ubuntu-groovy}-standard: cvxopt testsuite errors, numerics-related sage testsuite errors
9.3.beta8> (is system BLAS feeling OK??)

The numerics failures on debian-bullseye have disappeared (or perhaps it is processor dependent, and we were luckier in this run), but show up in debian-sid this time. The numerics failures on ubuntu-groovy are still present.

Another report for debian-bullseye: https://groups.google.com/g/sage-devel/c/kip6kYlL95Q/m/fjUbYwA-AwAJ

CC: @dimpase @jhpalmieri @videlec @kliem

Component: porting

Issue created by migration from https://trac.sagemath.org/ticket/31621

mkoeppe commented 3 years ago
comment:1

Moving to 9.4, as 9.3 has been released.

jhpalmieri commented 3 years ago
comment:4

I saw what I think are the same failures on one OS X 11.5.2 machine (but not another one — I think it's CPU dependent). The failing machine says

% sysctl -n machdep.cpu.brand_string
Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz

I can provide other CPU info if you let me know what commands to run.

jhpalmieri commented 3 years ago
comment:5

The tests passed when I did tox -e local-homebrew-macos-minimal -- ptestlong. Any guesses for what homebrew package is causing the problem? What should I plug into ./configure --with-system-PKG=no?

jhpalmieri commented 3 years ago
comment:6

These tests passed when I did ./configure --with-system-gsl=no --with-system-openblas=no.

Just using ./configure --with-system-gsl=no failed to build: gsl's configure script failed with

configure: error: in `/Users/jpalmier/Desktop/Sage/sage_builds/TESTING/sage-9.5.beta1/local/var/tmp/sage/build/gsl-2.6/src':
configure: error: C compiler cannot create executables

and gsl's config.log file said

ld: library not found for -lopenblas

I'm trying now with ./configure --with-system-openblas=no, which means that Sage will also build its own gsl, r, and suitesparse.

(This computer is in my work office, and so I will not check its progress again until Monday.)

jhpalmieri commented 3 years ago
comment:7

By the way, I expect this to pass these tests — it should be equivalent to what I already did with ./configure --with-system-gsl=no --with-system-openblas=no — but I want to see the output of make ptestlong.

jhpalmieri commented 3 years ago
comment:8

All tests passed when using that flag (--with-system-openblas=no).

mkoeppe commented 2 years ago

Description changed:

--- 
+++ 
@@ -6,4 +6,4 @@
 The numerics failures on debian-bullseye have disappeared (or perhaps it is processor dependent, and we were luckier in this run), but show up in debian-sid this time. 
 The numerics failures on ubuntu-groovy are still present.

-
+Another report for `debian-bullseye`: https://groups.google.com/g/sage-devel/c/kip6kYlL95Q/m/fjUbYwA-AwAJ
dimpase commented 2 years ago
comment:11

John, do you see on macOS the error reported by Vincent?

File "src/sage/matrix/matrix_double_dense.pyx", line 1365, in
sage.matrix.matrix_double_dense.Matrix_double_dense.?
Failed example:
     A.eigenvalues(algorithm='symmetric', tol=1.0e-5)  # tol 2e-15
Expected:
     [(-8.0, 22), (2.0, 77), (22.0, 1)]
Got:
     [(-13.81753974166025, 1),
...
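For reference, the expected output above lists each eigenvalue with its multiplicity: -8 occurs 22 times, 2 occurs 77 times and 22 once. A minimal sketch of that grouping in plain Python, assuming eigenvalues are clustered whenever neighbours in sorted order differ by at most `tol`. The helper `group_eigenvalues` is made up for illustration and is not Sage's implementation, which may cluster differently.

```python
def group_eigenvalues(values, tol):
    """Cluster sorted eigenvalues whose neighbours differ by at most tol.

    Returns a list of (cluster mean, multiplicity) pairs.
    """
    groups = []
    for value in sorted(values):
        if groups and abs(value - groups[-1][-1]) <= tol:
            groups[-1].append(value)   # close enough: same cluster
        else:
            groups.append([value])     # start a new cluster
    return [(sum(group) / len(group), len(group)) for group in groups]


# Spectrum the doctest expects for the Higman-Sims graph.
healthy = [-8.0] * 22 + [2.0] * 77 + [22.0]
print(group_eigenvalues(healthy, tol=1.0e-5))
# [(-8.0, 22), (2.0, 77), (22.0, 1)]
```

With a broken BLAS the input contains many distinct values, so the result has far more than three clusters, which is what the failing doctest shows.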
dimpase commented 2 years ago
comment:12

also, what openblas version did you try on homebrew? they bumped it to 0.3.18 two weeks ago: https://github.com/Homebrew/homebrew-core/commit/63de340b1e397b67e7137519f45b02491b6e1f15

jhpalmieri commented 2 years ago
comment:13

I will check on Monday.

videlec commented 2 years ago
comment:14

Replying to @dimpase:

John, do you see on macOS the error reported by Vincent?

File "src/sage/matrix/matrix_double_dense.pyx", line 1365, in
sage.matrix.matrix_double_dense.Matrix_double_dense.?
Failed example:
     A.eigenvalues(algorithm='symmetric', tol=1.0e-5)  # tol 2e-15
Expected:
     [(-8.0, 22), (2.0, 77), (22.0, 1)]
Got:
     [(-13.81753974166025, 1),
...
  • it's one I know how to produce a standalone test for (as a short Fortran program linking OpenBLAS)

Dima, even a standalone test with scipy would already be good enough. I would like to check whether the Debian package is completely broken or simply used wrongly within sage. In both cases, it might be known to the Debian developers.

dimpase commented 2 years ago

test Fortran program

dimpase commented 2 years ago

Attachment: dev_test.f90.gz

Attachment: HigmanSims.tst.gz

the failing example (spectrum of the adj. matrix of 100-vertex Higman-Sims graph)

dimpase commented 2 years ago

Attachment: petersen.tst.gz

smaller example (from Petersen graph)

dimpase commented 2 years ago

script to compute test data

dimpase commented 2 years ago
comment:15

Attachment: graphdata.sage.gz

I've uploaded an attempted test of OpenBLAS. Download the attachments and

gfortran dev_test.f90 -lopenblas
./a.out <HigmanSims.tst

The last line of the output should say max. deviation of eigenvals: <small number>, where <small number> is smaller than 10^-13 or so if all is good.

(there is another, smaller, example, too, petersen.tst)

Please note that I'm not sure I used exactly the same LAPACK function as the failing Sage test. If this one, DSYEV, just works, there is one more function to try: DSYEVR.
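For anyone without a Fortran compiler, the same kind of check can be sketched in Python. This is not the attached dev_test.f90: it builds the Petersen graph by hand instead of reading petersen.tst, it uses `numpy.linalg.eigvalsh`, which goes through whatever LAPACK NumPy is linked against (not necessarily DSYEV or DSYEVR), and the helper names are made up for illustration. The 1e-13 threshold simply mirrors the one quoted above.

```python
import numpy as np


def petersen_adjacency():
    """Adjacency matrix of the Petersen graph (10 vertices, 3-regular)."""
    a = np.zeros((10, 10))
    for i in range(5):
        edges = [
            (i, (i + 1) % 5),          # outer 5-cycle
            (5 + i, 5 + (i + 2) % 5),  # inner pentagram
            (i, 5 + i),                # spokes
        ]
        for u, v in edges:
            a[u, v] = a[v, u] = 1.0
    return a


def max_deviation(matrix, expected):
    """Largest absolute difference between computed and expected eigenvalues."""
    computed = np.linalg.eigvalsh(matrix)  # returned in ascending order
    return float(np.max(np.abs(computed - np.sort(expected))))


# Known spectrum of the Petersen graph: -2 (x4), 1 (x5), 3 (x1).
expected = np.array([-2.0] * 4 + [1.0] * 5 + [3.0])
deviation = max_deviation(petersen_adjacency(), expected)
print("max. deviation of eigenvals:", deviation)
```

Running it once with plain `python3` and once under `./sage --python` would show whether the two environments disagree on the same matrix.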

dimpase commented 2 years ago

test using LAPACK's dsyevr call

dimpase commented 2 years ago
comment:16

Attachment: dev_test_dsyevr.f90.gz

please also try the latter attachment (same instructions, different file name: dev_test_dsyevr.f90)

dimpase commented 2 years ago
comment:17

Replying to @videlec:

Dima, even a standalone test with scipy would already be good enough. I would like to check whether the Debian package is completely broken or simply used wrongly within sage. In both cases, it might be known to the Debian developers.

here is a direct scipy test via Sage:

sage: import numpy as np
sage: import scipy
sage: import scipy.linalg
sage: g=graphs.HigmanSimsGraph()
sage: a=np.matrix(g.adjacency_matrix())
sage: eig=scipy.linalg.lapack.dsyevr(a); eig[0]
array([-8., -8., -8., -8., -8., -8., -8., -8., -8., -8., -8., -8., -8.,
       -8., -8., -8., -8., -8., -8., -8., -8., -8.,  2.,  2.,  2.,  2.,
        2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,
        2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,
        2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,
        2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,
        2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,
        2.,  2.,  2.,  2.,  2.,  2.,  2.,  2., 22.])
jhpalmieri commented 2 years ago
comment:18

Replying to @dimpase:

John, do you see on macOS the error reported by Vincent?

File "src/sage/matrix/matrix_double_dense.pyx", line 1365, in
sage.matrix.matrix_double_dense.Matrix_double_dense.?
Failed example:
     A.eigenvalues(algorithm='symmetric', tol=1.0e-5)  # tol 2e-15
Expected:
     [(-8.0, 22), (2.0, 77), (22.0, 1)]
Got:
     [(-13.81753974166025, 1),
...

Yes, I see this failure. This is with an un-updated version of OpenBLAS: 0.3.17 from 2021-09-23 according to brew info openblas. I will try upgrading.

jhpalmieri commented 2 years ago
comment:19

I can't use the standalone test because I get ld: library not found for -lopenblas.

dimpase commented 2 years ago
comment:20

Replying to @jhpalmieri:

I can't use the standalone test because I get ld: library not found for -lopenblas.

You'd need -L flag:

gfortran <...>.f90 -L/usr/local/opt/openblas/lib -lopenblas
jhpalmieri commented 2 years ago
comment:21

Thanks. When I run ./a.out <HigmanSims.tst, the last line is the same with either test program: max. deviation of eigenvals: 1.2878587085651816E-014.

jhpalmieri commented 2 years ago
comment:22

With petersen.tst I get max. deviation of eigenvals: 8.8817841970012523E-016.

jhpalmieri commented 2 years ago
comment:23

On the other hand, when I run the example in Sage in comment:17, I get

array([-9.51662688, -9.01957761, -8.82571413, -8.53509523, -8.45449832,
       -8.19421094, -8.06706488, -8.        , -8.        , -8.        ,
       -8.        , -8.        , -8.        , -8.        , -8.        ,
       -8.        , -8.        , -7.92742484, -7.63275394, -7.25448498,
       -7.06795155, -6.90611622, -6.64469129, -6.54715317, -5.4872644 ,
       -5.29660049, -4.69884647, -4.33752666, -4.30959029, -3.75213367,
       -3.57502391, -2.92795773, -2.58101222, -2.57098415, -2.53138416,
       -2.27048023, -2.16406448, -1.80261557, -1.75557973, -1.59188765,
       -1.41796672, -1.35756654, -1.13476674, -0.92256797, -0.65109354,
       -0.49437488, -0.27070739, -0.07512487,  0.17239949,  0.4835247 ,
        0.54260365,  0.67677662,  1.03820124,  1.12679921,  1.41540993,
        1.65236118,  1.75100574,  1.86305818,  1.99997232,  2.        ,
        2.        ,  2.        ,  2.        ,  2.        ,  2.        ,
        2.        ,  2.        ,  2.        ,  2.        ,  2.01296242,
        2.23534418,  2.37464655,  2.50556135,  2.69055392,  2.73682618,
        3.00360424,  3.18959084,  3.33877411,  3.4352485 ,  3.60531369,
        3.74100246,  3.88461972,  3.98561053,  4.12222078,  4.2778004 ,
        4.42940272,  4.49540354,  4.52975646,  4.83264503,  4.90738351,
        5.1034719 ,  5.17889141,  5.6570698 ,  5.75157547,  6.26235796,
        6.26836517,  7.08899349,  7.30672858,  8.74786038, 22.        ])
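One cheap way to sanity-check such an array without a second machine: for the adjacency matrix A of a k-regular simple graph on n vertices, trace(A) = 0 and trace(A^2) = n*k, so the eigenvalues must sum to 0 and their squares must sum to n*k (here n = 100, k = 22). A minimal sketch follows; the function name and the tolerance are arbitrary choices, and passing both checks is necessary but not sufficient for a correct spectrum.

```python
import math


def trace_checks(eigenvalues, n_vertices, degree, tol=1e-8):
    """Check sum(ev) == 0 and sum(ev^2) == n*k for a k-regular simple graph."""
    total = math.fsum(eigenvalues)
    total_sq = math.fsum(x * x for x in eigenvalues)
    return abs(total) <= tol, abs(total_sq - n_vertices * degree) <= tol


# Spectrum the doctest expects for the Higman-Sims graph (100 vertices, 22-regular).
expected = [-8.0] * 22 + [2.0] * 77 + [22.0]
print(trace_checks(expected, n_vertices=100, degree=22))   # (True, True)

# Synthetic bad input: one eigenvalue swapped for the first value of the
# failing doctest output. This is made-up test data, not a measured result.
corrupted = [-13.81753974166025] + expected[1:]
print(trace_checks(corrupted, n_vertices=100, degree=22))  # (False, False)
```

The same function can be fed the `eig[0]` array returned by `dsyevr` directly.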
dimpase commented 2 years ago
comment:24

Replying to @jhpalmieri:

Thanks. When I run ./a.out <HigmanSims.tst, the last line is the same with either test program: max. deviation of eigenvals: 1.2878587085651816E-014.

Which Fortran program do you compile? Please try dev_test_dsyevr.f90 if you did not.

jhpalmieri commented 2 years ago
comment:25

Replying to @dimpase:

Replying to @jhpalmieri:

Thanks. When I run ./a.out <HigmanSims.tst, the last line is the same with either test program: max. deviation of eigenvals: 1.2878587085651816E-014.

Which Fortran program do you compile? Please try dev_test_dsyevr.f90 if you did not.

I tried both (that's what I meant by "either test program").

dimpase commented 2 years ago
comment:26

ok, so the next step would be to try the "normal" scipy run. That is, you can print a to a file, and then start ./sage --python, read a in, and then run all the other lines in comment:17.

dimpase commented 2 years ago
comment:27

and if the latter still returns incorrect results, I'd try a standalone scipy, built from source using openblas from homebrew.

dimpase commented 2 years ago
comment:28

Replying to @dimpase:

ok, so the next step would be to try the "normal" scipy run. That is, you can print a to a file, and then start ./sage --python, read a in, and then run all the other lines in comment:17.

one quick way to do this is the following:

sage: import numpy as np
sage: g=graphs.HigmanSimsGraph()
sage: a=np.matrix(g.adjacency_matrix())
sage: np.save('/tmp/x',a)

now we have a in /tmp/x.npy. So start

./sage --python

and at its >>> prompt do

import numpy as np
a=np.load('/tmp/x.npy')
import scipy
import scipy.linalg
eig=scipy.linalg.lapack.dsyevr(a); eig[0]
dimpase commented 2 years ago
comment:29

Needless to say, it's a good idea to check that /tmp/x.npy is correct, so copying it to a machine where Sage is not broken and testing it as above might be a good idea.
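To confirm that the copy on the other machine is byte-for-byte the same file, comparing checksums is enough. A minimal sketch using only the standard library; the path is the one used above, and the helper name is made up. Note that this only verifies the copy: two files saved independently by different NumPy versions may differ in their header even when the arrays are equal.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


path = "/tmp/x.npy"  # file written by np.save in the comment above
if os.path.exists(path):
    print(sha256_of_file(path), path)
```

Run it on both machines and compare the printed digests.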

jhpalmieri commented 2 years ago
comment:30

Replying to @dimpase: and if the latter still returns incorrect results, I'd try a standalone scipy, built from source using openblas from homebrew.

This gives correct results (although I wrote the matrix g.adjacency_matrix() to a file and then read it back in and fed it into np.matrix(...)).

dimpase commented 2 years ago
comment:31

So it's not OpenBLAS or SciPy themselves, right? Something else in Sage upsets OpenBLAS (and only OpenBLAS from numpy, right?) on this particular CPU.

jhpalmieri commented 2 years ago
comment:32

For what it's worth, I dumped a to a file using np.save(FILE, a). Then I loaded it and ran the above list of commands.

jhpalmieri commented 2 years ago
comment:33

And naturally it works fine if I use a version of Sage built with ./configure --with-system-openblas=no.

jhpalmieri commented 2 years ago
comment:34

Could gsl be involved somehow, rather than openblas?

dimpase commented 2 years ago
comment:35

It need not be something even dependent on openblas that creates this weird CPU state. It could be something using AVX-512, or OpenMP.

Unfortunately Sage isn't modularised enough so that one could just load parts one by one, and check if the bug appears.
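Since AVX-512 is one of the suspects, it helps to know which machines advertise it. A best-effort sketch: it assumes Linux exposes a `flags` line in /proc/cpuinfo and that Intel Macs expose the `machdep.cpu.leaf7_features` sysctl key. Both are assumptions about the platform, and the function names are made up for illustration.

```python
import platform
import subprocess


def avx512_features(feature_text):
    """Return the AVX-512 feature names in a whitespace-separated flag list."""
    tokens = {token.lower() for token in feature_text.split()}
    return sorted(token for token in tokens if token.startswith("avx512"))


def read_cpu_features():
    """Best-effort read of the CPU feature flags; returns "" if unavailable."""
    system = platform.system()
    try:
        if system == "Linux":
            with open("/proc/cpuinfo") as handle:
                for line in handle:
                    if line.startswith("flags"):
                        return line.split(":", 1)[1]
        elif system == "Darwin":
            # Assumed sysctl key; it may be absent on non-Intel Macs.
            result = subprocess.run(
                ["sysctl", "-n", "machdep.cpu.leaf7_features"],
                capture_output=True, text=True, check=False,
            )
            if result.returncode == 0:
                return result.stdout
    except OSError:
        pass
    return ""


found = avx512_features(read_cpu_features())
print("AVX-512 features:", ", ".join(found) if found else "none detected")
```

Comparing this output between a failing and a passing machine would support or weaken the CPU-dependence guess; it does not by itself show which library misbehaves.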

jhpalmieri commented 2 years ago
comment:36

I built Sage with

export CFLAGS="-L/usr/local/opt/openblas/lib"
./configure --with-system-gsl=no

and I am not seeing the errors. This is using homebrew's openblas.

dimpase commented 2 years ago
comment:37

Interesting. First, there is a bug either in GSL's spkg-install, which demands that cblas be known to pkg-config (not the case here, although it can get the same from openblas), or somewhere in our hacky procedure of creating .pc files from the one for openblas.

Second, Sage is still on GSL 2.6, but Homebrew is on GSL 2.7. There is #32607, which is supposed to fix something GSL 2.7-related (but I am happily running GSL 2.7 on Linux with Sage...)

Matthias has tickets to make GSL optional.

dimpase commented 2 years ago
comment:38

Regarding the 1st issue: if you fire up ./sage --buildsh, and at its prompt, run pkg-config --libs cblas, what's the output?

I suspect that

sdh_configure LIBS="`pkg-config --libs-only-l cblas` -lm"

in GSL's spkg-install.in should be

sdh_configure LIBS="`pkg-config --libs cblas` -lm"

(--libs-only-l doesn't print -L flags.)

Can you try the latter line without CFLAGS ?

jhpalmieri commented 2 years ago
comment:39

Should pkgconf be a dependency for gsl?

jhpalmieri commented 2 years ago
comment:40

Replying to @dimpase:

Regarding the 1st issue: if you fire up ./sage --buildsh, and at its prompt, run pkg-config --libs cblas, what's the output?

-L/usr/local/Cellar/openblas/0.3.18/lib -lopenblas

I suspect that

sdh_configure LIBS="`pkg-config --libs-only-l cblas` -lm"

in GSL's spkg-install.in should be

sdh_configure LIBS="`pkg-config --libs cblas` -lm"

(--libs-only-l doesn't print -L flags.)

Can you try the latter line without CFLAGS ?

Will do.

dimpase commented 2 years ago
comment:41

As noted in #32587, GSL 2.7 may be built with an option to use old API. This does not happen on Homebrew, see https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/gsl.rb

However, on Debian Bullseye GSL is still version 2.6, so perhaps it's not the version alone that's responsible. https://packages.debian.org/source/stable/gsl

Vincent, could you check the GSL version you're using, and build Sage with system OpenBLAS, but without GSL, just as John did here?

dimpase commented 2 years ago
comment:42

Replying to @jhpalmieri:

Replying to @dimpase:

Regarding the 1st issue: if you fire up ./sage --buildsh, and at its prompt, run pkg-config --libs cblas, what's the output?

-L/usr/local/Cellar/openblas/0.3.18/lib -lopenblas

I suspect that

sdh_configure LIBS="`pkg-config --libs-only-l cblas` -lm"

in GSL's spkg-install.in should be

sdh_configure LIBS="`pkg-config --libs cblas` -lm"

(--libs-only-l doesn't print -L flags.)

Can you try the latter line without CFLAGS ?

Will do.

Perhaps one should leave that LIBS setting alone, and add LDFLAGS (it's weird that CFLAGS worked for you!) as follows:

sdh_configure LIBS="`pkg-config --libs-only-l cblas` -lm" LDFLAGS="$LDFLAGS `pkg-config --libs-only-L cblas`"
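To make the split concrete, here is a small Python sketch that separates the `pkg-config --libs cblas` output quoted above into the part `--libs-only-L` would print and the part `--libs-only-l` would print. It only parses the string; it does not call pkg-config, and real output can also contain other linker flags that this sketch ignores. The function name is made up.

```python
import shlex


def split_link_flags(libs_output):
    """Split linker flags into search directories (-L) and libraries (-l)."""
    tokens = shlex.split(libs_output)
    search_dirs = [token for token in tokens if token.startswith("-L")]
    libraries = [token for token in tokens if token.startswith("-l")]
    return search_dirs, libraries


# Output of `pkg-config --libs cblas` as reported in this thread.
reported = "-L/usr/local/Cellar/openblas/0.3.18/lib -lopenblas"
search_dirs, libraries = split_link_flags(reported)
print("LDFLAGS part:", " ".join(search_dirs))
print("LIBS part:   ", " ".join(libraries + ["-lm"]))
```

With `--libs-only-l` alone the `-L/usr/local/Cellar/openblas/0.3.18/lib` part is dropped, which matches the `ld: library not found for -lopenblas` error seen earlier.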
jhpalmieri commented 2 years ago
comment:43

Replying to @dimpase:

Perhaps one should leave that LIBS setting alone, and add LDFLAGS (it's weird that CFLAGS worked for you!) as follows:

sdh_configure LIBS="`pkg-config --libs-only-l cblas` -lm" LDFLAGS="$LDFLAGS `pkg-config --libs-only-L cblas`"

gsl builds but the sagelib package fails to build this way: ld: library not found for -lopenblas.

dimpase commented 2 years ago
comment:44

how about

--- a/.homebrew-build-env
+++ b/.homebrew-build-env
@@ -23,7 +23,7 @@ export PKG_CONFIG_PATH
 LIBRARY_PATH="$HOMEBREW/lib$LIBRARY_PATH"
 [ -z "$CPATH" ] || CPATH=":${CPATH}"
 CPATH="$HOMEBREW/include$CPATH"
-for l in readline bzip2 ntl; do
+for l in readline bzip2 ntl openblas; do
     if [ -d "$HOMEBREW/opt/$l/lib" ]; then
         LIBRARY_PATH="$HOMEBREW/opt/$l/lib:$LIBRARY_PATH"
     fi

i.e. change .homebrew-build-env as above, source it, and then run make build again.
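What the patched loop does to LIBRARY_PATH can be mimicked in Python, which makes it easy to print the resulting search path before rebuilding. This is only a sketch of the loop shown in the diff, not of the whole .homebrew-build-env script; the default prefix /usr/local is an assumption taken from the paths in this thread, and the function name is made up.

```python
import os


def extend_library_path(homebrew, packages, library_path=""):
    """Prepend <homebrew>/opt/<pkg>/lib for every package whose directory exists."""
    parts = [part for part in library_path.split(":") if part]
    for name in packages:
        candidate = os.path.join(homebrew, "opt", name, "lib")
        if os.path.isdir(candidate):
            # Prepending means later packages end up first, as in the shell loop.
            parts.insert(0, candidate)
    return ":".join(parts)


prefix = os.environ.get("HOMEBREW", "/usr/local")  # assumed default prefix
print(extend_library_path(prefix,
                          ["readline", "bzip2", "ntl", "openblas"],
                          os.environ.get("LIBRARY_PATH", "")))
```

If the printed path contains `opt/openblas/lib`, the linker should find `-lopenblas` without an extra `-L` flag.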

jhpalmieri commented 2 years ago
comment:45

I'm really confused now.

dimpase commented 2 years ago
comment:46

probably "success" with gsl comes from it not using openblas at all. You can check by running otool -L

jhpalmieri commented 2 years ago
comment:47

% otool -L local/lib/libgsl.25.dylib returns the same thing in each of the two cases (edited to replace the actual path with $SAGE_ROOT):

% otool -L local/lib/libgsl.25.dylib 
local/lib/libgsl.25.dylib:
    $SAGE_ROOT/local/lib/libgsl.25.dylib (compatibility version 26.0.0, current version 26.0.0)
    /usr/local/opt/openblas/lib/libopenblas.0.dylib (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)
jhpalmieri commented 2 years ago
comment:48

Same with otool -L local/lib/libgslcblas.0.dylib.
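The otool output above can also be checked mechanically, which is handy when comparing several builds. A minimal sketch that parses the text pasted in the previous comment; it assumes the usual `otool -L` layout, with the inspected file on the first line and one dependency per following line. The function name is made up.

```python
def linked_libraries(otool_output):
    """Return the library paths listed by `otool -L`, without version info."""
    libraries = []
    for line in otool_output.splitlines()[1:]:  # first line names the inspected file
        line = line.strip()
        if line:
            libraries.append(line.split(" (", 1)[0])
    return libraries


# Output pasted in the comment above.
sample = """local/lib/libgsl.25.dylib:
    $SAGE_ROOT/local/lib/libgsl.25.dylib (compatibility version 26.0.0, current version 26.0.0)
    /usr/local/opt/openblas/lib/libopenblas.0.dylib (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)
"""

uses_openblas = any("openblas" in library for library in linked_libraries(sample))
print(uses_openblas)  # True
```

For the output pasted here it reports that libgsl is linked against Homebrew's libopenblas, so in this build GSL does use OpenBLAS.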