mkoeppe opened this issue 3 years ago (status: open).
Moving to 9.4, as 9.3 has been released.
I saw what I think are the same failures on one OS X 11.5.2 machine, but not on another one (I think it's CPU-dependent). On the failing machine:
% sysctl -n machdep.cpu.brand_string
Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
I can provide other CPU info if you let me know what commands to run.
The tests passed when I did tox -e local-homebrew-macos-minimal -- ptestlong. Any guesses as to which Homebrew package is causing the problem? What should I plug into ./configure --with-system-PKG=no?
These tests passed when I did ./configure --with-system-gsl=no --with-system-openblas=no.
Just using ./configure --with-system-gsl=no failed to build: gsl's configure script failed with
configure: error: in `/Users/jpalmier/Desktop/Sage/sage_builds/TESTING/sage-9.5.beta1/local/var/tmp/sage/build/gsl-2.6/src':
configure: error: C compiler cannot create executables
and gsl's config.log file said
ld: library not found for -lopenblas
I'm trying now with ./configure --with-system-openblas=no, which means that Sage will also build its own gsl, r, and suitesparse.
(This computer is in my work office, and so I will not check its progress again until Monday.)
By the way, I expect this to pass these tests, since it should be equivalent to what I already did with ./configure --with-system-gsl=no --with-system-openblas=no, but I want to see the output of make ptestlong.
All tests passed when using that flag (--with-system-openblas=no).
Description changed:
---
+++
@@ -6,4 +6,4 @@
The numerics failures on debian-bullseye have disappeared (or perhaps it is processor dependent, and we were luckier in this run), but show up in debian-sid this time.
The numerics failures on ubuntu-groovy are still present.
-
+Another report for `debian-bullseye`: https://groups.google.com/g/sage-devel/c/kip6kYlL95Q/m/fjUbYwA-AwAJ
John, do you see on macOS the error reported by Vincent?
File "src/sage/matrix/matrix_double_dense.pyx", line 1365, in sage.matrix.matrix_double_dense.Matrix_double_dense.?
Failed example:
A.eigenvalues(algorithm='symmetric', tol=1.0e-5) # tol 2e-15
Expected:
[(-8.0, 22), (2.0, 77), (22.0, 1)]
Got:
[(-13.81753974166025, 1),
...
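For readers comparing the expected and actual outputs: the doctest reports each distinct eigenvalue with its multiplicity, so one stray value such as -13.8175... breaks the expected (value, multiplicity) list. Below is a minimal Python sketch of tolerance-based grouping. It assumes a simple rule: sort the values, then merge each one into the current group if it lies within tol of that group's first member. group_eigenvalues is a hypothetical helper, and Sage's eigenvalues(tol=...) may work differently internally.

```python
def group_eigenvalues(values, tol=1.0e-5):
    """Group nearly equal eigenvalues into (value, multiplicity) pairs.

    Hypothetical helper: sorts the values, then merges each value into the
    current group if it lies within tol of that group's first member.
    """
    groups = []
    for v in sorted(values):
        if groups and abs(v - groups[-1][0]) < tol:
            rep, mult = groups[-1]
            groups[-1] = (rep, mult + 1)
        else:
            groups.append((v, 1))
    return groups


# Expected Higman-Sims spectrum, with tiny floating-point noise added.
expected = [-8.0] * 22 + [2.0] * 77 + [22.0]
noisy = [x + 1e-14 * ((i % 3) - 1) for i, x in enumerate(expected)]
print([(round(v, 6), m) for v, m in group_eigenvalues(noisy)])
# prints [(-8.0, 22), (2.0, 77), (22.0, 1)]

# A single stray eigenvalue, as in the failing run, ends up in its own group.
print(group_eigenvalues([-13.81753974166025, -8.0, -8.0]))
# prints [(-13.81753974166025, 1), (-8.0, 2)]
```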
Also, what openblas version did you try on Homebrew? They bumped it to 0.3.18 two weeks ago: https://github.com/Homebrew/homebrew-core/commit/63de340b1e397b67e7137519f45b02491b6e1f15
I will check on Monday.
Replying to @dimpase:
John, do you see on macOS the error reported by Vincent?
File "src/sage/matrix/matrix_double_dense.pyx", line 1365, in sage.matrix.matrix_double_dense.Matrix_double_dense.? Failed example: A.eigenvalues(algorithm='symmetric', tol=1.0e-5) # tol 2e-15 Expected: [(-8.0, 22), (2.0, 77), (22.0, 1)] Got: [(-13.81753974166025, 1), ...
This is one I know how to produce a standalone test for (as a short Fortran program linking OpenBLAS).
Dima, even a standalone test with scipy would already be good enough. I would like to check whether the Debian package is completely broken or simply used wrongly within Sage. In either case, it might be known to Debian developers.
test Fortran program
Attachment: dev_test.f90.gz
Attachment: HigmanSims.tst.gz
the failing example (spectrum of the adj. matrix of 100-vertex Higman-Sims graph)
Attachment: petersen.tst.gz
smaller example (from Petersen graph)
script to compute test data
Attachment: graphdata.sage.gz
I've uploaded an attempted test of OpenBLAS. Download the attachments and
gfortran dev_test.f90 -lopenblas
./a.out <HigmanSims.tst
The last line of the output should say max. deviation of eigenvals: <small number>, where <small number> is smaller than 10^-13 or so if all is good.
(there is another, smaller, example, too, petersen.tst)
Please note that I'm not sure I used exactly the same LAPACK function as the failing Sage test. If this one, DSYEV, just works, there is one more function to try: DSYEVR.
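Where gfortran is not at hand, a rough Python analogue of the "max. deviation of eigenvals" check can be run against whatever LAPACK NumPy is linked to. This sketch assumes NumPy and uses numpy.linalg.eigvalsh. To my understanding that calls LAPACK's dsyevd, which differs from both DSYEV and DSYEVR. Instead of reading petersen.tst, it builds the Petersen graph as the Kneser graph K(5,2). It then compares the result with the graph's known spectrum: 3 once, 1 five times, and -2 four times. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def petersen_adjacency():
    """Adjacency matrix of the Petersen graph, built as the Kneser graph K(5, 2):
    vertices are 2-subsets of {0, ..., 4}, adjacent when disjoint."""
    verts = [frozenset(c) for c in combinations(range(5), 2)]
    n = len(verts)
    a = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if verts[i].isdisjoint(verts[j]):
                a[i, j] = 1.0
    return a


def max_eigenvalue_deviation(a, expected):
    """Largest absolute difference between computed and expected eigenvalues."""
    computed = np.linalg.eigvalsh(a)  # returned in ascending order
    return float(np.max(np.abs(computed - np.sort(expected))))


expected = np.array([3.0] + [1.0] * 5 + [-2.0] * 4)
dev = max_eigenvalue_deviation(petersen_adjacency(), expected)
print("max. deviation of eigenvals:", dev)
```

On a healthy BLAS/LAPACK stack the printed deviation should be around machine precision. A large value would point at the linear algebra libraries rather than at Sage.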
test using LAPACK's dsyevr call
Attachment: dev_test_dsyevr.f90.gz
Please also try the latter attachment (same instructions, different file name: dev_test_dsyevr.f90).
Replying to @videlec:
Dima, even a standalone test with scipy would be already good enough. I would like to check if the Debian package is completely broken or simply used wrongly within sage. In both cases, it might be known from Debian developers.
here is a direct scipy test via Sage:
sage: import numpy as np
sage: import scipy
sage: import scipy.linalg
sage: g=graphs.HigmanSimsGraph()
sage: a=np.matrix(g.adjacency_matrix())
sage: eig=scipy.linalg.lapack.dsyevr(a); eig[0]
array([-8., -8., -8., -8., -8., -8., -8., -8., -8., -8., -8., -8., -8.,
-8., -8., -8., -8., -8., -8., -8., -8., -8., 2., 2., 2., 2.,
2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
2., 2., 2., 2., 2., 2., 2., 2., 22.])
Replying to @dimpase:
John, do you see on macOS the error reported by Vincent?
File "src/sage/matrix/matrix_double_dense.pyx", line 1365, in sage.matrix.matrix_double_dense.Matrix_double_dense.? Failed example: A.eigenvalues(algorithm='symmetric', tol=1.0e-5) # tol 2e-15 Expected: [(-8.0, 22), (2.0, 77), (22.0, 1)] Got: [(-13.81753974166025, 1), ...
Yes, I see this failure. This is with an un-updated version of OpenBLAS: 0.3.17 from 2021-09-23, according to brew info openblas. I will try upgrading.
I can't use the standalone test because I get ld: library not found for -lopenblas.
Replying to @jhpalmieri:
I can't use the standalone test because I get
ld: library not found for -lopenblas
.
You'd need the -L flag:
gfortran <...>.f90 -L/usr/local/opt/openblas/lib -lopenblas
Thanks. When I run ./a.out <HigmanSims.tst, the last line is the same with either test program: max. deviation of eigenvals: 1.2878587085651816E-014.
With petersen.tst I get max. deviation of eigenvals: 8.8817841970012523E-016.
On the other hand, when I run the example in Sage in comment:17, I get
array([-9.51662688, -9.01957761, -8.82571413, -8.53509523, -8.45449832,
-8.19421094, -8.06706488, -8. , -8. , -8. ,
-8. , -8. , -8. , -8. , -8. ,
-8. , -8. , -7.92742484, -7.63275394, -7.25448498,
-7.06795155, -6.90611622, -6.64469129, -6.54715317, -5.4872644 ,
-5.29660049, -4.69884647, -4.33752666, -4.30959029, -3.75213367,
-3.57502391, -2.92795773, -2.58101222, -2.57098415, -2.53138416,
-2.27048023, -2.16406448, -1.80261557, -1.75557973, -1.59188765,
-1.41796672, -1.35756654, -1.13476674, -0.92256797, -0.65109354,
-0.49437488, -0.27070739, -0.07512487, 0.17239949, 0.4835247 ,
0.54260365, 0.67677662, 1.03820124, 1.12679921, 1.41540993,
1.65236118, 1.75100574, 1.86305818, 1.99997232, 2. ,
2. , 2. , 2. , 2. , 2. ,
2. , 2. , 2. , 2. , 2.01296242,
2.23534418, 2.37464655, 2.50556135, 2.69055392, 2.73682618,
3.00360424, 3.18959084, 3.33877411, 3.4352485 , 3.60531369,
3.74100246, 3.88461972, 3.98561053, 4.12222078, 4.2778004 ,
4.42940272, 4.49540354, 4.52975646, 4.83264503, 4.90738351,
5.1034719 , 5.17889141, 5.6570698 , 5.75157547, 6.26235796,
6.26836517, 7.08899349, 7.30672858, 8.74786038, 22. ])
Replying to @jhpalmieri:
Thanks. When I run
./a.out <HigmanSims.tst
, the last line is the same with either test program:max. deviation of eigenvals: 1.2878587085651816E-014
.
Which Fortran program do you compile? Please try dev_test_dsyevr.f90 if you did not.
Replying to @dimpase:
Replying to @jhpalmieri:
Thanks. When I run
./a.out <HigmanSims.tst
, the last line is the same with either test program:max. deviation of eigenvals: 1.2878587085651816E-014
.
Which Fortran program do you compile? Please try
dev_test_dsyevr.f90
if you did not.
I tried both (that's what I meant by "either test program").
OK, so the next step would be to try the "normal" scipy run. That is, you can print a to a file, then start ./sage --python, read a in, and run all the other lines in comment:17.
And if the latter still returns incorrect results, I'd try a standalone scipy built from source against openblas from Homebrew.
Replying to @dimpase:
ok, so the next step would be to try the "normal" scipy run. That is, you can print a to a file, and then start ./sage --python, read a in, and then run all the other lines in comment:17.
one quick way to do this is the following:
sage: import numpy as np
sage: g=graphs.HigmanSimsGraph()
sage: a=np.matrix(g.adjacency_matrix())
sage: np.save('/tmp/x',a)
Now we have a in /tmp/x.npy. So start
./sage --python
and at its >>> prompt do:
import numpy as np
a=np.load('/tmp/x.npy')
import scipy
import scipy.linalg
eig=scipy.linalg.lapack.dsyevr(a); eig[0]
Needless to say, it's a good idea to check that /tmp/x.npy is correct, so copying it to a machine where Sage is not broken and testing it as above might be a good idea.
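One way to check the saved matrix without a second machine is to verify properties that any Higman-Sims adjacency matrix must have:

- it is 100 x 100 and symmetric;
- its entries are 0/1;
- its diagonal is zero;
- every row sums to 22, since the graph is 22-regular.

This is a sketch assuming NumPy. check_adjacency_file is a hypothetical helper, demonstrated on a 5-cycle so it runs anywhere. It catches a corrupted or transposed file, not a well-formed matrix of the wrong graph.

```python
import os
import tempfile

import numpy as np


def check_adjacency_file(path, n, degree):
    """Sanity checks for a saved adjacency matrix of an undirected,
    degree-regular graph (hypothetical helper, not part of Sage).
    Returns a list of problems; an empty list means all checks passed."""
    a = np.load(path)
    problems = []
    if a.shape != (n, n):
        problems.append(f"shape {a.shape}, expected {(n, n)}")
        return problems
    if not np.array_equal(a, a.T):
        problems.append("matrix is not symmetric")
    if not np.isin(a, (0, 1)).all():
        problems.append("entries other than 0 and 1")
    if np.any(np.diag(a) != 0):
        problems.append("nonzero diagonal")
    if not np.all(a.sum(axis=1) == degree):
        problems.append(f"not every row sums to {degree}")
    return problems


# Demo on a 5-cycle (a 2-regular graph) written to a temporary file.
c5 = np.zeros((5, 5), dtype=int)
for i in range(5):
    c5[i, (i + 1) % 5] = c5[(i + 1) % 5, i] = 1
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "x.npy")
    np.save(path, c5)
    result = check_adjacency_file(path, n=5, degree=2)
print(result)  # prints []
```

On the machine in question, one would call check_adjacency_file('/tmp/x.npy', n=100, degree=22) and expect an empty list.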
Replying to @dimpase:
and if the latter still returns incorrect results, I'd try a standalone scipy, built from source using openblas from homebrew.
This gives correct results (although I wrote the matrix g.adjacency_matrix() to a file, read it back in, and fed it into np.matrix(...)).
So it's not OpenBLAS or SciPy themselves, right? Something else in Sage upsets OpenBLAS (and only OpenBLAS from numpy, right?) on this particular CPU.
For what it's worth, I dumped a to a file using np.save(FILE, a). Then I loaded it and ran the above list of commands.
./sage --python
./sage
And naturally it works fine if I use a version of Sage built with ./configure --with-system-openblas=no.
Could gsl be involved somehow, rather than openblas?
It need not even be something dependent on openblas that creates this weird CPU state; it could be something using AVX-512 or OpenMP.
Unfortunately Sage isn't modularised enough so that one could just load parts one by one, and check if the bug appears.
I built Sage with
export CFLAGS="-L/usr/local/opt/openblas/lib"
./configure --with-system-gsl=no
and I am not seeing the errors. This is using Homebrew's openblas.
Interesting. First, there is a bug either in GSL's spkg-install, as it demands that cblas be known to pkg-config (which is not the case here; it could get the same from openblas), or somewhere in our hacky procedure for creating .pc files from the one for openblas.
Second, Sage is still on GSL 2.6, but Homebrew is on GSL 2.7. There is #32607, which is supposed to fix something GSL 2.7-related (but I am happily running GSL 2.7 on Linux with Sage...).
Matthias has tickets to make GSL optional.
Regarding the 1st issue: if you fire up ./sage --buildsh and, at its prompt, run pkg-config --libs cblas, what's the output?
I suspect that
sdh_configure LIBS="`pkg-config --libs-only-l cblas` -lm"
in GSL's spkg-install.in should be
sdh_configure LIBS="`pkg-config --libs cblas` -lm"
(--libs-only-l doesn't print -L flags.)
Can you try the latter line without CFLAGS?
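To illustrate the difference between the two pkg-config invocations, the sketch below splits a --libs string into its -L and -l parts, roughly as --libs-only-L and --libs-only-l would. It does not call pkg-config. It works on the string reported for pkg-config --libs cblas elsewhere in this thread. split_link_flags is a hypothetical helper that assumes simple whitespace-separated flags.

```python
import shlex


def split_link_flags(libs):
    """Split a pkg-config --libs string into (-L flags, -l flags, other flags)."""
    dirs, libs_l, other = [], [], []
    for tok in shlex.split(libs):
        if tok.startswith("-L"):
            dirs.append(tok)
        elif tok.startswith("-l"):
            libs_l.append(tok)
        else:
            other.append(tok)
    return dirs, libs_l, other


libs = "-L/usr/local/Cellar/openblas/0.3.18/lib -lopenblas"
dirs, only_l, _ = split_link_flags(libs)
print("--libs-only-L:", " ".join(dirs))    # -L/usr/local/Cellar/openblas/0.3.18/lib
print("--libs-only-l:", " ".join(only_l))  # -lopenblas
# A LIBS value built from the -l part alone, as in GSL's spkg-install.in:
print("LIBS =", " ".join(only_l + ["-lm"]))  # LIBS = -lopenblas -lm
```

With only the -l part in LIBS, the linker has no search path for the Homebrew OpenBLAS directory. That is consistent with the "ld: library not found for -lopenblas" errors reported above.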
Should pkgconf be a dependency for gsl?
Replying to @dimpase:
Regarding the 1st issue: if you fire up ./sage --buildsh, and at its prompt, run pkg-config --libs cblas, what's the output?
-L/usr/local/Cellar/openblas/0.3.18/lib -lopenblas
I suspect that
sdh_configure LIBS="`pkg-config --libs-only-l cblas` -lm"
in GSL's spkg-install.in should be
sdh_configure LIBS="`pkg-config --libs cblas` -lm"
(--libs-only-l doesn't print -L flags.) Can you try the latter line without CFLAGS?
Will do.
As noted in #32587, GSL 2.7 may be built with an option to use the old API. Homebrew does not do this; see https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/gsl.rb
However, on Debian Bullseye GSL is still version 2.6, so perhaps the version alone is not responsible. https://packages.debian.org/source/stable/gsl
Vincent, could you check the GSL version you're using, and build Sage with system OpenBLAS, but without GSL, just as John did here?
Replying to @jhpalmieri:
Replying to @dimpase:
Regarding the 1st issue: if you fire up ./sage --buildsh, and at its prompt, run pkg-config --libs cblas, what's the output?
-L/usr/local/Cellar/openblas/0.3.18/lib -lopenblas
I suspect that
sdh_configure LIBS="`pkg-config --libs-only-l cblas` -lm"
in GSL's spkg-install.in should be
sdh_configure LIBS="`pkg-config --libs cblas` -lm"
(--libs-only-l doesn't print -L flags.) Can you try the latter line without CFLAGS?
Will do.
Perhaps one should leave that LIBS setting alone and add LDFLAGS instead (it's weird that CFLAGS worked for you!), as follows:
sdh_configure LIBS="`pkg-config --libs-only-l cblas` -lm" LDFLAGS="$LDFLAGS `pkg-config --libs-only-L cblas`"
Replying to @dimpase:
Perhaps one should leave that LIBS settings alone, and add LDFLAGS (it's weird that CFLAGS worked for you!) as follows:
sdh_configure LIBS="`pkg-config --libs-only-l cblas` -lm" LDFLAGS="$LDFLAGS `pkg-config --libs-only-L cblas`"
gsl builds, but the sagelib package fails to build this way: ld: library not found for -lopenblas.
How about the following:
--- a/.homebrew-build-env
+++ b/.homebrew-build-env
@@ -23,7 +23,7 @@ export PKG_CONFIG_PATH
LIBRARY_PATH="$HOMEBREW/lib$LIBRARY_PATH"
[ -z "$CPATH" ] || CPATH=":${CPATH}"
CPATH="$HOMEBREW/include$CPATH"
-for l in readline bzip2 ntl; do
+for l in readline bzip2 ntl openblas; do
if [ -d "$HOMEBREW/opt/$l/lib" ]; then
LIBRARY_PATH="$HOMEBREW/opt/$l/lib:$LIBRARY_PATH"
fi
i.e., change .homebrew-build-env as above, source it, and then run make build again.
I'm really confused now.
- With the change to .homebrew-build-env, Sage builds, but the example in comment:17 fails and I see the numerical failures in doctesting.
- With CFLAGS, Sage builds (although with warnings "clang: warning: argument unused during compilation: '-L/usr/local/opt/openblas/lib' [-Wunused-command-line-argument]") and the example in comment:17 works. ./sage -tp --long src/sage/matrix succeeds: no numerical failures.
Probably the "success" with gsl comes from it not using openblas at all. You can check by running otool -L.
% otool -L local/lib/libgsl.25.dylib
returns the same thing in each of the two cases (edited to replace the actual path with $SAGE_ROOT):
% otool -L local/lib/libgsl.25.dylib
local/lib/libgsl.25.dylib:
$SAGE_ROOT/local/lib/libgsl.25.dylib (compatibility version 26.0.0, current version 26.0.0)
/usr/local/opt/openblas/lib/libopenblas.0.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)
Same with otool -L local/lib/libgslcblas.0.dylib.
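The otool -L output can also be checked mechanically. This sketch parses text in the layout shown above: a header line, then one dependency per line followed by version information in parentheses. It lists the dependencies that match a name. linked_libraries is a hypothetical helper that works on pasted output rather than running otool, so it also runs off macOS.

```python
def linked_libraries(otool_output):
    """Return dependency paths from `otool -L` output.

    Assumes the layout shown above: a header line ending in ':' followed by
    one dependency per line, each followed by '(compatibility version ...)'.
    """
    deps = []
    for line in otool_output.splitlines()[1:]:
        line = line.strip()  # real otool output indents dependencies with a tab
        if line:
            deps.append(line.split(" (compatibility", 1)[0])
    return deps


sample = """local/lib/libgsl.25.dylib:
$SAGE_ROOT/local/lib/libgsl.25.dylib (compatibility version 26.0.0, current version 26.0.0)
/usr/local/opt/openblas/lib/libopenblas.0.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)"""

deps = linked_libraries(sample)
print([d for d in deps if "openblas" in d])
# prints ['/usr/local/opt/openblas/lib/libopenblas.0.dylib']
```

Applied to the output above, it finds Homebrew's libopenblas among libgsl's dependencies. That matches the observation that both builds link it.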
From https://groups.google.com/g/sage-release/c/6WjKQt_e_B8/m/dpx1qILOCwAJ (for 9.3.rc2):
9.3.beta8> {debian-bullseye,ubuntu-groovy}-standard: cvxopt testsuite errors, numerics-related sage testsuite errors
9.3.beta8> (is system BLAS feeling OK??)
The numerics failures on debian-bullseye have disappeared (or perhaps it is processor dependent, and we were luckier in this run), but show up in debian-sid this time. The numerics failures on ubuntu-groovy are still present.
Another report for debian-bullseye: https://groups.google.com/g/sage-devel/c/kip6kYlL95Q/m/fjUbYwA-AwAJ
CC: @dimpase @jhpalmieri @videlec @kliem
Component: porting
Issue created by migration from https://trac.sagemath.org/ticket/31621