Closed jhpalmieri closed 5 years ago
Attachment: Sage_crash_report.txt
I note that it talks about `libopenblas_haswellp-r0.3.5.dylib`. Shouldn't that be 0.3.6 instead? Was it from building from scratch or an upgrade from the last beta?
Mine was an incremental build. I'll try from scratch next. David Coudert reported this failure on sage-release and there was a similar message. I don't know if his build was an upgrade or from scratch.
Description changed:
---
+++
@@ -1,4 +1,4 @@
-On at least some OS X machines, openblas 0.3.6 causes Sage to crash. From the end of Sage_crash_report.txt:
+On at least some OS X machines, after upgrading from openblas 0.3.5, openblas 0.3.6 causes Sage to crash. From the end of Sage_crash_report.txt:
ImportError: dlopen(/Users/palmieri/Desktop/Sage_stuff/git/sage/local/lib/python2.7/site-packages/sage/matrix/matrix_rational_dense.so, 2): Library not loaded: /Users/palmieri/Desktop/Sage_stuff/git/sage/local/lib/libopenblas_haswellp-r0.3.5.dylib
Description changed:
---
+++
@@ -1,7 +1,61 @@
-On at least some OS X machines, after upgrading from openblas 0.3.5, openblas 0.3.6 causes Sage to crash. From the end of Sage_crash_report.txt:
+Two issues with openblas 0.3.6.
+
+1. On at least some OS X machines, after upgrading from openblas 0.3.5, openblas 0.3.6 causes Sage to crash. From the end of Sage_crash_report.txt:
ImportError: dlopen(/Users/palmieri/Desktop/Sage_stuff/git/sage/local/lib/python2.7/site-packages/sage/matrix/matrix_rational_dense.so, 2): Library not loaded: /Users/palmieri/Desktop/Sage_stuff/git/sage/local/lib/libopenblas_haswellp-r0.3.5.dylib Referenced from: /Users/palmieri/Desktop/Sage_stuff/git/sage/local/lib/python2.7/site-packages/sage/matrix/matrix_rational_dense.so Reason: image not found
+This failure only occurs when upgrading Sage; it works when building from scratch.
+
+2. On some OS X machines but not all (I've seen it only on an iMac Pro):
+
+```
+sage -t src/doc/en/thematic_tutorials/explicit_methods_in_number_theory/birds_other.rst
+ Bad exit: 1
+**********************************************************************
+Tests run before process (pid=66736) failed:
+sage: from sage.schemes.hyperelliptic_curves.hypellfrob import hypellfrob ## line 16 ##
+sage: R.<x> = PolynomialRing(ZZ) ## line 17 ##
+sage: f = x^5 + 2*x^2 + x + 1; p = 101 ## line 18 ##
+sage: M = hypellfrob(p, 1, f); M ## line 19 ##
+[ O(101) O(101) 93 + O(101) 62 + O(101)]
+[ O(101) O(101) 55 + O(101) 19 + O(101)]
+[ O(101) O(101) 65 + O(101) 42 + O(101)]
+[ O(101) O(101) 89 + O(101) 29 + O(101)]
+sage: M = hypellfrob(p, 4, f) # about 0.25 seconds ## line 35 ##
+sage: M[0,0] ## line 36 ##
+91844754 + O(101^4)
+sage: M.charpoly() ## line 47 ##
+(1 + O(101^4))*x^4 + (7 + O(101^4))*x^3 + (167 + O(101^4))*x^2 + (707 + O(101^4))*x + 10201 + O(101^4)
+sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 50 ##
+0
+sage: M = ModularSymbols(GammaH(13,[3]), weight=4) ## line 86 ##
+sage: M ## line 87 ##
+Modular Symbols space of dimension 14 for Congruence Subgroup Gamma_H(13) with H generated by [3] of weight 4 with sign 0 and over Rational Field
+sage: M.basis() ## line 91 ##
+([X^2,(0,4)],
+ [X^2,(0,7)],
+ [X^2,(4,10)],
+ [X^2,(4,11)],
+ [X^2,(4,12)],
+ [X^2,(7,3)],
+ [X^2,(7,5)],
+ [X^2,(7,6)],
+ [X^2,(7,7)],
+ [X^2,(7,8)],
+ [X^2,(7,9)],
+ [X^2,(7,10)],
+ [X^2,(7,11)],
+ [X^2,(7,12)])
+sage: factor(charpoly(M.T(2))) ## line 106 ##
+(x - 7) * (x + 7) * (x - 9)^2 * (x + 5)^2 * (x^2 - x - 4)^2 * (x^2 + 9)^2
+sage: dimension(M.cuspidal_subspace()) ## line 109 ##
+Fatal: Memory exhausted.
+
+**********************************************************************
+----------------------------------------------------------------------
+sage -t src/doc/en/thematic_tutorials/explicit_methods_in_number_theory/birds_other.rst # Bad exit: 1
+```
+This occurs when building from scratch, with 8.8.beta7 merged with #27847.
Description changed:
---
+++
@@ -7,7 +7,7 @@
Referenced from: /Users/palmieri/Desktop/Sage_stuff/git/sage/local/lib/python2.7/site-packages/sage/matrix/matrix_rational_dense.so
Reason: image not found
-This failure only occurs when upgrading Sage; it works when building from scratch.
This failure only occurs when upgrading Sage; it works when building from scratch.
sage -t src/doc/en/thematic_tutorials/explicit_methods_in_number_theory/birds_other.rst # Bad exit: 1
-This occurs when building from scratch, with 8.8.beta7 merged with #27847.
+ This occurs when building from scratch, with 8.8.beta7 merged with #27847.
does this happen with clang-compiled Sage? And what Fortran?
Replying to @dimpase:
does this happen with clang-compiled Sage?
Yes
And what Fortran?
A previously compiled gfortran-6.4.0, which I keep around just so I don't have to wait for Sage to build its own. I'll try with Sage's gfortran to see if that makes a difference, at least in issue 2.
I don't think issue 1 has much to do with OpenBLAS update, it's just a building problem.
For some reason `sage/matrix/matrix_rational_dense.so` didn't get rebuilt. I guess it might be Unicode values of strings involved in the description of this Extension in `src/module_list.py` - indeed,
sage: import pkgconfig
sage: cblas_pc = pkgconfig.parse('cblas')
....: cblas_libs = cblas_pc['libraries']
....: cblas_library_dirs = cblas_pc['library_dirs']
....: cblas_include_dirs = cblas_pc['include_dirs']
....:
sage: cblas_libs
[u'openblas']
sage: cblas_library_dirs
[u'/mnt/opt/Sage/sage-dev/local/lib']
and then one has
libraries = ['iml', 'ntl', 'm'] + cblas_libs,
mixing `str` and `unicode` in Python 2, at least.
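The mixing described above can be illustrated with a minimal sketch. The helper `to_native_str` is hypothetical (it is not part of Sage or of the `pkgconfig` module); it only shows one way to coerce the values returned by `pkgconfig.parse` to the interpreter's native `str` before concatenating them with literal library names. Whether this mixing is what actually prevents the rebuild is not established here.

```python
import sys

def to_native_str(items):
    """Return a list in which every item is the interpreter's native ``str``."""
    result = []
    for item in items:
        if sys.version_info[0] == 2 and not isinstance(item, str):
            # Python 2: convert ``unicode`` to a byte ``str``.
            item = item.encode("utf-8")
        elif sys.version_info[0] >= 3 and isinstance(item, bytes):
            # Python 3: convert ``bytes`` to a text ``str``.
            item = item.decode("utf-8")
        result.append(item)
    return result

# Stand-in for cblas_pc['libraries'] from the session above.
cblas_libs = [u"openblas"]
libraries = ["iml", "ntl", "m"] + to_native_str(cblas_libs)
```

With this, every entry of `libraries` has the same type on both Python 2 and Python 3.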
The unicode observation is interesting, although both David Coudert and I saw it with Python 2 and Python 3. With Python 3:
sage: import pkgconfig
sage: cblas_pc = pkgconfig.parse('cblas')
sage: cblas_libs = cblas_pc['libraries']
sage: cblas_library_dirs = cblas_pc['library_dirs']
sage: cblas_libs
['openblas']
sage: cblas_library_dirs
['/Users/jpalmier/Desktop/Sage/sage_builds/PYTHON3/sage-8.8.beta7/local/lib']
Oh, and I completely agree about issue 1 not being a problem with the openblas update, but instead being a Sage build problem.
No change (back to issue 2) if I use Sage's gfortran, by the way.
Let me know what else I can do to diagnose this.
it might be hardware-dependent. If you don't see this on an otherwise identical branch on another OSX machine with the same OS version, then that's the most obvious conclusion.
I agree. Is it an OpenBLAS issue? For what it's worth, when I run the OpenBLAS test suite, I seem to get failures with 0.3.5 and also 0.3.6, although `spkg-check` exits successfully. None of the failures look immediately like "Fatal: Memory exhausted" to me, but I don't really know what I should be looking for.
Can you try to recompile openblas with different target arch, e.g.
OPENBLAS_CONFIGURE="TARGET=PRESCOTT" ./sage -p openblas
Also, are you sure you are not actually running out of memory, e.g. background processes / ulimit / ...?
I am pretty sure that I am not running out of memory. The machine has 32GB of RAM, and after running into this, I checked the available memory using the OS X "Activity Monitor" and everything was fine. I quit all potential memory hogs, I rebooted, and still got
sage: M = ModularSymbols(GammaH(13,[3]), weight=4)
sage: M.cuspidal_subspace()
Fatal: Memory exhausted.
I am away from the machine now, but I will try with a different target tomorrow.
Another datapoint would be running tests with multithreading disabled, e.g.
OPENBLAS_NUM_THREADS=1 sage -t src/doc/en/thematic_tutorials/explicit_methods_in_number_theory/birds_other.rst
OPENBLAS_CONFIGURE="TARGET=PRESCOTT" ./sage -p openblas
works: Sage passes all tests. Also, the `openblas` test suite succeeds this way.
On the other hand, I still get "Fatal: Memory exhausted" if I don't specify `OPENBLAS_CONFIGURE` but instead just disable multithreading with `OPENBLAS_NUM_THREADS=1 sage -t ...`.
By the way, as mentioned above, if I don't specify `OPENBLAS_CONFIGURE`, I see some apparent failures when I run `./sage -f -c openblas`, but `spkg-check` exits successfully. Is `make tests` misconfigured by OpenBLAS?
Replying to @vbraun:
Another datapoint would be running tests with multithreading disabled, e.g.
OPENBLAS_NUM_THREADS=1 sage -t src/doc/en/thematic_tutorials/explicit_methods_in_number_theory/birds_other.rst
IIUC this is already the default set (somewhat to my chagrin) in sage-env:
# Multithreading in OpenBLAS does not seem to play well with Sage's attempts to
# spawn new processes, see #26118. Apparently, OpenBLAS sets the thread
# affinity and, e.g., parallel doctest jobs, remain on the same core.
# Disabling that thread-affinity with OPENBLAS_MAIN_FREE=1 leads to hangs in
# some computations.
# So we disable OpenBLAS' threading completely; we might loose some performance
# here but strangely the opposite seems to be the case. Note that callers such
# as LinBox use a single-threaded OpenBLAS anyway.
export OPENBLAS_NUM_THREADS=1
unless we unset that when running the tests?
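One way to answer that question empirically is to check what a child process actually sees. The following is a minimal sketch, not something taken from Sage's doctest framework; the helper `child_sees` is a hypothetical name, and it only demonstrates that a variable exported in the parent environment reaches a spawned process unless it is explicitly removed.

```python
import os
import subprocess
import sys

def child_sees(var, env):
    """Run a child Python process with ``env`` and return its value of ``var``."""
    code = "import os; print(os.environ.get(%r, '<unset>'))" % var
    out = subprocess.check_output([sys.executable, "-c", code], env=env)
    return out.decode().strip()

# Copy the current environment and export the variable, as sage-env does.
env = dict(os.environ)
env["OPENBLAS_NUM_THREADS"] = "1"
with_var = child_sees("OPENBLAS_NUM_THREADS", env)

# Remove it again to mimic a test runner that unsets the variable.
env.pop("OPENBLAS_NUM_THREADS")
without_var = child_sees("OPENBLAS_NUM_THREADS", env)
```

Running the same kind of check from inside a doctest process would show whether the setting from `sage-env` survives there.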
I wonder if the upgrade to 0.3.6 can be reverted until/unless the cause of this is rooted out. 0.3.5 was working fine, but there was a rush to upgrade for better gcc 9.0 support. However, this has a workaround to it already: Use an older compiler.
That or, if there is a specific change to OpenBLAS related to the gcc problem perhaps we could just patch that selectively.
I've been trawling through the existing issues opened against OpenBLAS and don't see anything obvious that matches this problem, though it would help to know exactly how this call is involving OpenBLAS.
Can you upload the openblas config.h and install log?
`config.h` looks like this (without setting `OPENBLAS_CONFIGURE`):
#define OS_DARWIN 1
#define ARCH_X86_64 1
#define C_CLANG 1
#define __64BIT__ 1
#define FUNDERSCORE _
#define PTHREAD_CREATE_FUNC pthread_create
#define BUNDERSCORE _
#define NEEDBUNDERSCORE 1
#define SKYLAKEX
#define L1_CODE_SIZE 32768
#define L1_CODE_ASSOCIATIVE 8
#define L1_CODE_LINESIZE 64
#define L1_DATA_SIZE 32768
#define L1_DATA_ASSOCIATIVE 8
#define L1_DATA_LINESIZE 64
#define L2_SIZE 262144
#define L2_ASSOCIATIVE 8
#define L2_LINESIZE 64
#define ITB_SIZE 2097152
#define ITB_ASSOCIATIVE 0
#define ITB_ENTRIES 8
#define DTB_SIZE 4096
#define DTB_ASSOCIATIVE 4
#define DTB_DEFAULT_ENTRIES 64
#define HAVE_CMOV
#define HAVE_MMX
#define HAVE_SSE
#define HAVE_SSE2
#define HAVE_SSE3
#define HAVE_SSSE3
#define HAVE_SSE4_1
#define HAVE_SSE4_2
#define HAVE_AVX
#define HAVE_AVX2
#define HAVE_AVX512VL
#define HAVE_FMA3
#define HAVE_CFLUSH
#define NUM_SHAREDCACHE 2
#define NUM_CORES 8
#define CORE_SKYLAKEX
#define CHAR_CORENAME "SKYLAKEX"
#define SLOCAL_BUFFER_SIZE 24576
#define DLOCAL_BUFFER_SIZE 32768
#define CLOCAL_BUFFER_SIZE 12288
#define ZLOCAL_BUFFER_SIZE 8192
#define GEMM_MULTITHREAD_THRESHOLD 4
If I set `OPENBLAS_CONFIGURE="TARGET=PRESCOTT"`, it looks like
#define OS_DARWIN 1
#define ARCH_X86_64 1
#define C_CLANG 1
#define __64BIT__ 1
#define FUNDERSCORE _
#define PTHREAD_CREATE_FUNC pthread_create
#define BUNDERSCORE _
#define NEEDBUNDERSCORE 1
#define PENTIUM4
#define L1_DATA_SIZE 16384
#define L1_DATA_LINESIZE 64
#define L2_SIZE 1048576
#define L2_LINESIZE 64
#define DTB_DEFAULT_ENTRIES 64
#define DTB_SIZE 4096
#define L2_ASSOCIATIVE 8
#define HAVE_CMOV
#define HAVE_MMX
#define HAVE_SSE
#define HAVE_SSE2
#define HAVE_SSE3
#define CORE_PRESCOTT
#define CHAR_CORENAME "PRESCOTT"
#define SLOCAL_BUFFER_SIZE 8192
#define DLOCAL_BUFFER_SIZE 8192
#define CLOCAL_BUFFER_SIZE 8192
#define ZLOCAL_BUFFER_SIZE 8192
#define GEMM_MULTITHREAD_THRESHOLD 4
Attachment: openblas.log
Attachment: openblas-PRESCOTT.log
My guess is that it's a problem with AVX512; not many CPUs have it, so it's not tested much. Also, part of the release notes is "the AVX512 DGEMM kernel has been disabled again due to unsolved problems", not exactly filling me with confidence that it all works as expected. Can you try:
OPENBLAS_CONFIGURE="NO_AVX512=1" ./sage -p openblas
Replying to @vbraun:
Can you try:
OPENBLAS_CONFIGURE="NO_AVX512=1" ./sage -p openblas
That works, all tests pass. `config.h`, in case it's relevant:
#define OS_DARWIN 1
#define ARCH_X86_64 1
#define C_CLANG 1
#define __64BIT__ 1
#define FUNDERSCORE _
#define PTHREAD_CREATE_FUNC pthread_create
#define BUNDERSCORE _
#define NEEDBUNDERSCORE 1
#define HASWELL
#define L1_CODE_SIZE 32768
#define L1_CODE_ASSOCIATIVE 8
#define L1_CODE_LINESIZE 64
#define L1_DATA_SIZE 32768
#define L1_DATA_ASSOCIATIVE 8
#define L1_DATA_LINESIZE 64
#define L2_SIZE 262144
#define L2_ASSOCIATIVE 8
#define L2_LINESIZE 64
#define ITB_SIZE 2097152
#define ITB_ASSOCIATIVE 0
#define ITB_ENTRIES 8
#define DTB_SIZE 4096
#define DTB_ASSOCIATIVE 4
#define DTB_DEFAULT_ENTRIES 64
#define HAVE_CMOV
#define HAVE_MMX
#define HAVE_SSE
#define HAVE_SSE2
#define HAVE_SSE3
#define HAVE_SSSE3
#define HAVE_SSE4_1
#define HAVE_SSE4_2
#define HAVE_AVX
#define HAVE_AVX2
#define HAVE_FMA3
#define HAVE_CFLUSH
#define NUM_SHAREDCACHE 2
#define NUM_CORES 8
#define CORE_HASWELL
#define CHAR_CORENAME "HASWELL"
#define SLOCAL_BUFFER_SIZE 24576
#define DLOCAL_BUFFER_SIZE 32768
#define CLOCAL_BUFFER_SIZE 12288
#define ZLOCAL_BUFFER_SIZE 8192
#define GEMM_MULTITHREAD_THRESHOLD 4
Now it's using "HASWELL" instead of "SKYLAKEX".
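The difference between the default `config.h` and the `NO_AVX512=1` one can be extracted mechanically. This is a minimal sketch under stated assumptions: the two strings are abbreviated excerpts of the listings in this thread (not the full files), and `defined_macros` is a hypothetical helper name.

```python
import re

def defined_macros(config_text):
    """Return the set of macro names defined in a config.h-style text."""
    return set(re.findall(r"^#define\s+(\w+)", config_text, flags=re.MULTILINE))

# Abbreviated excerpt of the default build's config.h.
default_config = """\
#define SKYLAKEX
#define HAVE_AVX
#define HAVE_AVX2
#define HAVE_AVX512VL
#define HAVE_FMA3
#define CORE_SKYLAKEX
"""

# Abbreviated excerpt of the NO_AVX512=1 build's config.h.
no_avx512_config = """\
#define HASWELL
#define HAVE_AVX
#define HAVE_AVX2
#define HAVE_FMA3
#define CORE_HASWELL
"""

# Macros present in one build but not in the other.
only_default = sorted(defined_macros(default_config) - defined_macros(no_avx512_config))
only_no_avx512 = sorted(defined_macros(no_avx512_config) - defined_macros(default_config))
print(only_default)
print(only_no_avx512)
```

On these excerpts, the default build is the only one that defines `HAVE_AVX512VL`, alongside the core name change.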
By the way, what should the workflow be? I ran `... ./sage -p openblas` and then `make`. I didn't see any obvious errors, but `make` fails, and in particular Sage doesn't start. Running `./sage -ba` fixes it. This seems like issue 1 (from the ticket description): something in the Sage library is not getting rebuilt properly.
Branch: u/vbraun/openblas_0_3_6_vs__os_x
I think "HASWELL" is correct since AVX512 support is essentially the only new ISA.
Blas doesn't have any arch-dependent headers so it must be that some binary is linking against `libopenblas_haswellp-r0.3.5.dylib` instead of `libopenblas.dylib`. Then when you replace it with `libopenblas-whatever-0.3.6.dylib` it won't work any more. On Linux it's correct, so that's an OSX special. I've created #28008 to deal with that separate issue.
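A binary that records a versioned OpenBLAS name can be spotted from the output of `otool -L` on OS X. The sketch below is illustrative only: `versioned_openblas_refs` is a hypothetical helper, and the sample text is made up in the usual `otool -L` layout (the install path and version numbers are placeholders, not taken from the attached logs).

```python
import re

def versioned_openblas_refs(otool_output):
    """Return linked library paths whose file name is a versioned libopenblas dylib."""
    refs = []
    # The first line of ``otool -L`` output names the inspected file; skip it.
    for line in otool_output.splitlines()[1:]:
        path = line.strip().split(" (compatibility")[0]
        name = path.rsplit("/", 1)[-1]
        # Match e.g. libopenblas_haswellp-r0.3.5.dylib but not libopenblas.dylib.
        if re.match(r"libopenblas.*\d+\.\d+\.\d+.*\.dylib$", name):
            refs.append(path)
    return refs

# Made-up sample output; paths and version numbers are placeholders.
sample = (
    "matrix_rational_dense.so:\n"
    "\t/path/to/sage/local/lib/libopenblas_haswellp-r0.3.5.dylib"
    " (compatibility version 0.0.0, current version 0.0.0)\n"
    "\t/usr/lib/libSystem.B.dylib"
    " (compatibility version 1.0.0, current version 1.0.0)\n"
)
print(versioned_openblas_refs(sample))
```

Any path reported by such a check would stop resolving once the versioned file is replaced during an upgrade, which matches the "image not found" error in the crash report.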
New commits:
d00c34a | Disable OpenBLAS AVX512 support since it causes crashes |
Author: Volker Braun
Seems like the safest bet for now. Would be nice to have but I don't think most people using Sage are explicitly dependent on such bleeding-edge features, and they are probably building their own openblas if they do.
Reviewer: Erik Bray
Changed keywords from none to days101 openblas
Description changed:
---
+++
@@ -1,15 +1,4 @@
-Two issues with openblas 0.3.6.
-
-1. On at least some OS X machines, after upgrading from openblas 0.3.5, openblas 0.3.6 causes Sage to crash. From the end of Sage_crash_report.txt:
-
-```
-ImportError: dlopen(/Users/palmieri/Desktop/Sage_stuff/git/sage/local/lib/python2.7/site-packages/sage/matrix/matrix_rational_dense.so, 2): Library not loaded: /Users/palmieri/Desktop/Sage_stuff/git/sage/local/lib/libopenblas_haswellp-r0.3.5.dylib
- Referenced from: /Users/palmieri/Desktop/Sage_stuff/git/sage/local/lib/python2.7/site-packages/sage/matrix/matrix_rational_dense.so
- Reason: image not found
-```
- This failure only occurs when upgrading Sage; it works when building from scratch.
-
-2. On some OS X machines but not all (I've seen it only on an iMac Pro):
+On some OS X machines but not all (I've seen it only on an iMac Pro):
sage -t src/doc/en/thematic_tutorials/explicit_methods_in_number_theory/birds_other.rst # Bad exit: 1
- This occurs when building from scratch, with 8.8.beta7 merged with #27847.
+This occurs when building from scratch, with 8.8.beta7 merged with #27847.
+
+(What used to be issue !#1 on this ticket is now addressed at #28008.)
Changed branch from u/vbraun/openblas_0_3_6_vs__os_x to d00c34a
On some OS X machines but not all (I've seen it only on an iMac Pro):
This occurs when building from scratch, with 8.8.beta7 merged with #27847.
(What used to be issue !#1 on this ticket is now addressed at #28008.)
Component: packages: standard
Keywords: days101 openblas
Author: Volker Braun
Branch/Commit:
d00c34a
Reviewer: Erik Bray
Issue created by migration from https://trac.sagemath.org/ticket/27961