sagemath / sage

Main repository of SageMath
https://www.sagemath.org

Upgrade ATLAS to version 3.11.38 #19719

Closed jdemeyer closed 3 years ago

jdemeyer commented 8 years ago

Recent versions support the Power8 ppc64le architecture.

Tarball: http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2 (generated by spkg-src)

Due to https://github.com/scipy/scipy/issues/5266, we need to use LAPACK 3.5.0 and not LAPACK 3.6.0.

Upstream bugs:

Random failures:

Upstream: Reported upstream. Developers acknowledge bug.

CC: @vbraun @jpflori @nexttime @dimpase @jhpalmieri

Component: packages: standard

Keywords: BLAS, LAPACK, --with-blas

Branch/Commit: u/jdemeyer/upgrade_atlas @ c19635f

Reviewer: Dima Pasechnik

Issue created by migration from https://trac.sagemath.org/ticket/19719

kiwifb commented 8 years ago
comment:1

You need to update to a more recent ATLAS (3.11.something); upstream worked hard to make it build on POWER8 relatively recently.

kiwifb commented 8 years ago
comment:2

From the horse's mouth: http://ehc.ac/p/math-atlas/mailman/message/34330714/

jdemeyer commented 8 years ago
comment:3

Thanks for the info, I will check it out.

So far, I managed to build all packages except ATLAS and packages depending on it.

jdemeyer commented 8 years ago

Author: Jeroen Demeyer

jdemeyer commented 8 years ago

Description changed:

--- 
+++ 
@@ -1 +1,3 @@
-ATLAS plainly refuses to build on a `ppc64le` system.
+Recent versions support the Power8 ppc64le architecture.
+
+**Tarball**: [http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2](http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2) (renamed from upstream's `atlas3.11.38.tar.bz2` without dash)
jdemeyer commented 8 years ago

Description changed:

--- 
+++ 
@@ -1,3 +1,3 @@
 Recent versions support the Power8 ppc64le architecture.

-**Tarball**: [http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2](http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2) (renamed from upstream's `atlas3.11.38.tar.bz2` without dash)
+**Tarball**: [http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2](http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2) (generated by `spkg-src`)
jdemeyer commented 8 years ago

Branch: u/jdemeyer/upgrade_atlas

jdemeyer commented 8 years ago
comment:7

Can anyone get this branch working? On my Linux Core i7 system, I get

./xgmmsearch -p s -f 4
xgmmsearch: /usr/local/src/sage-config/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//tune/blas/gemm/gmmsearch.c:233: FullSrchMUNU: Assertion `mfB > 0.0' failed.
TIMING BCAST VS SPLAT MVEC WITH: B=(120,120,120) U=(2,4,1)
   BCAST = -nan MFLOP
   SPLAT = -nan MFLOP
VBCAST PROVIDES -nan SPEEDUP
Full search on MUxNU for nb=120, NREG=16, VLEN=4, KVEC=0
   MU= 1, NU= 1, MFLOP=-nan
   MU= 1, NU= 2, MFLOP=-nan
   MU= 1, NU= 3, MFLOP=-nan
   MU= 1, NU= 4, MFLOP=-nan
   MU= 1, NU= 5, MFLOP=-nan
[...]
Makefile:445: recipe for target 'res/snreg' failed
make[6]: *** [res/snreg] Aborted
[...]
ERROR 539 DURING CACHESIZE SEARCH!!.  CHECK INSTALL_LOG/Stage1.log FOR DETAILS.
[...]
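
A log like the one above can be scanned mechanically for tuning steps that reported NaN timings. The following is only a minimal sketch, assuming the `MU= ..., NU= ..., MFLOP=...` line layout shown in this log; the function name `find_nan_timings` is hypothetical and not part of ATLAS or Sage.

```python
import math
import re

# Matches search lines such as "   MU= 1, NU= 2, MFLOP=-nan"
# (layout assumed from the log excerpt in this comment).
MFLOP_LINE = re.compile(r"MU=\s*(\d+),\s*NU=\s*(\d+),\s*MFLOP=(\S+)")


def find_nan_timings(log_text):
    """Return the (MU, NU) pairs whose reported MFLOP value is NaN."""
    bad = []
    for match in MFLOP_LINE.finditer(log_text):
        mu, nu, mflop = match.groups()
        try:
            value = float(mflop)  # float() accepts "nan" and "-nan"
        except ValueError:
            continue  # not a number at all; skip rather than guess
        if math.isnan(value):
            bad.append((int(mu), int(nu)))
    return bad
```

Calling `find_nan_timings(open(path).read())` on a saved build log would list every MU/NU combination whose timing came back as NaN, which is a quicker check than reading the search output by eye.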

New commits:

2b458de Upgrade ATLAS to version 3.11.38
jdemeyer commented 8 years ago

Commit: 2b458de

kiwifb commented 8 years ago
comment:8

Haven't tried the branch yet but I have built it from ebuild on my machine. That being said, my first attempt failed in the same way at the exact same point.

It got past that point once I set the ebuild to use threads, which would have added -t -1 -Si omp 0 to the build options. I also had --use-ifko enabled; it took ~3h30min to build on my 12-core machine.

jdemeyer commented 8 years ago
comment:9

Replying to @kiwifb:

Haven't tried the branch yet but I have built it from ebuild on my machine.

Do you know exactly which version that was?

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

9f29a30 Really override ATLAS throttling check
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 2b458de to 9f29a30

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 9f29a30 to c0da5dd

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

c0da5dd Fix string overflow in case >= 100 threads
jdemeyer commented 8 years ago

Upstream: Reported upstream. No feedback yet.

jdemeyer commented 8 years ago

Description changed:

--- 
+++ 
@@ -1,3 +1,5 @@
 Recent versions support the Power8 ppc64le architecture.

 **Tarball**: [http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2](http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2) (generated by `spkg-src`)
+
+**Upstream bug** in case there are >= 100 CPUs: https://sourceforge.net/p/math-atlas/support-requests/1011/
kiwifb commented 8 years ago
comment:13

Replying to @jdemeyer:

Replying to @kiwifb:

Haven't tried the branch yet but I have built it from ebuild on my machine.

Do you know exactly which version that was?

3.11.38 the same one.

How many cores does your machine have?

jdemeyer commented 8 years ago
comment:14

Replying to @kiwifb:

How many cores does your machine have?

Depends how you count... there are 24 cores, but it has multi-threading, so ATLAS detects 192 processors.
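
To make the counting explicit: the number ATLAS detects is the count of logical processors, i.e. cores times hardware threads per core. The figure of 8 threads per core below is inferred from 192 / 24 and is not stated in the thread. The sketch also shows why a count of 100 or more matters for the string overflow mentioned earlier: it needs three decimal digits instead of two. The helper names are hypothetical.

```python
import os


def logical_cpus(cores, threads_per_core):
    """Logical processors as seen by the OS: cores times threads per core."""
    return cores * threads_per_core


def decimal_width(n):
    """Number of decimal digits needed to print a non-negative integer."""
    return len(str(n))


# Figures from this comment: 24 cores and 192 detected processors,
# which implies 8 hardware threads per core (an inference, see above).
detected = logical_cpus(24, 8)
print(detected, "processors,", decimal_width(detected), "digits")

# os.cpu_count() reports logical CPUs on the current machine (or None).
print("this machine:", os.cpu_count())
```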

jdemeyer commented 8 years ago
comment:15

ATLAS is finally building now...

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

852a874 Skip throttling check with patch
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from c0da5dd to 852a874

jdemeyer commented 8 years ago

Description changed:

--- 
+++ 
@@ -2,4 +2,6 @@

 **Tarball**: [http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2](http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2) (generated by `spkg-src`)

+Due to [https://github.com/scipy/scipy/issues/5266](https://github.com/scipy/scipy/issues/5266), we need to use LAPACK 3.5.0 and not LAPACK 3.6.0.
+
 **Upstream bug** in case there are >= 100 CPUs: https://sourceforge.net/p/math-atlas/support-requests/1011/
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

5c9ac1e Downgrade to LAPACK 3.5.0 for SciPy
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 852a874 to 5c9ac1e

kiwifb commented 8 years ago
comment:19

I see that you have met SciPy's use of deprecated LAPACK functions: see https://archives.gentoo.org/gentoo-science/message/e5cb5f1117bc956cd829a667918026f4 and the messages after it. The good news is that there is already a commit removing them from the unreleased 0.17, https://github.com/scipy/scipy/pull/5518, and it can otherwise be solved by telling LAPACK to build with deprecated functions. According to the Gentoo ebuild, after running configure you can do

echo "BUILD_DEPRECATED=1" >> src/lapack/reference/make.inc.example

in your build directory and it will add back the deprecated functions in lapack-3.6.0.
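
If that step were scripted, it may be worth making it idempotent so that repeated runs do not append the setting several times. The following is a minimal sketch under that assumption; `enable_deprecated` is a hypothetical helper, not something from the Gentoo ebuild or from Sage, and only the file name and the `BUILD_DEPRECATED=1` line are taken from the command above.

```python
from pathlib import Path


def enable_deprecated(make_inc, setting="BUILD_DEPRECATED=1"):
    """Append `setting` to the file unless an identical line already exists.

    Returns True if the file was modified, False otherwise.
    """
    path = Path(make_inc)
    text = path.read_text() if path.exists() else ""
    if setting in (line.strip() for line in text.splitlines()):
        return False  # already present, nothing to do
    with path.open("a") as fh:
        # Avoid gluing the setting onto an unterminated last line.
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(setting + "\n")
    return True
```

It would be called as `enable_deprecated("src/lapack/reference/make.inc.example")` from the build directory, after configure has run.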


New commits:

5c9ac1e Downgrade to LAPACK 3.5.0 for SciPy
jdemeyer commented 8 years ago
comment:20

Replying to @kiwifb:

I see that you have met SciPy's use of deprecated LAPACK functions

I guess you mean removed functions, despite what LAPACK calls them. If they are not built by default, they are de facto removed, not deprecated.

I am going for the easy way out, which is using LAPACK 3.5.0 and SciPy 0.16.1 clean from upstream.

jdemeyer commented 8 years ago
comment:21

In any case, I needed to make a new ATLAS tarball, because there were some mistakes with the ARCHS directory.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

bcb3dc4 Upgrade ATLAS to version 3.11.38
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 5c9ac1e to bcb3dc4

jdemeyer commented 8 years ago
comment:24

ATLAS works fine on my x86_64 laptop and the previous version built fine on POWER8. Unfortunately, the latest version of this branch failed on POWER8. I'm trying again to see if this is a reproducible problem.

jdemeyer commented 8 years ago
comment:25

It succeeded the second time (without changes). I don't know how common this failure is or why it happened, but I do have a working ATLAS now.

kiwifb commented 8 years ago
comment:26

That's very curious. It is probably a bug in ATLAS's build system that's a bit random. I am somewhat worried about the randomness. I may have to test on more hardware.

jdemeyer commented 8 years ago
comment:27

I reported the random build failure at http://sourceforge.net/p/math-atlas/support-requests/1013/

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

615875b Upgrade ATLAS to version 3.11.38
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from bcb3dc4 to 615875b

jdemeyer commented 8 years ago
comment:29

Rebased to 7.0.beta0. Volker, can you test this on the buildbot?

vbraun commented 8 years ago

Reviewer: Volker Braun

jdemeyer commented 8 years ago
comment:31

Sage 7.0.beta0 is causing new problems: #19767

vbraun commented 8 years ago
comment:32

On Arando it dies with

make -f Make.top time
make[4]: Entering directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build'
./xatlbench -dc /home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/bin/INSTALL_LOG -dp 
Error around argument 4 (Out of args)!
USAGE: ./xatlbench [flags]
   -dp <prior benchmark directory>
   -dc <current benchmark directory>
   -f <filename w/o prefix>
   -o <outfile> : default=stdout
make[4]: *** [time] Error 4
make[4]: Leaving directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build'
make[3]: *** [time] Error 2
make[3]: Leaving directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build'
The ATLAS timing data failed to be collected.

see http://build.sagedev.org/release/builders/%20%20fast%20Oxford%20arando%20%28Ubuntu%2013.04%20i686%29%20incremental/builds/731/steps/compile/logs/atlas

vbraun commented 8 years ago
comment:33

A different failure on http://build.sagedev.org/release/builders/%20%20slow%20AIMS%20bu14_32s02%20%28Ubuntu%2014.04%2032%20bit%29%20incremental/builds/279/steps/compile/logs/atlas possibly because of

/mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//tune/blas/gemm/gmmsearch.c:1:0: warning: SSE instruction set disabled, using 387 arithmetics [enabled by default]
 /*
 ^
./xgmmsearch -p s -f 4
ERROR IN COMMAND: make xsammtime_pt mb=120 nb=120 kb=120 mmrout=ATL_samm120_4m_4x4x1.c mu=4 nu=4 ku=1 mvA=1 mvB=1 mvC=0 kmoves=" -DATL_MOVEA -DATL_MOVEB" beta=1 outF="-f res/tmpout.ktim" > /dev/null 2>&1
   PROPOSED FILENAME: res/tmpout.ktim
   GENSTR='make gen_amm pre=s rt=ATL_samm120_4m_4x4x1.c vec=mdim vlen=4 mu=4 nu=4 ku=1 bcast=1'
TIMING BCAST VS SPLAT MVEC WITH: B=(120,120,120) U=(1,4,1)
make[9]: *** [res/snreg] Error 255
make[9]: Leaving directory `/mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/blas/gemm'
make[8]: *** [res/snreg] Error 2
make[8]: Leaving directory `/mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/sysinfo'
xsyssum: /mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//tune/sysinfo/GetSysSum.c:129: getmmnreg: Assertion `system(fnam) == 0' failed.
vbraun commented 8 years ago
comment:34

On the Ubuntu 15.10 64-bit machine ATLAS builds but then runs into an assertion in the Sage doctests:

sage: import sage.matrix.benchmark as b ## line 724 ##
sage: ts = b.matrix_multiply_GF(100, p=19) ## line 725 ##
assertion K > 2 && K <= 96 failed, line 23 of file /mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//src/blas/ammm/ATL_GetRankKInfo.c

See http://build.sagedev.org/release/builders/%20%20slow%20AIMS%20bu1510_64s02%20%28Ubuntu%2015.10%2064%20bit%29%20incremental/builds/12/steps/shell_4/logs/stdio

jdemeyer commented 8 years ago
comment:35

Replying to @vbraun:

On Arando it dies with

make -f Make.top time
make[4]: Entering directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build'
./xatlbench -dc /home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/bin/INSTALL_LOG -dp 
Error around argument 4 (Out of args)!
USAGE: ./xatlbench [flags]
   -dp <prior benchmark directory>
   -dc <current benchmark directory>
   -f <filename w/o prefix>
   -o <outfile> : default=stdout
make[4]: *** [time] Error 4
make[4]: Leaving directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build'
make[3]: *** [time] Error 2
make[3]: Leaving directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build'
The ATLAS timing data failed to be collected.

see http://build.sagedev.org/release/builders/%20%20fast%20Oxford%20arando%20%28Ubuntu%2013.04%20i686%29%20incremental/builds/731/steps/compile/logs/atlas

That's not actually considered to be an error (because I made it so).

jdemeyer commented 8 years ago
comment:36

Replying to @vbraun:

A different failure on http://build.sagedev.org/release/builders/%20%20slow%20AIMS%20bu14_32s02%20%28Ubuntu%2014.04%2032%20bit%29%20incremental/builds/279/steps/compile/logs/atlas possibly because of

/mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//tune/blas/gemm/gmmsearch.c:1:0: warning: SSE instruction set disabled, using 387 arithmetics [enabled by default]
 /*
 ^
./xgmmsearch -p s -f 4
ERROR IN COMMAND: make xsammtime_pt mb=120 nb=120 kb=120 mmrout=ATL_samm120_4m_4x4x1.c mu=4 nu=4 ku=1 mvA=1 mvB=1 mvC=0 kmoves=" -DATL_MOVEA -DATL_MOVEB" beta=1 outF="-f res/tmpout.ktim" > /dev/null 2>&1
   PROPOSED FILENAME: res/tmpout.ktim
   GENSTR='make gen_amm pre=s rt=ATL_samm120_4m_4x4x1.c vec=mdim vlen=4 mu=4 nu=4 ku=1 bcast=1'
TIMING BCAST VS SPLAT MVEC WITH: B=(120,120,120) U=(1,4,1)
make[9]: *** [res/snreg] Error 255
make[9]: Leaving directory `/mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/blas/gemm'
make[8]: *** [res/snreg] Error 2
make[8]: Leaving directory `/mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/sysinfo'
xsyssum: /mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//tune/sysinfo/GetSysSum.c:129: getmmnreg: Assertion `system(fnam) == 0' failed.

For this one, could you try a different GCC version?

jdemeyer commented 8 years ago
comment:37

Replying to @vbraun:

On the Ubuntu 15.10 64-bit machine ATLAS builds but then runs into an assertion in the Sage doctests:

sage: import sage.matrix.benchmark as b ## line 724 ##
sage: ts = b.matrix_multiply_GF(100, p=19) ## line 725 ##
assertion K > 2 && K <= 96 failed, line 23 of file /mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//src/blas/ammm/ATL_GetRankKInfo.c

See http://build.sagedev.org/release/builders/%20%20slow%20AIMS%20bu1510_64s02%20%28Ubuntu%2015.10%2064%20bit%29%20incremental/builds/12/steps/shell_4/logs/stdio

Hmm, for this one a traceback would be nice.

vbraun commented 8 years ago
comment:38

You do have an account on the AIMS buildbot slaves, right?

jdemeyer commented 8 years ago
comment:39

I forgot, but it seems I do.

jdemeyer commented 8 years ago
comment:40

On ppc64le, I built this successfully (that's 2 out of 3) on 7.0.beta0 after adding an LD_LIBRARY_PATH workaround for #19767.

jdemeyer commented 8 years ago
comment:43

On my ppc64le machine, the score is now 2 successful builds out of 4 tries. The last time it failed with:

if [ -s "/home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/src/blas/gemv/Make_dmvn" ]; then \
           cd  /home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/src/blas/gemv ; make -j1 -f Make_dmvn killall ; \
           rm -f /home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/src/blas/gemv/Make_dmvn ; \
        fi
./xmvnhgen -p d -F res/dMVNK.sum -d dmvnoutd
xmvnhgen: /home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//include/atlas_genparse.h:239: GetDoubleArr: Assertion `sscanf(str, "%le", d+i) == 1' failed.
Makefile:682: recipe for target 'dmvninstall' failed
make[7]: *** [dmvninstall] Aborted
make[7]: Leaving directory '/home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/blas/gemv'
Makefile:422: recipe for target 'res/dMVNK.sum' failed
make[6]: *** [res/dMVNK.sum] Error 2
make[6]: Leaving directory '/home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/blas/gemv'
Makefile:334: recipe for target '/home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/blas/gemv/res/dMVNK.sum' failed
make[5]: *** [/home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/blas/gemv/res/dMVNK.sum] Error 2
make[5]: Leaving directory '/home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/bin'
ERROR 626 DURING MVNTUNE!!.  CHECK INSTALL_LOG/dMVNTUNE.LOG FOR DETAILS.

I'm afraid that this new ATLAS is full of random failures...
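
A score such as "2 successful builds out of 4 tries" can be collected automatically by running the same build command several times and counting clean exits. This is only a sketch: `tally_builds` is a hypothetical helper, and the actual build command for a given Sage tree is left to the caller.

```python
import subprocess


def tally_builds(cmd, attempts):
    """Run `cmd` the given number of times; return how many runs exited with 0."""
    successes = 0
    for _ in range(attempts):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            successes += 1
    return successes
```

The command is passed as an argument list, so no shell is involved. For a failure rate this high, a handful of attempts is already informative, but each attempt has to start from a clean build directory, otherwise later runs may reuse tuning results from earlier ones.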

vbraun commented 8 years ago
comment:44

My experience with atlas unstable releases was similar...