Closed jdemeyer closed 3 years ago
You need to update to a more recent ATLAS (3.11.something); upstream worked hard to make it build on POWER8 relatively recently.
From the horse's mouth: http://ehc.ac/p/math-atlas/mailman/message/34330714/
Thanks for the info, I will check it out.
So far, I managed to build all packages except ATLAS and packages depending on it.
Author: Jeroen Demeyer
Description changed:
---
+++
@@ -1 +1,3 @@
-ATLAS plainly refuses to build on a `ppc64le` system.
+Recent versions support the Power8 ppc64le architecture.
+
+**Tarball**: [http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2](http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2) (renamed from upstream's `atlas3.11.38.tar.bz2` without dash)
Description changed:
---
+++
@@ -1,3 +1,3 @@
Recent versions support the Power8 ppc64le architecture.
-**Tarball**: [http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2](http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2) (renamed from upstream's `atlas3.11.38.tar.bz2` without dash)
+**Tarball**: [http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2](http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2) (generated by `spkg-src`)
Branch: u/jdemeyer/upgrade_atlas
Can anyone get this branch working? On my Linux Core i7 system, I get
./xgmmsearch -p s -f 4
xgmmsearch: /usr/local/src/sage-config/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//tune/blas/gemm/gmmsearch.c:233: FullSrchMUNU: Assertion `mfB > 0.0' failed.
TIMING BCAST VS SPLAT MVEC WITH: B=(120,120,120) U=(2,4,1)
BCAST = -nan MFLOP
SPLAT = -nan MFLOP
VBCAST PROVIDES -nan SPEEDUP
Full search on MUxNU for nb=120, NREG=16, VLEN=4, KVEC=0
MU= 1, NU= 1, MFLOP=-nan
MU= 1, NU= 2, MFLOP=-nan
MU= 1, NU= 3, MFLOP=-nan
MU= 1, NU= 4, MFLOP=-nan
MU= 1, NU= 5, MFLOP=-nan
[...]
Makefile:445: recipe for target 'res/snreg' failed
make[6]: *** [res/snreg] Aborted
[...]
ERROR 539 DURING CACHESIZE SEARCH!!. CHECK INSTALL_LOG/Stage1.log FOR DETAILS.
[...]
New commits:
2b458de | Upgrade ATLAS to version 3.11.38 |
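The `-nan` timings in the log above are easy to miss in a long `INSTALL_LOG`. As a rough illustration only, the following sketch scans log text for MFLOP values that parse as NaN. The function name `nan_timing_lines` and the two line patterns are assumptions based solely on the excerpt quoted in this comment, not on any documented ATLAS log format.

```python
import math
import re

# "MU= 1, NU= 1, MFLOP=-nan" style (value after the label).
_LABEL_FIRST = re.compile(r"MFLOP\s*=\s*(\S+)")
# "BCAST = -nan MFLOP" style (value before the unit).
_VALUE_FIRST = re.compile(r"=\s*(\S+)\s+MFLOP\s*$")


def nan_timing_lines(log_text):
    """Return the stripped log lines whose MFLOP value parses as NaN."""
    bad = []
    for line in log_text.splitlines():
        match = _LABEL_FIRST.search(line) or _VALUE_FIRST.search(line)
        if match is None:
            continue
        try:
            value = float(match.group(1))  # float() accepts "nan" and "-nan"
        except ValueError:
            continue  # not a number at all; ignore the line
        if math.isnan(value):
            bad.append(line.strip())
    return bad


if __name__ == "__main__":
    # Sample lines copied from the log excerpt in this comment.
    sample = """\
BCAST = -nan MFLOP
SPLAT = -nan MFLOP
MU= 1, NU= 1, MFLOP=-nan
"""
    for line in nan_timing_lines(sample):
        print(line)
```

A non-empty result would suggest that the timer itself returned garbage, which matches the failed `mfB > 0.0` assertion.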
I haven't tried the branch yet, but I have built it from an ebuild on my machine. That said, my first attempt failed in the same way at exactly the same point.
It got past that point once I set the ebuild to use threads, which would have added `-t -1 -Si omp 0` to the build options. I also had `--use-ifko` enabled. It took about 3h30 to build on my 12-core machine.
Replying to @kiwifb:
Haven't tried the branch yet but I have built it from ebuild on my machine.
Do you know exactly which version that was?
Branch pushed to git repo; I updated commit sha1. New commits:
9f29a30 | Really override ATLAS throttling check |
Branch pushed to git repo; I updated commit sha1. New commits:
c0da5dd | Fix string overflow in case >= 100 threads |
Upstream: Reported upstream. No feedback yet.
Description changed:
---
+++
@@ -1,3 +1,5 @@
Recent versions support the Power8 ppc64le architecture.
**Tarball**: [http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2](http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2) (generated by `spkg-src`)
+
+**Upstream bug** in case there are >= 100 CPUs: https://sourceforge.net/p/math-atlas/support-requests/1011/
Replying to @kiwifb:
How many cores does your machine have?
Depends how you count... there are 24 cores, but it has multi-threading, so ATLAS detects 192 processors.
ATLAS is finally building now...
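To illustrate why a thread count of 100 or more can trigger a string overflow, here is a minimal sketch. The format string `"-t %d"` and the buffer size of 6 bytes are purely hypothetical; the actual ATLAS code is not shown in this ticket. The point is only that a C buffer sized for a two-digit count is one byte short as soon as the count needs three digits.

```python
def c_buffer_overflows(template, n_threads, buffer_size):
    """True if formatting n_threads into template needs more than buffer_size bytes.

    A C string needs one extra byte for the terminating NUL.
    """
    needed = len(template % n_threads) + 1
    return needed > buffer_size


if __name__ == "__main__":
    TEMPLATE = "-t %d"   # hypothetical format string
    BUFFER_SIZE = 6      # hypothetical: room for "-t 99" plus the NUL byte
    for n in (12, 99, 100, 192):
        print(n, c_buffer_overflows(TEMPLATE, n, BUFFER_SIZE))
```

With these assumed values, 12 and 99 threads fit, while 100 and the 192 processors detected here do not.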
Branch pushed to git repo; I updated commit sha1. New commits:
852a874 | Skip throttling check with patch |
Description changed:
---
+++
@@ -2,4 +2,6 @@
**Tarball**: [http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2](http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2) (generated by `spkg-src`)
+Due to [https://github.com/scipy/scipy/issues/5266](https://github.com/scipy/scipy/issues/5266), we need to use LAPACK 3.5.0 and not LAPACK 3.6.0.
+
**Upstream bug** in case there are >= 100 CPUs: https://sourceforge.net/p/math-atlas/support-requests/1011/
Branch pushed to git repo; I updated commit sha1. New commits:
5c9ac1e | Downgrade to LAPACK 3.5.0 for SciPy |
I see that you have met `scipy`'s use of deprecated `lapack` functions (see https://archives.gentoo.org/gentoo-science/message/e5cb5f1117bc956cd829a667918026f4 and the messages after it). The good news is that there is already a commit removing them from the unreleased 0.17 (https://github.com/scipy/scipy/pull/5518), and it can otherwise be solved by telling `lapack` to build with deprecated functions. According to the Gentoo ebuild, after running `configure` you can do
echo "BUILD_DEPRECATED=1" >> src/lapack/reference/make.inc.example
in your build directory and it will add back the deprecated functions in `lapack-3.6.0`.
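For a build script written in Python, the same append could be sketched as follows. The relative path and the `BUILD_DEPRECATED=1` line come from the command above; the function name `enable_deprecated_lapack` and the check that avoids appending twice are assumptions added for illustration, not part of the Gentoo ebuild.

```python
from pathlib import Path

# Path and setting taken from the echo command quoted above.
MAKE_INC_REL = "src/lapack/reference/make.inc.example"
SETTING = "BUILD_DEPRECATED=1"


def enable_deprecated_lapack(build_dir):
    """Append BUILD_DEPRECATED=1 to make.inc.example unless it is already there.

    Returns True if the file was modified, False otherwise.
    """
    make_inc = Path(build_dir) / MAKE_INC_REL
    text = make_inc.read_text() if make_inc.exists() else ""
    if SETTING in (ln.strip() for ln in text.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with make_inc.open("a") as fh:
        fh.write(prefix + SETTING + "\n")
    return True
```

Unlike the plain `echo ... >>`, this version can be run repeatedly without duplicating the line.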
New commits:
5c9ac1e | Downgrade to LAPACK 3.5.0 for SciPy |
Replying to @kiwifb:
I see that you have met `scipy`'s use of deprecated `lapack` functions
I guess you mean removed functions, despite what LAPACK calls them. If they are not built by default, they are de facto removed, not deprecated.
I am going for the easy way out, which is using LAPACK 3.5.0 and SciPy 0.16.1 clean from upstream.
In any case, I needed to make a new ATLAS tarball, because there were some mistakes with the `ARCHS` directory.
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
bcb3dc4 | Upgrade ATLAS to version 3.11.38 |
ATLAS works fine on my `x86_64` laptop, and the previous version built fine on POWER8. Unfortunately, the latest version of this branch failed on POWER8. I'm trying again to see if this is a reproducible problem.
It succeeded the second time (without changes). I don't know how common this failure is or why it happened, but I do have a working ATLAS now.
That's very curious. It is probably a somewhat random bug in `ATLAS`'s build system. I am a bit worried about the randomness. I may have to test on more hardware.
I reported the random build failure at http://sourceforge.net/p/math-atlas/support-requests/1013/
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
615875b | Upgrade ATLAS to version 3.11.38 |
Rebased to 7.0.beta0. Volker, can you test this on the buildbot?
Reviewer: Volker Braun
Sage 7.0.beta0 is causing new problems: #19767
On Arando it dies with
make -f Make.top time
make[4]: Entering directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build'
./xatlbench -dc /home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/bin/INSTALL_LOG -dp
Error around argument 4 (Out of args)!
USAGE: ./xatlbench [flags]
-dp <prior benchmark directory>
-dc <current benchmark directory>
-f <filename w/o prefix>
-o <outfile> : default=stdout
make[4]: *** [time] Error 4
make[4]: Leaving directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build'
make[3]: *** [time] Error 2
make[3]: Leaving directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build'
The ATLAS timing data failed to be collected.
A different failure on http://build.sagedev.org/release/builders/%20%20slow%20AIMS%20bu14_32s02%20%28Ubuntu%2014.04%2032%20bit%29%20incremental/builds/279/steps/compile/logs/atlas possibly because of
/mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//tune/blas/gemm/gmmsearch.c:1:0: warning: SSE instruction set disabled, using 387 arithmetics [enabled by default]
/*
^
./xgmmsearch -p s -f 4
ERROR IN COMMAND: make xsammtime_pt mb=120 nb=120 kb=120 mmrout=ATL_samm120_4m_4x4x1.c mu=4 nu=4 ku=1 mvA=1 mvB=1 mvC=0 kmoves=" -DATL_MOVEA -DATL_MOVEB" beta=1 outF="-f res/tmpout.ktim" > /dev/null 2>&1
PROPOSED FILENAME: res/tmpout.ktim
GENSTR='make gen_amm pre=s rt=ATL_samm120_4m_4x4x1.c vec=mdim vlen=4 mu=4 nu=4 ku=1 bcast=1'
TIMING BCAST VS SPLAT MVEC WITH: B=(120,120,120) U=(1,4,1)
make[9]: *** [res/snreg] Error 255
make[9]: Leaving directory `/mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/blas/gemm'
make[8]: *** [res/snreg] Error 2
make[8]: Leaving directory `/mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/sysinfo'
xsyssum: /mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//tune/sysinfo/GetSysSum.c:129: getmmnreg: Assertion `system(fnam) == 0' failed.
On the Ubuntu 15.10 64-bit machine ATLAS builds but then runs into an assertion in the Sage doctests:
sage: import sage.matrix.benchmark as b ## line 724 ##
sage: ts = b.matrix_multiply_GF(100, p=19) ## line 725 ##
assertion K > 2 && K <= 96 failed, line 23 of file /mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//src/blas/ammm/ATL_GetRankKInfo.c
Replying to @vbraun:
On Arando it dies with
make -f Make.top time
make[4]: Entering directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build'
./xatlbench -dc /home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/bin/INSTALL_LOG -dp
Error around argument 4 (Out of args)!
USAGE: ./xatlbench [flags]
-dp <prior benchmark directory>
-dc <current benchmark directory>
-f <filename w/o prefix>
-o <outfile> : default=stdout
make[4]: *** [time] Error 4
make[4]: Leaving directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build'
make[3]: *** [time] Error 2
make[3]: Leaving directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build'
The ATLAS timing data failed to be collected.
That's not actually considered to be an error (because I made it so).
Replying to @vbraun:
A different failure on http://build.sagedev.org/release/builders/%20%20slow%20AIMS%20bu14_32s02%20%28Ubuntu%2014.04%2032%20bit%29%20incremental/builds/279/steps/compile/logs/atlas possibly because of
/mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//tune/blas/gemm/gmmsearch.c:1:0: warning: SSE instruction set disabled, using 387 arithmetics [enabled by default]
/*
 ^
./xgmmsearch -p s -f 4
ERROR IN COMMAND: make xsammtime_pt mb=120 nb=120 kb=120 mmrout=ATL_samm120_4m_4x4x1.c mu=4 nu=4 ku=1 mvA=1 mvB=1 mvC=0 kmoves=" -DATL_MOVEA -DATL_MOVEB" beta=1 outF="-f res/tmpout.ktim" > /dev/null 2>&1
PROPOSED FILENAME: res/tmpout.ktim
GENSTR='make gen_amm pre=s rt=ATL_samm120_4m_4x4x1.c vec=mdim vlen=4 mu=4 nu=4 ku=1 bcast=1'
TIMING BCAST VS SPLAT MVEC WITH: B=(120,120,120) U=(1,4,1)
make[9]: *** [res/snreg] Error 255
make[9]: Leaving directory `/mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/blas/gemm'
make[8]: *** [res/snreg] Error 2
make[8]: Leaving directory `/mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/sysinfo'
xsyssum: /mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//tune/sysinfo/GetSysSum.c:129: getmmnreg: Assertion `system(fnam) == 0' failed.
For this one, could you try a different GCC version?
Replying to @vbraun:
On the Ubuntu 15.10 64-bit machine ATLAS builds but then runs into an assertion in the Sage doctests:
sage: import sage.matrix.benchmark as b ## line 724 ##
sage: ts = b.matrix_multiply_GF(100, p=19) ## line 725 ##
assertion K > 2 && K <= 96 failed, line 23 of file /mnt/highperf/buildbot/slave/sage_git/build/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//src/blas/ammm/ATL_GetRankKInfo.c
Hmm, for this one a traceback would be nice.
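One possible way to obtain at least a Python-level traceback when a C assertion aborts the process is the standard `faulthandler` module. The sketch below assumes a Python 3 interpreter, which is not what Sage used at the time of this ticket, and the helper name `run_with_faulthandler` is made up for illustration. It uses `os.abort()` as a stand-in for the failing ATLAS assertion.

```python
import subprocess
import sys


def run_with_faulthandler(code):
    """Run code in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). On SIGABRT, faulthandler writes the
    Python traceback of the aborting thread to stderr.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # os.abort() raises SIGABRT, which is also what a failed C assert() does.
    snippet = "import os\ndef crash():\n    os.abort()\ncrash()\n"
    returncode, stderr = run_with_faulthandler(snippet)
    print(returncode)
    print(stderr)
```

This only shows which Python call was active when the abort happened; the C stack inside ATLAS would still need a debugger such as gdb.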
You do have an account on the AIMS buildbot slaves, right?
I forgot, but it seems I do.
On ppc64le, I built this successfully (that's 2 out of 3) on 7.0.beta0 after adding an `LD_LIBRARY_PATH` workaround for #19767.
On my ppc64le machine, the score is now 2 successful builds out of 4 tries. The last time it failed with:
if [ -s "/home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/src/blas/gemv/Make_dmvn" ]; then \
cd /home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/src/blas/gemv ; make -j1 -f Make_dmvn killall ; \
rm -f /home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/src/blas/gemv/Make_dmvn ; \
fi
./xmvnhgen -p d -F res/dMVNK.sum -d dmvnoutd
xmvnhgen: /home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/../ATLAS//include/atlas_genparse.h:239: GetDoubleArr: Assertion `sscanf(str, "%le", d+i) == 1' failed.
Makefile:682: recipe for target 'dmvninstall' failed
make[7]: *** [dmvninstall] Aborted
make[7]: Leaving directory '/home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/blas/gemv'
Makefile:422: recipe for target 'res/dMVNK.sum' failed
make[6]: *** [res/dMVNK.sum] Error 2
make[6]: Leaving directory '/home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/blas/gemv'
Makefile:334: recipe for target '/home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/blas/gemv/res/dMVNK.sum' failed
make[5]: *** [/home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/tune/blas/gemv/res/dMVNK.sum] Error 2
make[5]: Leaving directory '/home/jdemeyer/sage/local/var/tmp/sage/build/atlas-3.11.38/src/ATLAS-build/bin'
ERROR 626 DURING MVNTUNE!!. CHECK INSTALL_LOG/dMVNTUNE.LOG FOR DETAILS.
I'm afraid that this new ATLAS is full of random failures...
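To put a number on how often the build fails, the "2 successful builds out of 4 tries" bookkeeping above could be automated roughly as follows. This is a sketch under assumptions: the helper names are invented, and the `./sage -f atlas` command in the usage comment is only a guess at how one would trigger a rebuild on a given machine.

```python
import subprocess


def run_build(cmd):
    """Run one build command; True if it exits with status 0."""
    return subprocess.call(cmd) == 0


def tally_builds(build_once, attempts):
    """Call build_once() `attempts` times and return (successes, attempts)."""
    successes = 0
    for _ in range(attempts):
        if build_once():
            successes += 1
    return successes, attempts


# Hypothetical usage (the command is an assumption, adjust as needed):
#   ok, total = tally_builds(lambda: run_build(["./sage", "-f", "atlas"]), 4)
#   print("%d successful builds out of %d tries" % (ok, total))
```

Passing the build step as a callable keeps the counting logic independent of how the build is actually started.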
My experience with atlas unstable releases was similar...
Recent versions support the Power8 ppc64le architecture.
Tarball: http://sage.ugent.be/www/jdemeyer/sage/atlas-3.11.38.tar.bz2 (generated by `spkg-src`)
Due to https://github.com/scipy/scipy/issues/5266, we need to use LAPACK 3.5.0 and not LAPACK 3.6.0.
Upstream bugs:
Random failures:
Upstream: Reported upstream. Developers acknowledge bug.
CC: @vbraun @jpflori @nexttime @dimpase @jhpalmieri
Component: packages: standard
Keywords: BLAS, LAPACK, --with-blas
Branch/Commit: u/jdemeyer/upgrade_atlas @ c19635f
Reviewer: Dima Pasechnik
Issue created by migration from https://trac.sagemath.org/ticket/19719