rioyokotalab / caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework.
https://caffe2.ai

[Reedbush] ATLAS setup error #7

Closed Hiroki11x closed 7 years ago

Hiroki11x commented 7 years ago
wget http://www.netlib.org/lapack/lapack-3.6.1.tgz
wget https://sourceforge.net/projects/math-atlas/files/Stable/3.10.3/atlas3.10.3.tar.bz2
tar xjvf atlas3.10.3.tar.bz2
cd ATLAS
mkdir build
cd build
../configure -b 64 --prefix=$LOCAL_DIR/ATLAS --shared --with-netlib-lapack-tarfile=../../lapack-3.6.1.tgz
make -j $J
make install
make -j 32

The following error occurred:

make[10]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/src/testing'
make[9]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/src/testing'
make[8]: *** [tstlib.grd] Error 2
make[8]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/tune/blas/level1'
TST: make drottest urout=rot1_x1y1.c opt="" 
make[8]: Entering directory `/home/gi75/i75012/env/src/ATLAS/build/tune/blas/level1'
cd /home/gi75/i75012/env/src/ATLAS/build/src/testing ; make lib
make[8]: *** read jobs pipe EOF.  Stop.
make[8]: *** Waiting for unfinished jobs....
make[9]: Entering directory `/home/gi75/i75012/env/src/ATLAS/build/src/testing'
make -j 28 dlib.grd
make[9]: *** read jobs pipe EOF.  Stop.
make[9]: *** Waiting for unfinished jobs....
make[10]: Entering directory `/home/gi75/i75012/env/src/ATLAS/build/src/testing'
make[10]: warning: -jN forced in submake: disabling jobserver mode.
make[10]: `dlib.grd' is up to date.
make[10]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/src/testing'
make[9]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/src/testing'
make[8]: *** [tstlib.grd] Error 2
make[8]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/tune/blas/level1'
TST: make drottest urout=rot4_x1y1.c opt="" 
make[8]: Entering directory `/home/gi75/i75012/env/src/ATLAS/build/tune/blas/level1'
cd /home/gi75/i75012/env/src/ATLAS/build/src/testing ; make lib
make[8]: *** read jobs pipe EOF.  Stop.
make[8]: *** Waiting for unfinished jobs....
make[9]: Entering directory `/home/gi75/i75012/env/src/ATLAS/build/src/testing'
make -j 28 dlib.grd
make[9]: *** read jobs pipe EOF.  Stop.
make[9]: *** Waiting for unfinished jobs....
make[10]: Entering directory `/home/gi75/i75012/env/src/ATLAS/build/src/testing'
make[10]: warning: -jN forced in submake: disabling jobserver mode.
make[10]: `dlib.grd' is up to date.
make[10]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/src/testing'
make[9]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/src/testing'
make[8]: *** [tstlib.grd] Error 2
make[8]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/tune/blas/level1'
NO GENERAL CASE SURVIVED!!  ABORTING!!
  ID  incX  incY  alpha  beta  ROUT
====  ====  ====  =====  ====  =============
   1     0     0     2     2  rot1_x0y0.c
   2     1     1     2     2  rot1_x1y1.c
   3     1     1     2     2  rot4_x1y1.c

make[7]: *** [dinstall_rot] Error 255
make[7]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/tune/blas/level1'
make[6]: *** [Make_drot] Error 2
make[6]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/src/blas/level1'
make[5]: *** [dgen] Error 2
make[5]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/src/blas/level1'
make[4]: *** [dlib] Error 2
make[4]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/src/blas/level1'
make[3]: *** [lib.grd] Error 2
make[3]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/src/auxil'
make[2]: *** [IStage1] Error 2
make[2]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/bin'
ERROR 712 DURING CACHESIZE SEARCH!!.  CHECK INSTALL_LOG/Stage1.log FOR DETAILS.
make[2]: Entering directory `/home/gi75/i75012/env/src/ATLAS/build/bin'
cd /home/gi75/i75012/env/src/ATLAS/build ; make error_report
make[3]: Entering directory `/home/gi75/i75012/env/src/ATLAS/build'
make -f Make.top error_report
make[4]: Entering directory `/home/gi75/i75012/env/src/ATLAS/build'
uname -a 2>&1 >> bin/INSTALL_LOG/ERROR.LOG
/usr/bin/x86_64-redhat-linux-gcc -v 2>&1  >> bin/INSTALL_LOG/ERROR.LOG
Using built-in specs.
COLLECT_GCC=/usr/bin/x86_64-redhat-linux-gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) 
/usr/bin/x86_64-redhat-linux-gcc -V 2>&1  >> bin/INSTALL_LOG/ERROR.LOG
x86_64-redhat-linux-gcc: error: unrecognized command line option ‘-V’
x86_64-redhat-linux-gcc: fatal error: no input files
compilation terminated.
make[4]: [error_report] Error 4 (ignored)
/usr/bin/x86_64-redhat-linux-gcc --version 2>&1  >> bin/INSTALL_LOG/ERROR.LOG
tar cf error_UNKNOWNx8664AVXMAC.tar Make.inc bin/INSTALL_LOG/*
bzip2 error_UNKNOWNx8664AVXMAC.tar
make[4]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build'
make[3]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build'
make[2]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build/bin'
Error report error_<ARCH>.tgz has been created in your top-level ATLAS
directory.  Be sure to include this file in any help request.
cat: ../../CONFIG/error.txt: No such file or directory
cat: ../../CONFIG/error.txt: No such file or directory
make[1]: *** [build] Error 255
make[1]: Leaving directory `/home/gi75/i75012/env/src/ATLAS/build'
make: *** [build] Error 2
Hiroki11x commented 7 years ago

Do NOT build ATLAS using parallel (-j) make! http://math-atlas.sourceforge.net/errata.html
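
In practice the errata's advice means running plain make in the build directory. The first log also shows the ATLAS makefiles issuing their own make -j 28 for some sub-steps, so the build appears to handle parallelism itself. A minimal sketch follows; build_atlas_serial is a hypothetical helper name, and clearing MAKEFLAGS is an extra precaution that the errata does not mention:

```shell
# Hypothetical helper: run make with no job-count setting inherited
# from the environment (for example an exported MAKEFLAGS=-j8).
build_atlas_serial() {
    # The empty MAKEFLAGS applies to this one invocation only.
    MAKEFLAGS= make "$@"
}

# Usage, from the build directory created in the first post:
#   cd ATLAS/build
#   build_atlas_serial            # long-running tuning and build
#   build_atlas_serial install
```

Wrapping the call in a function keeps the serial rule in one place, so a stray -j is harder to reintroduce.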

Hiroki11x commented 7 years ago

http://unity-memo.hatenablog.com/entry/2015/03/03/104238

Hiroki11x commented 7 years ago

The build died partway through:

BEGIN BASIC KERNEL TESTS:
   Kernel ATL_ger2k_1x1_1.c(1) passes basic test
   NUKING bad kernel ATL_sger2K_NEON_lda4.S(2), MU=8, NU=4
   NUKING bad kernel ATL_sger2K_NEON.S(3), MU=8, NU=4
DONE BASIC KERNEL TESTS:

Surviving cases:
ID=1 ROUT='ATL_ger2k_1x1_1.c' AUTH='R. Clint Whaley' \
   rankR=0 CacheElts=0 SSE=0 alignA=0 alignY=0 alignX=0 minM=0 minN=0 NU=1  \
   MU=1 LDAMUL=0 PFTUNABLE=0 ALIGNX2A=0 ADDCFLAGS=0 FNU=0 INCYISONE=0 X87=0 
BEGIN NU/MU EXTRACT SEARCH, imf=1:

BEGIN BASIC KERNEL TESTS:
   Kernel sr2_C.c(900000) passes basic test
   Kernel sr2_C.c(900000) passes basic test
   Kernel sr2_C.c(900000) passes basic test
   Kernel sr2_C.c(900000) passes basic test
   Kernel sr2_C.c(900000) passes basic test
   Kernel sr2_C.c(900000) passes basic test
   Kernel sr2_C.c(900000) passes basic test
   Kernel sr2_C.c(900000) passes basic test
   Kernel sr2_C.c(900000) passes basic test
   Kernel sr2_C.c(900000) passes basic test
   Kernel sr2_C.c(900000) passes basic test
   Kernel sr2_C.c(900000) passes basic test
   Kernel sr2_C.c(900000) passes basic test
   Kernel sr2_C.c(900000) passes basic test
DONE BASIC KERNEL TESTS:

   900000:sr2_C.c (M=3000, N=2000, lda=3003) gets 3992.68 MFLOPS
   900000:sr2_C.c (M=3000, N=2000, lda=3003) gets 4467.52 MFLOPS
   900000:sr2_C.c (M=3000, N=2000, lda=3003) gets 4804.64 MFLOPS
   900000:sr2_C.c (M=3000, N=2000, lda=3003) gets 5132.35 MFLOPS
   900000:sr2_C.c (M=3000, N=1992, lda=3003) gets 4747.41 MFLOPS
   900000:sr2_C.c (M=3000, N=2000, lda=3003) gets 4386.70 MFLOPS
   900000:sr2_C.c (M=3000, N=2000, lda=3003) gets 4077.46 MFLOPS
   900000:sr2_C.c (M=3000, N=2000, lda=3003) gets 2699.81 MFLOPS
make[3]: *** [res/sR2K.sum] Error 255
make[3]: Leaving directory `/lustre/gi75/i75012/env/src/ATLAS/build/tune/blas/ger'
make[2]: *** [/lustre/gi75/i75012/env/src/ATLAS/build/tune/blas/ger/res/sR2K.sum] Error 2
make[2]: Leaving directory `/lustre/gi75/i75012/env/src/ATLAS/build/bin'
ERROR 1075 DURING R1TUNE!!.  CHECK INSTALL_LOG/sR1TUNE.LOG FOR DETAILS.
make[2]: Entering directory `/lustre/gi75/i75012/env/src/ATLAS/build/bin'
cd /lustre/gi75/i75012/env/src/ATLAS/build ; make error_report
Connection to reedbush.cc.u-tokyo.ac.jp closed.
Hiroki11x commented 7 years ago
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1030571
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Hiroki11x commented 7 years ago

This can be solved by running unset autologout: http://www.itmedia.co.jp/help/tips/linux/l0150.html

Hiroki11x commented 7 years ago
    A   768    16     6      12275.53
    A   832    16     6      14277.75
    A   896    16     6      14451.73
    A   960    16     6      11894.27
    A     0    16     6      14053.97
BEST imf=2 PFADIST=704 (14723.75)

TUNING PREFETCH DISTANCE FOR OPERAND 'X', imf=2:
   OP  DIST    MU    NU         MFLOP
   ==  ====  ====  ====  ============
    X   DEF    16     6      11952.28
***** Auto-logout WARNING MESSAGE ( 13:30:01 Tue 01 Aug. 2017 ) *****
Your session pts/1 (i75012@reedbush-u3) has been expired. 
Your session and processes have been killed by the system administrator. 
    X   make[3]: *** [res/cR1K.sum] Killed
make[3]: Leaving directory `/lustre/gi75/i75012/env/src/ATLAS/build/tune/blas/ger'
make[2]: *** [/lustre/gi75/i75012/env/src/ATLAS/build/tune/blas/ger/res/cR1K.sum] Error 2
make[2]: Leaving directory `/lustre/gi75/i75012/env/src/ATLAS/build/bin'
sh: line 1: 30484 Done(2)                 make -f Makefile INSTALL_LOG/cR2K.sum pre=c 2>&1
     30485 Killed                  | ./xatlas_tee INSTALL_LOG/cR1TUNE.LOG
ERROR 1074 DURING R1TUNE!!.  CHECK INSTALL_LOG/cR1TUNE.LOG FOR DETAILS.
/bin/sh: line 1: 23363 Killed                  ./xatlas_build -1 0 -a 1 -l 1
make[1]: *** [build] Error 137
make[1]: Leaving directory `/lustre/gi75/i75012/env/src/ATLAS/build'
make: *** [build] Error 2
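
The Error 137 above is consistent with the Killed lines. By shell convention, an exit status above 128 means the process was terminated by a signal whose number is the status minus 128. A small sketch for decoding such a status follows; decode_exit_status is a hypothetical helper name, not part of ATLAS or make:

```shell
# Hypothetical helper: turn an exit status into a readable description.
decode_exit_status() {
    if [ "$1" -gt 128 ]; then
        # kill -l <status> prints the name of the signal behind that status.
        echo "signal $(($1 - 128)) ($(kill -l "$1"))"
    else
        echo "exit code $1"
    fi
}

decode_exit_status 137   # prints: signal 9 (KILL)
```

A status of 137 therefore points to SIGKILL, which matches the auto-logout message saying the session's processes were killed, rather than to a compile or tuning failure inside ATLAS.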
Hiroki11x commented 7 years ago

The ATLAS documentation says "Don't use make -j", so I ran:

$ make 

Reedbush auto-logged me out during the build, so I resumed with:

$ cd /path-to/src/ATLAS/build
$ make -j 256

The output was:

make[4]: warning: -jN forced in submake: disabling jobserver mode.
make[4]: Nothing to be done for `ctlib'.
make[4]: Leaving directory `/lustre/gi75/i75012/env/src/ATLAS/build/src/threads/lapack'
make[3]: Leaving directory `/lustre/gi75/i75012/env/src/ATLAS/build/src/lapack'
make[2]: Leaving directory `/lustre/gi75/i75012/env/src/ATLAS/build/bin'
   DONE  STAGE 5-1-0 at 09:44

ATLAS install complete.  Examine 
ATLAS/bin/<arch>/INSTALL_LOG/SUMMARY.LOG for details.
make[1]: Leaving directory `/lustre/gi75/i75012/env/src/ATLAS/build'
make clean
make[1]: Entering directory `/lustre/gi75/i75012/env/src/ATLAS/build'
rm -f *.o x* config?.out *core*
make[1]: Leaving directory `/lustre/gi75/i75012/env/src/ATLAS/build'

Succeeded!

$ make install

ATLAS Install succeeded!!

Hiroki11x commented 7 years ago

Building Caffe requires ATLAS to be compiled with the --shared option.
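
The configure line in the first post already passes --shared. One way to confirm after make install that shared libraries were actually produced is to look for .so files under the install prefix. A minimal sketch follows; check_atlas_shared is a hypothetical helper name, and the lib/ subdirectory is an assumption about the install layout:

```shell
# Hypothetical helper: list shared libraries under <prefix>/lib.
# Returns 0 if at least one .so file exists, 1 otherwise.
check_atlas_shared() {
    prefix=$1
    found=1
    for lib in "$prefix"/lib/*.so; do
        # An unmatched glob stays literal, so skip entries that do not exist.
        [ -e "$lib" ] || continue
        echo "$lib"
        found=0
    done
    return "$found"
}

# Usage, with the prefix from the first post:
#   check_atlas_shared "$LOCAL_DIR/ATLAS" || echo "no shared libraries found"
```

The helper deliberately does not check for particular library names, since those depend on the ATLAS version and configuration.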

Hiroki11x commented 7 years ago

The same method is effective.