t-sakashita / rokko

Integrated Interface for libraries of eigenvalue decomposition
Boost Software License 1.0

parallel dense with a specified block size segfaults #539

Open t-sakashita opened 4 years ago

t-sakashita commented 4 years ago

Main program

Routine

Matrix size = 10

Number of processes = 1

mpirun -np 1 --oversubscribe xterm -e lldb -o run ./minij_block_size_mpi scalapack:pdsyevd 10

Debugger output on Mac

(lldb) target create "./minij_block_size_mpi"
Current executable set to './minij_block_size_mpi' (x86_64).
(lldb) settings set -- target.run-args  "scalapack:pdsyevd" "10"
(lldb) run
Eigenvalue decomposition of minij matrix
library = scalapack
routine = pdsyevd
dimension = 10
block_size = 64
Process 7442 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010c5f2282 libgfortran.4.dylib`___lldb_unnamed_symbol235$$libgfortran.4.dylib + 70
libgfortran.4.dylib`___lldb_unnamed_symbol235$$libgfortran.4.dylib:
->  0x10c5f2282 <+70>: movsbl (%r14,%rdx), %ecx
    0x10c5f2287 <+75>: incq   %rdx
    0x10c5f228a <+78>: xorl   %ecx, %eax
    0x10c5f228c <+80>: jmp    0x10c5f227d               ; <+65>
Target 0: (minij_block_size_mpi) stopped.

Process 7442 launched: './minij_block_size_mpi' (x86_64)
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000010c5f2282 libgfortran.4.dylib`___lldb_unnamed_symbol235$$libgfortran.4.dylib + 70
    frame #1: 0x000000010c5fd6aa libgfortran.4.dylib`___lldb_unnamed_symbol327$$libgfortran.4.dylib + 567
    frame #2: 0x0000000100bc47c0 libscalapack.dylib`pxerbla_ + 129
    frame #3: 0x0000000100df4cc0 libscalapack.dylib`pdormtr_ + 2061
    frame #4: 0x0000000100e044a0 libscalapack.dylib`pdsyevd_ + 2106
    frame #5: 0x00000001002493ae librokko.dylib`cscalapack_pdsyevd_work(jobz='V', uplo='U', n=10, A=0x00000001160253c0, ia=0, ja=0, descA=0x00007ffeefbfe868, w=0x0000000116020da0, Z=0x00000001160256e0, iz=0, jz=0, descZ=0x00007ffeefbfe868, work=0x0000000116804400, lwork=754, iwork=0x0000000116025a00, liwork=80) at pdsyevd_work.c:24
    frame #6: 0x0000000100249694 librokko.dylib`cscalapack_pdsyevd(jobz='V', uplo='U', n=10, A=0x00000001160253c0, ia=0, ja=0, descA=0x00007ffeefbfe868, w=0x0000000116020da0, Z=0x00000001160256e0, iz=0, jz=0, descZ=0x00007ffeefbfe868) at pdsyevd.c:37
    frame #7: 0x0000000100254e16 librokko.dylib`rokko::parameters rokko::scalapack::diagonalize_pdsyevd<rokko::matrix_col_major, Eigen::Matrix<double, -1, 1, 0, -1, 1> >(mat=0x00007ffeefbfe7f8, eigvals=0x00007ffeefbfe7e8, eigvecs=0x00007ffeefbfe740, params=0x00007ffeefbfe728) at diagonalize_pdsyevd.hpp:36
    frame #8: 0x0000000100253be7 librokko.dylib`rokko::parameters rokko::scalapack::solver::diagonalize<rokko::matrix_col_major, Eigen::Matrix<double, -1, 1, 0, -1, 1> >(this=0x0000000116022e80, mat=0x00007ffeefbfe7f8, eigvals=0x00007ffeefbfe7e8, eigvecs=0x00007ffeefbfe740, params=0x00007ffeefbfe728) at solver.hpp:71
    frame #9: 0x000000010024ce6b librokko.dylib`rokko::detail::pd_ev_wrapper<rokko::scalapack::solver>::diagonalize(this=0x0000000116022e78, mat=0x00007ffeefbfe7f8, eigvals=0x00007ffeefbfe7e8, eigvecs=0x00007ffeefbfe740, params=0x00007ffeefbfe728) at parallel_dense_ev.hpp:89
    frame #10: 0x0000000100011ddb minij_block_size_mpi`rokko::parameters rokko::parallel_dense_ev::diagonalize<rokko::matrix_col_major, Eigen::Matrix<double, -1, 1, 0, -1, 1> >(this=0x00007ffeefbfe940, mat=0x00007ffeefbfe7f8, eigvals=0x00007ffeefbfe7e8, eigvecs=0x00007ffeefbfe740, params=0x00007ffeefbfe728) at parallel_dense_ev.hpp:155
    frame #11: 0x00000001000112a7 minij_block_size_mpi`main(argc=3, argv=0x00007ffeefbfecc8) at minij_block_size_mpi.cpp:66
    frame #12: 0x00007fff7e4ec015 libdyld.dylib`start + 1
    frame #13: 0x00007fff7e4ec015 libdyld.dylib`start + 1

Error on enaga

Error reported by the sgimpt ScaLAPACK

Eigenvalue decomposition of minij matrix
library = scalapack
routine = pdsyevd
dimension = 10
block_size = 64
{    1,    0}:  On entry to 
{    1,    1}:  On entry to 
{    1,    3}:  On entry to 
PDSYEVD parameter number   14 had an illegal value 
PDSYEVD parameter number   14 had an illegal value 
{    1,    2}:  On entry to 
PDSYEVD parameter number   14 had an illegal value 
PDSYEVD parameter number   14 had an illegal value 
{    3,    0}:  On entry to 
{    3,    1}:  On entry to 
{    3,    3}:  On entry to 
PDSYEVD parameter number   14 had an illegal value 
PDSYEVD parameter number   14 had an illegal value 
{    3,    2}:  On entry to 
PDSYEVD parameter number   14 had an illegal value 
PDSYEVD parameter number   14 had an illegal value 
{    2,    0}:  On entry to 
{    2,    1}:  On entry to 
{    2,    3}:  On entry to 
PDSYEVD parameter number   14 had an illegal value 
PDSYEVD parameter number   14 had an illegal value 
{    0,    2}:  On entry to 
{    2,    2}:  On entry to 
{    0,    0}:  On entry to 
PDSYEVD parameter number   14 had an illegal value 
{    0,    1}:  On entry to 
PDSYEVD parameter number   14 had an illegal value 
PDSYEVD parameter number   14 had an illegal value 
{    0,    3}:  On entry to 
PDSYEVD parameter number   14 had an illegal value 
PDSYEVD parameter number   14 had an illegal value 
PDSYEVD parameter number   14 had an illegal value 

The 14th argument is LWORK. To do: find out how LWORK has to be computed when the matrix size is smaller than the block size.

t-sakashita commented 4 years ago
            TRILWMIN = 3*N + MAX( NB*( NP+1 ), 3*NB )
            LWMIN = MAX( 1+6*N+2*NP*NQ, TRILWMIN ) + 2*N
            ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN
               INFO = -14

http://www.netlib.org/scalapack/explore-html/d6/d75/pdsyevd_8f_source.html

t-sakashita commented 4 years ago

On Mac, I printed the values.

  int np = 10;
  int nq = 64;
  int TRILWMIN = 3*n + MAX( 16*( np+1 ), 3*16 );
  int LWMIN = MAX( 1+6*n+2*np*nq, TRILWMIN ) + 2*n;
  printf("lwork=%d  lwmin=%d\n", lwork, LWMIN);
  lwork = LWMIN;
m_local=10 n_local=10 lld=10
lwork=754  LWMIN=8273

When pdsyevd was called with this LWMIN as lwork, it completed normally.

t-sakashita commented 4 years ago

a6168424d8586056f5cbbff58ec366e3442ab510

t-sakashita commented 4 years ago
mpirun -np 1 --oversubscribe ./minij_mpi scalapack:pdsyevd 10
Eigenvalue decomposition of minij matrix
library:routine = scalapack:pdsyevd
num_procs = 1
num_threads per process = 4
routine = pdsyevd
dimension = 10
{    0,    0}:  On entry to PDORMTR parameter number   16 had an illegal value
largest eigenvalues: 44.766 5.0489 1.873 1 0.6431 0.46523 0.36621 0.30798 0.27379 0.25568
eigenvectors:
    0.61864    -0.68391    -0.37025    -0.11031   -0.017417  -0.0012772 -3.3195e-05  1.7211e-07 -3.9385e-11  3.5421e-19
    0.52739    0.036235     0.65267     0.51565     0.16766    0.023656   0.0012073 -1.3774e-05  9.4068e-09  -7.881e-16
    0.40576     0.33862     0.24631    -0.55972    -0.56221    -0.17412   -0.018553  0.00048115 -9.9793e-07  7.8486e-13
   -0.29771    -0.39368     0.15776     0.33283    -0.53625    -0.55959    -0.14118   0.0087992 -5.6954e-05  4.2537e-10
     0.2115     0.34342    -0.31931     0.08263     0.33565    -0.58517    -0.51574    0.086429  -0.0018297   1.322e-07
   -0.14645    -0.26395     0.32013     -0.2757     0.08377     0.28757    -0.67901     0.42259   -0.032152  2.3227e-05
   0.099134     0.18855    -0.25695     0.28897     -0.2605     0.12855     0.18797     0.78049    -0.27607   0.0021351
  -0.065004    -0.12693     0.18232    -0.22672     0.25333    -0.24928     0.18407    -0.03749    -0.85213    0.085372
  -0.037546   -0.074114     0.10871     -0.1403     0.16766    -0.18912      0.2015     0.19558     0.12479    -0.89627
  -0.065047    -0.12864     0.18936    -0.24585     0.29685    -0.34122     0.37796     0.40627     0.42549     0.43522

I could not work out how to resolve the PDORMTR subroutine error shown above. However, execution does not stop partway, so it is not a problem for the time being.