Create an internal wrapper for ELPA-2019.05.001.

t-sakashita commented 5 years ago

mpirun --hostfile ~/my-hostfile -np 4 ./frank_mpi elpa

[~/build/rokko/example/cxx/dense]
Eigenvalue decomposition of Frank matrix
library:routine = elpa
num_procs = 4
num_threads per process = 4
routine = 
dimension = 10
 elpa_allocate(): you must call elpa_init() once before creating instances of ELPA
 elpa_allocate(): you must call elpa_init() once before creating instances of ELPA
 elpa_allocate(): you must call elpa_init() once before creating instances of ELPA
 elpa_allocate(): you must call elpa_init() once before creating instances of ELPA

ELPA reports the error above unless elpa_init is called first, so the Rokko wrapper will call it.

t-sakashita commented 5 years ago

Find the statement in the new API that corresponds to the following.

elpa_get_communicators(comm_f, mat.get_grid().get_myrow(), mat.get_grid().get_mycol(), &mpi_comm_rows, &mpi_comm_cols);

elpa_get_communicators is treated as legacy, and its use is discouraged. Make the wrapper work with an arbitrary MPI communicator.

t-sakashita commented 5 years ago

There are two files named elpa/elpa.h.

include_directories(${PROJECT_SOURCE_DIR}/rokko)
include_directories(${PROJECT_BINARY_DIR}/rokko)

include_directories(${ELPA_INCLUDE_DIR})

In rokko/benchmark/no_rokko/dense_minij_mpi, ELPA's elpa.h should be included, but Rokko's elpa.h was picked up instead, which caused a compile error.

t-sakashita commented 5 years ago

Following the article below, restrict the include-directory settings to only the source files that need them. https://qiita.com/shohirose/items/5b406f060cd5557814e9

t-sakashita commented 5 years ago

https://qiita.com/nanigashi_uji/items/64f003e9b66b97251053

t-sakashita commented 5 years ago

Handle an arbitrary MPI communicator.

Reference: elpa-2019.05.001/test/Fortran/test_split_comm.F90

     call elpa%set("mpi_comm_parent", communicator, error)
     assert_elpa_ok(error)
     call elpa%set("process_row", my_prow, error)
     assert_elpa_ok(error)
     call elpa%set("process_col", my_pcol, error)
     assert_elpa_ok(error)

t-sakashita commented 5 years ago

How to change the major (row-major or column-major ordering) of the process grid.

t-sakashita commented 5 years ago

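In test/Fortran/test.F90, the preprocessor constant TEST_ALL_LAYOUTS is defined and used. It appears to be there to check the BLACS grid major used in the verification step.

How is the grid major of ELPA itself set? The BLACS ctxt does not appear to be passed to it.

Previously, mpi_comm_rows and mpi_comm_cols were passed to the ELPA solver.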

t-sakashita commented 5 years ago

elpa-2019.05.001/src/elpa1/legacy_interface/elpa1.F90

function elpa_get_communicators(mpi_comm_global, my_prow, my_pcol, mpi_comm_rows, mpi_comm_cols) result(mpierr)
   ! use precision
   use elpa_mpi
   use iso_c_binding
   implicit none

   integer(kind=c_int), intent(in)  :: mpi_comm_global, my_prow, my_pcol
   integer(kind=c_int), intent(out) :: mpi_comm_rows, mpi_comm_cols

   integer(kind=c_int)              :: mpierr

   ! mpi_comm_rows is used for communicating WITHIN rows, i.e. all processes
   ! having the same column coordinate share one mpi_comm_rows.
   ! So the "color" for splitting is my_pcol and the "key" is my row coordinate.
   ! Analogous for mpi_comm_cols

   call mpi_comm_split(mpi_comm_global,my_pcol,my_prow,mpi_comm_rows,mpierr)
   call mpi_comm_split(mpi_comm_global,my_prow,my_pcol,mpi_comm_cols,mpierr)

end function elpa_get_communicators

elpa-2019.05.001/test/Fortran/test.F90

   call mpi_comm_split(MPI_COMM_WORLD,my_pcol,my_prow,mpi_comm_rows,mpierr)
   if (mpierr .ne. MPI_SUCCESS) then
     call MPI_ERROR_STRING(mpierr,mpierr_string, mpi_string_length, mpierr2)
     write(error_unit,*) "MPI ERROR occured during mpi_comm_split for row communicator: ", trim(mpierr_string)
     stop 1
   endif

   call mpi_comm_split(MPI_COMM_WORLD,my_prow,my_pcol,mpi_comm_cols, mpierr)
   if (mpierr .ne. MPI_SUCCESS) then
     call MPI_ERROR_STRING(mpierr,mpierr_string, mpi_string_length, mpierr2)
     write(error_unit,*) "MPI ERROR occured during mpi_comm_split for col communicator: ", trim(mpierr_string)
     stop 1
   endif

   call e%set("mpi_comm_parent", MPI_COMM_WORLD, error)
   assert_elpa_ok(error)
   call e%set("mpi_comm_rows", mpi_comm_rows, error)
   assert_elpa_ok(error)
   call e%set("mpi_comm_cols", mpi_comm_cols, error)
   assert_elpa_ok(error)
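
The color/key comment in elpa_get_communicators can be checked without MPI. The following is a minimal sketch that emulates only the grouping rule of MPI_Comm_split (same color means same communicator, ordered by key). The proc struct and the split_by and make_grid_col_major helpers are hypothetical names introduced here, and the column-major rank numbering is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Grid coordinates of one process (hypothetical type, not part of ELPA or Rokko).
struct proc { int world_rank; int prow; int pcol; };

// Assumed column-major numbering: ranks run down each column first.
std::vector<proc> make_grid_col_major(int nprow, int npcol) {
  assert(nprow > 0 && npcol > 0);
  std::vector<proc> procs;
  for (int r = 0; r < nprow * npcol; ++r)
    procs.push_back({r, r % nprow, r / nprow});
  return procs;
}

// Emulates the grouping of MPI_Comm_split: processes with the same color land in
// the same group, ordered by key (ties broken by world rank).
// Returns color -> list of world ranks.
std::map<int, std::vector<int>> split_by(const std::vector<proc>& procs,
                                         int proc::*color, int proc::*key) {
  std::map<int, std::vector<proc>> groups;
  for (const proc& p : procs) groups[p.*color].push_back(p);

  std::map<int, std::vector<int>> result;
  for (auto& kv : groups) {
    std::sort(kv.second.begin(), kv.second.end(),
              [key](const proc& a, const proc& b) -> bool {
                if (a.*key != b.*key) return a.*key < b.*key;
                return a.world_rank < b.world_rank;
              });
    for (const proc& p : kv.second) result[kv.first].push_back(p.world_rank);
  }
  return result;
}
```

On a 2 x 3 grid built with make_grid_col_major(2, 3), split_by(g, &proc::pcol, &proc::prow) plays the role of mpi_comm_rows and groups world ranks {0,1}, {2,3}, {4,5}, one group per column coordinate. split_by(g, &proc::prow, &proc::pcol) plays the role of mpi_comm_cols and groups {0,2,4} and {1,3,5}.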

t-sakashita commented 5 years ago

The following error was printed.

 Provide mpi_comm_parent and EITHER process_row and process_col OR mpi_comm_rows and mpi_comm_cols. Aborting...

The cause was that both of the following pairs were being set at the same time.

  elpa_set(handle, "process_row", my_prow, &error);
  elpa_set(handle, "process_col", my_pcol, &error);

  elpa_set(handle, "mpi_comm_rows", mpi_comm_rows_f, &error);
  elpa_set(handle, "mpi_comm_cols", mpi_comm_cols_f, &error);
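
The either/or rule in the error message can be written down as a small predicate. This is only a sketch of the rule as stated in the message and observed above, not ELPA's actual check; grid_keys_consistent is a hypothetical name, and rejecting a half-specified pair is an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: returns true when the set of keys passed to elpa_set
// follows the rule "mpi_comm_parent and EITHER process_row/process_col
// OR mpi_comm_rows/mpi_comm_cols".
bool grid_keys_consistent(const std::set<std::string>& keys) {
  auto has = [&keys](const char* k) -> bool { return keys.count(k) > 0; };

  const bool parent     = has("mpi_comm_parent");
  const bool coords_all = has("process_row") && has("process_col");
  const bool coords_any = has("process_row") || has("process_col");
  const bool comms_all  = has("mpi_comm_rows") && has("mpi_comm_cols");
  const bool comms_any  = has("mpi_comm_rows") || has("mpi_comm_cols");

  // Assumption: one pair must be complete and the other entirely absent.
  return parent && ((coords_all && !comms_any) || (comms_all && !coords_any));
}
```

With this predicate, the combination that triggered the error (all five keys set) is rejected, while mpi_comm_parent together with exactly one of the two pairs is accepted.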

t-sakashita commented 5 years ago

rokko/rokko/elpa/diagonalize_elpa1.hpp

  MPI_Comm mpi_comm_rows, mpi_comm_cols;
  MPI_Comm_split(MPI_COMM_WORLD, my_prow, my_pcol, &mpi_comm_rows);  // color = my_prow, key = my_pcol
  MPI_Comm_split(MPI_COMM_WORLD, my_pcol, my_prow, &mpi_comm_cols);  // color = my_pcol, key = my_prow
  MPI_Fint mpi_comm_rows_f = MPI_Comm_c2f(mpi_comm_rows);
  elpa_set(handle, "mpi_comm_rows", mpi_comm_rows_f, &error);
  assert_elpa_ok(error);
  MPI_Fint mpi_comm_cols_f = MPI_Comm_c2f(mpi_comm_cols);
  elpa_set(handle, "mpi_comm_cols", mpi_comm_cols_f, &error);
  assert_elpa_ok(error);

The above was wrong.

largest eigenvalues: 44.042 8.8043 2.5127 0.75512 0.70024 0.40193 0.38106 0.27705 0.27587 -3.1505
residual of the largest eigenvalue/vector: |x A x - lambda| = 42.105

A negative eigenvalue appears, and the residual is far too large.

t-sakashita commented 5 years ago

The roles of mpi_comm_rows and mpi_comm_cols were swapped, so I corrected the code as follows.

  MPI_Comm_split(MPI_COMM_WORLD, my_prow, my_pcol, &mpi_comm_cols);  // color = my_prow, key = my_pcol
  MPI_Comm_split(MPI_COMM_WORLD, my_pcol, my_prow, &mpi_comm_rows);  // color = my_pcol, key = my_prow

With this change, the eigenvalues became correct.

Judging from ELPA's example programs, the default major appears to be 'C'.
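
For reference, the two grid majors differ only in how a linear rank is mapped to grid coordinates. The sketch below uses the conventional row-major and column-major formulas; grid_coords is a hypothetical helper, not an ELPA or Rokko function. It is an assumption here that ELPA's effective major follows from how my_prow and my_pcol are derived before the communicators are split.

```cpp
#include <cassert>
#include <utility>

// Maps a linear rank to (prow, pcol) on an nprow x npcol process grid.
// major == 'R': ranks run along each row first    (rank = prow * npcol + pcol).
// major == 'C': ranks run down each column first  (rank = pcol * nprow + prow).
std::pair<int, int> grid_coords(int rank, int nprow, int npcol, char major) {
  assert(nprow > 0 && npcol > 0);
  assert(rank >= 0 && rank < nprow * npcol);
  assert(major == 'R' || major == 'C');
  if (major == 'R') return {rank / npcol, rank % npcol};
  return {rank % nprow, rank / nprow};
}
```

On a 2 x 2 grid, rank 1 sits at (0, 1) with major 'R' and at (1, 0) with major 'C', so mixing up the two conventions swaps which processes share a row or a column.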

t-sakashita commented 5 years ago

Is passing the BLACS context to ELPA the only way to specify the process grid major?

   call e%set("blacs_context", my_blacs_ctxt, error)

This call is enclosed in the preprocessor constant TEST_GENERALIZED_EIGENPROBLEM. Judging from that, the context is probably meant for the other ScaLAPACK routines rather than for ELPA itself.

t-sakashita commented 5 years ago

The result of MPI_Comm_c2f can be passed directly, without first storing it in a variable.

elpa-2019.05.001/test/C/test.c

   elpa_set(handle, "mpi_comm_parent", MPI_Comm_c2f(MPI_COMM_WORLD), &error);

In the Fortran and C interfaces that ELPA provides, this argument is presumably passed by value.

t-sakashita commented 5 years ago

Completed, as follows:

  int my_prow = mat.get_grid().get_myrow();
  int my_pcol = mat.get_grid().get_mycol();
  MPI_Comm mpi_comm_rows, mpi_comm_cols;
  MPI_Comm_split(MPI_COMM_WORLD, my_pcol, my_prow, &mpi_comm_rows);  // color = my_pcol, key = my_prow
  MPI_Comm_split(MPI_COMM_WORLD, my_prow, my_pcol, &mpi_comm_cols);  // color = my_prow, key = my_pcol
  elpa_set(handle, "mpi_comm_rows", MPI_Comm_c2f(mpi_comm_rows), &error);
  assert_elpa_ok(error);
  elpa_set(handle, "mpi_comm_cols", MPI_Comm_c2f(mpi_comm_cols), &error);
  assert_elpa_ok(error);

t-sakashita commented 5 years ago

mpirun --hostfile ~/my-hostfile -np 4 ./minij_mpi elpa:elpa2

Eigenvalue decomposition of minij matrix
library:routine = elpa:elpa2
num_procs = 4
num_threads per process = 4
routine = elpa2
dimension = 10
largest eigenvalues: 44.766 5.0489 1.873 1 0.6431 0.46523 0.36621 0.30798 0.27379 0.25568
eigenvectors:
    0.12864     0.24585    -0.34122    -0.40627    -0.43522    -0.42549     0.37796    -0.29685     0.18936   -0.065047
   -0.24585    -0.40627     0.42549     0.29685    0.065047    -0.18936     0.37796    -0.43522     0.34122    -0.12864
    0.34122     0.42549    -0.18936     0.18936     0.42549     0.34122  2.3315e-15    -0.34122     0.42549    -0.18936
   -0.40627    -0.29685    -0.18936    -0.43522    -0.12864     0.34122    -0.37796   -0.065047     0.42549    -0.24585
    0.43522    0.065047     0.42549     0.12864    -0.40627    -0.18936    -0.37796     0.24585     0.34122    -0.29685
   -0.42549     0.18936    -0.34122     0.34122     0.18936    -0.42549 -7.2164e-16     0.42549     0.18936    -0.34122
    0.37796    -0.37796 -9.5479e-15    -0.37796     0.37796 -2.3731e-15     0.37796     0.37796 -2.7756e-16    -0.37796
   -0.29685     0.43522     0.34122   -0.065047    -0.24585     0.42549     0.37796     0.12864    -0.18936    -0.40627
    0.18936    -0.34122    -0.42549     0.42549    -0.34122     0.18936  8.3267e-16    -0.18936    -0.34122    -0.42549
  -0.065047     0.12864     0.18936    -0.24585     0.29685    -0.34122    -0.37796    -0.40627    -0.42549    -0.43522
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  sakashitatatsuyanoMacBook-puro.local
  System call: unlink(2) /var/folders/s_/mm6rvl7s4_sc1gz8l1g098s80000gn/T//ompi.sakashitatatsuyanoMacBook-puro.501/pid.42856/1/vader_segment.sakashitatatsuyanoMacBook-puro.c2a50001.2
  Error:       No such file or directory (errno 2)
--------------------------------------------------------------------------
[sakashitatatsuyanoMacBook-puro.local:42856] 1 more process has sent help message help-opal-shmem-mmap.txt / sys call fail
[sakashitatatsuyanoMacBook-puro.local:42856] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

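An error is printed after execution finishes.

When run under a debugger as follows, the error did not appear.

mpirun --hostfile ~/my-hostfile -np 4 xterm -e lldb -o run ./minij_mpi elpa:elpa2

From this, the error above can probably be ignored. (Could it be an error that depends on Open MPI's internal implementation?)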

t-sakashita commented 5 years ago

The 2STAGE parameters "kernel" and "qr" have also been incorporated.

t-sakashita commented 5 years ago

ab38e23394192b56458b382bd85f60c1f38e8d26

t-sakashita / rokko

Create an internal wrapper for ELPA-2019.05.001. #337