jsboer commented 6 months ago

Hi, I have solve the matrix use 1 step\ 2 step and scalapack way, but get three different resulet. This is a 240*240 symmetry matrix, and compute all eigenvalue. All the test has passed. what could cause this ?

some smallest eigenvalue

scalapack

0.173720131800915       0.173878822327725       0.187152350855710     
  0.191423385819358       0.194181552429425       0.194181552844384     
  0.194449431187434       0.194449431797715       0.199507996641544

elpa 2step

-2.07585731052087       -2.03591585272923       -1.72390402574035     
  -1.66384803970337       -1.52011422574109      -1.43106927116121     
  -1.39056098837714       -1.29185171666780     -1.24300500320917

elpa 1 step

-3.14666685601262       -3.14303290459765      -3.13546185348343       
-3.00895984604710       -2.99530541682141      -2.98274897202263       
-2.90288228355549       -2.90085399507448      -2.87626101080164

procedure

blacs initial

      call blacs_pinfo(my_proc, nprocessors)
      call blacs_get(0, 0, icontext)
      call blacs_gridinit(icontext, 'C', nprow, npcol)
      call blacs_gridinfo(Icontext, nprow, npcol, myrow, mycol)
      ir = max(1, numroc(s%norbitals, nb, myrow, 0, nprow))
      ic = max(1, numroc(s%norbitals, nb, mycol, 0, npcol))
      call descinit(desc_x, s%norbitals, s%norbitals, nb, nb, 0, 0,        &
      &                 icontext, ir, info)

elpa set

  ICTXT = desca(CTXT_para)
  MB =desca(MB_para)
  NB = desca(NB_para)
  if(MB .ne. NB) stop " not support block size not equal of row and col"
  call blacs_gridinfo(ICTXT, NPROW, NPCOL, my_prow, my_pcol)
  NLROW = numroc(n, MB, my_prow, 0, NPROW) ! number of rows contained in mine
  NLCOL = numroc(n, MB, my_pcol, 0, NPCOL)
  bandwidth = MB
  if (elpa_init(CURRENT_API_VERSION) /= ELPA_OK) then
     print *, "ELPA API version not supported"
     stop 1
  endif
  e => elpa_allocate(error_elpa)
  call e%set("na", int(n,kind=c_int), error_elpa)
  call e%set("nev", int(nev,kind=c_int), error_elpa)
  call e%set("local_nrows", int(NLROW,kind=c_int), error_elpa)
  call e%set("local_ncols", int(NLCOL,kind=c_int), error_elpa)
  call e%set("nblk", int(MB,kind=c_int), error_elpa)
  call e%set("mpi_comm_parent", int(MPI_COMM_WORLD,kind=c_int), error_elpa)
  call e%set("process_row", int(my_prow,kind=c_int), error_elpa)
  call e%set("process_col", int(my_pcol,kind=c_int), error_elpa)
  call e%set("bandwidth", int(bandwidth,kind=c_int), error_elpa)
  call e%eigenvectors(a, ev, z, error_elpa) !use environment to decide which solver is used

marekandreas commented 6 months ago

Dear @jsboer

this should not happen at all. Up to numerical precision results must be the same, or ELPA should indicate an error. I understand you are using a fixed matrix for all three tests? Can you provide this to us via the ELPA email? Please also provide:

your configure line used to build ELPA
the number of MPI processes used for this tests
the block size for the block-cyclic distribution of BLACS
which version of MPI, LAPACK, BLACS and ScaLAPACK you used
what datatype your matric is: real/complex double/float

Can you please also run the two ELPA test programs for the correct datatype and solving eigenvectors for 1stage and 2stage with arguments "240 the_number_of_eigenvectors_you_need blacs_blocksize" with the exact number of MPI processes you used in your tests? So sth. like: mpirun -n 2 ./validate_real_double_eigenvectors_1stage_random 240 240 16

And report whether this tests pass...

jsboer commented 6 months ago

Hi, these three test are use the exactly same matrix, I use the same matrix first solved by elpa and then solved by scalapack. When compile the ELPA, all the ELPA check are passed. So Maybe ELPA do not have error but maybe how I use ELPA is wrong? I use mpirun -n 2 ./validate_real_double_eigenvectors_1stage_random 240 240 64 and mpirun -n 2 ./validate_real_double_eigenvectors_2stage_all_kernels_random 240 240 64 , the result seems true.

! one stage
 Results of numerical residual checks:
 Error Residual     :  9.396239274961853E-014
 Maximal error in eigenvector lengths:  2.442490654175344E-015
 Error Orthogonality:  2.442490654175344E-015
! two stage
 Results of numerical residual checks:
 Error Residual     :  2.831158399978648E-013
 Maximal error in eigenvector lengths:  2.664535259100376E-015
 Error Orthogonality:  2.664535259100376E-015

And following is the information

your configure line used to build ELPA

 FC=mpiifort  CC=mpiicc CXX=mpiicpc   ../configure \
 FCFLAGS="-O3 -march=core-avx2 " \
 CFLAGS=" -O3 -march=core-avx2 "  CXXFLAGS="-O3 -march=core-avx2 " \
 --enable-option-checking=fatal \
 SCALAPACK_LDFLAGS=" -L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread " \
 SCALAPACK_FCFLAGS="-I$MKL_HOME/include  -I$MKL_HOME/include/intel64/lp64 -L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread " \
 --enable-avx2 --disable-avx512-kernels

`the number of MPI processes` and `the block size for the block-cyclic distribution of BLACS` and `datatype`

only 2 MPI PROCESSOR
block size is 64,
real, double

which version of MPI, LAPACK, BLACS and ScaLAPACK

All from intel 2021.3 version

jsboer commented 6 months ago

I have send the matrix to the ELPA email elpa-library@mpcdf.mpg.de.

jsboer commented 6 months ago

The way I compare the results between ELPA and Scalapack is as follows. So the matrix layout should be right？ Or are there some different requirements for ELPA and Scalapack? By the way, the results of ScaLAPACK and Lapack(dsyevd) are consistent.

      allocate(elpa_eigenvalue_slave(1:s%norbitals)); allocate(elpa_eigenvec_slave(1:ir, 1:ic))
      call elpa_diag(desc_x, yyyy, s%norbitals, s%norbitals, elpa_eigenvalue_slave, elpa_eigenvec_slave, info)

      call pdsyevd('V', 'U', s%norbitals, yyyy, 1, 1, desc_x, eigen,       &
      &                xxxx, 1, 1, desc_x, rwork, -1, iwork, -1, info)
      lrwork = rwork(1)
      liwork = iwork(1)
      deallocate (rwork, iwork)
      allocate (rwork(lrwork)); allocate (iwork(liwork))
      call pdsyevd('V', 'U', s%norbitals, yyyy, 1, 1, desc_y, eigen,       &
      &                xxxx, 1, 1, desc_x, rwork, lrwork, iwork, liwork, info)

marekandreas commented 6 months ago

Dear @jsboer ,

I have also noticed that you set call e%set("bandwidth", int(bandwidth,kind=c_int), error_elpa) As I understand it, your matrix is a full, dense matrix so setting the "bandwidth" option will lead to wrong results in ELPA2. The banwidth-option is for the case that a matrix has already a certain bandwidth, i.e. only some diagonals below/above the main-diagonal contain non-zero values. Away from this band , all entries must be zero. If this is not the case this is not a banded matrix and setting bandwidth to a value is wrong

jsboer commented 6 months ago

Oh! thank you! Now I get the right result.

marekandreas / elpa

inconsistent result among 1step\2step and scalapack #54

some smallest eigenvalue

scalapack

elpa 2step

elpa 1 step

procedure

blacs initial

elpa set

your configure line used to build ELPA

`the number of MPI processes` and `the block size for the block-cyclic distribution of BLACS` and `datatype`

which version of MPI, LAPACK, BLACS and ScaLAPACK

marekandreas / elpa

inconsistent result among 1step\2step and scalapack #54

some smallest eigenvalue

scalapack

elpa 2step

elpa 1 step

procedure

blacs initial

elpa set

your configure line used to build ELPA

the number of MPI processes and the block size for the block-cyclic distribution of BLACS and datatype

which version of MPI, LAPACK, BLACS and ScaLAPACK

`the number of MPI processes` and `the block size for the block-cyclic distribution of BLACS` and `datatype`