When re-running the testcase from #2 to compare performance after the parallelization of kpath (see #5), the initialization phase is again very inefficient:
Testcase parameters
32sm running on 64 processes:
kpath = T
kpath_task = curv
kpath_num_points = 500
kpath_bands_colour = spin
kslice = F
berry = T
berry_task = ahc
berry_kmesh = 48 48 48
Analysis
The inefficiency arises because, with these testcase parameters, `get_ahc_R` is called instead of the optimized routine `get_morb_R` (see #3), and `get_ahc_R` still contains the inefficient loop (`_getoper.F90`, lines 402ff.):
```fortran
! Wannier-gauge overlap matrix S in the projected subspace
!
call get_win_min(ik,winmin_q)
call get_win_min(nnlist(ik,nn),winmin_qb)
S=cmplx_0
do m=1,num_wann
   do n=1,num_wann
      do i=1,num_states(ik)
         ii=winmin_q+i-1
         do j=1,num_states(nnlist(ik,nn))
            jj=winmin_qb+j-1
            S(n,m)=S(n,m)&
                 +conjg(v_matrix(i,n,ik))*S_o(ii,jj)&
                 *v_matrix(j,m,nnlist(ik,nn))
         end do
      end do
   end do
end do
```
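The loop computes S(n,m) = Σ_i Σ_j conj(V_q(i,n)) S_o(winmin_q+i-1, winmin_qb+j-1) V_qb(j,m), i.e. S = V_q^† S_o^win V_qb, where S_o^win is the block of `S_o` selected by the two band windows. With W = `num_wann` and N = `num_states`, the loop needs about W²·N_q·N_qb multiply-adds, while the product form needs about N_q·N_qb·W + N_q·W². The following is a minimal sketch of that rewrite, assuming the slices `v_matrix(:,:,ik)` and `v_matrix(:,:,nnlist(ik,nn))` are passed as 2-D arrays `v_q` and `v_qb`; the module and routine names are hypothetical and this is not the actual `get_gauge_overlap_matrix`.

```fortran
! Hypothetical sketch: Wannier-gauge overlap S = V_q^dagger * S_o(window) * V_qb.
! Module and routine names are illustrative, not the Wannier90 API.
module overlap_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Reference: the quadruple loop as in get_ahc_R, ~ W**2 * Nq * Nqb operations
  subroutine gauge_overlap_loops(v_q, v_qb, s_o, winmin_q, winmin_qb, s)
    complex(dp), intent(in)  :: v_q(:, :)   ! (Nq,  W) gauge matrix at q
    complex(dp), intent(in)  :: v_qb(:, :)  ! (Nqb, W) gauge matrix at q+b
    complex(dp), intent(in)  :: s_o(:, :)   ! Bloch overlap over all bands
    integer, intent(in)      :: winmin_q, winmin_qb
    complex(dp), intent(out) :: s(:, :)     ! (W, W) result
    integer :: m, n, i, j

    s = (0.0_dp, 0.0_dp)
    do m = 1, size(v_qb, 2)
      do n = 1, size(v_q, 2)
        do i = 1, size(v_q, 1)
          do j = 1, size(v_qb, 1)
            s(n, m) = s(n, m) + conjg(v_q(i, n)) &
                      * s_o(winmin_q + i - 1, winmin_qb + j - 1) * v_qb(j, m)
          end do
        end do
      end do
    end do
  end subroutine gauge_overlap_loops

  ! Same result as two matrix products, ~ Nq*Nqb*W + Nq*W**2 operations
  subroutine gauge_overlap_matmul(v_q, v_qb, s_o, winmin_q, winmin_qb, s)
    complex(dp), intent(in)  :: v_q(:, :), v_qb(:, :), s_o(:, :)
    integer, intent(in)      :: winmin_q, winmin_qb
    complex(dp), intent(out) :: s(:, :)
    integer :: nq, nqb

    nq = size(v_q, 1)
    nqb = size(v_qb, 1)
    ! Inner product first: (Nq x Nqb) * (Nqb x W), then V_q^dagger * (Nq x W)
    s = matmul(conjg(transpose(v_q)), &
               matmul(s_o(winmin_q:winmin_q + nq - 1, &
                          winmin_qb:winmin_qb + nqb - 1), v_qb))
  end subroutine gauge_overlap_matmul
end module overlap_sketch
```

For large windows the two `matmul` calls would more typically be BLAS `zgemm` calls, which can address the window block of `S_o` directly through the leading-dimension argument instead of copying the array section.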
TODO
- [x] Clean up `_getoper.F90` and minimize duplicated code.
  Possible approach: merge the different `get_*` routines into a single routine that takes logical flags indicating which matrices need to be initialized.
- [x] Consistently use `get_gauge_overlap_matrix` instead of nested loops like the snippet above.
  Optional: think about better names for `get_gauge_overlap_matrix` and its parameters.
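The flag-based approach from the first TODO item could look roughly like the following Fortran sketch. Everything in it is hypothetical: `oper_flags_sketch`, `get_oper_sketch`, the flags `want_aa`/`want_bb` and the `*_done` fields are placeholders that only record what a call would initialize. It shows two properties of the design: absent flags default to `.false.`, and one call requesting several matrices makes a single shared pass over the gauge overlap instead of one pass per routine.

```fortran
! Hypothetical sketch of a single entry point with logical flags replacing
! several get_*_R routines. Names and "matrices" are placeholders only.
module oper_flags_sketch
  implicit none
  private
  public :: oper_state, get_oper_sketch

  ! Records what a call initialized (stand-ins for the real R-space arrays)
  type :: oper_state
    logical :: hh_done = .false.
    logical :: aa_done = .false.
    logical :: bb_done = .false.
    integer :: overlap_passes = 0   ! passes over (ik, nn) building S
  end type oper_state

contains

  subroutine get_oper_sketch(st, want_aa, want_bb)
    type(oper_state), intent(inout) :: st
    logical, intent(in), optional   :: want_aa, want_bb
    logical :: do_aa, do_bb

    ! Absent flags default to .false.
    do_aa = .false.
    if (present(want_aa)) do_aa = want_aa
    do_bb = .false.
    if (present(want_bb)) do_bb = want_bb

    ! The Hamiltonian-like part is always needed
    st%hh_done = .true.

    ! One shared pass builds the gauge overlap once and feeds every
    ! requested matrix, instead of one pass per get_*_R routine
    if (do_aa .or. do_bb) then
      st%overlap_passes = st%overlap_passes + 1
      if (do_aa) st%aa_done = .true.
      if (do_bb) st%bb_done = .true.
    end if
  end subroutine get_oper_sketch
end module oper_flags_sketch
```

A caller that needs both matrices would write `call get_oper_sketch(st, want_aa=.true., want_bb=.true.)`, while a caller that needs neither can simply omit the flags.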