Closed: ponweist closed this issue 10 years ago
Note that the critical code section has been duplicated in get_morb_R (_getoper.F90, lines 1006ff.).
This is the current trace (16sm case, 32 processes, all berry tasks enabled, kpath and kslice disabled):
Performance analysis for 16sm case running on 32 processes with the following parameters:
kpath = F
kslice = F
berry = T
berry_task = ahc,morb,kubo
berry_kmesh = 32 32 32
New trace:
Performance (in CPU cycles) improvement relative to previous code version:
Routine | Previous | Current | Speedup factor |
---|---|---|---|
berry_main | 1.1e13 | 7.8e12 | ~ 1.4 |
get_morb_R | 3.8e12 | 5.6e11 | ~ 6.8 |
The next bottleneck in get_morb_R appeared in lines 854ff:
    ! Wannier-gauge overlap matrix S in the projected subspace
    !
    call get_win_min(ik,winmin_q)
    call get_win_min(nnlist(ik,nn),winmin_qb)
    S=cmplx_0
    H_q_qb(:,:)=cmplx_0
    do m=1,num_wann
       do n=1,num_wann
          do i=1,num_states(ik)
             ii=winmin_q+i-1
             do j=1,num_states(nnlist(ik,nn))
                jj=winmin_qb+j-1
                x = conjg(v_matrix(i,n,ik))*S_o(ii,jj)&
                     *v_matrix(j,m,nnlist(ik,nn))
                S(n,m)=S(n,m) + x
                H_q_qb(n,m)=H_q_qb(n,m) + x*eigval(ii,ik)
             enddo
          enddo
       enddo
    enddo
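Written as matrices, the quadruple loop above is a pair of triple products, which is what makes it a candidate for a matrix-multiply routine. The following is a sketch in notation introduced here for illustration (the symbols $V_q$, $V_{q+b}$, $\tilde S^{o}$, $E_q$ and $T$ are not names from the code): let $V_q$ stand for v_matrix(1:num_states(ik),:,ik), $V_{q+b}$ for the same slice at the neighbour nnlist(ik,nn), $\tilde S^{o}$ for the block of S_o that starts at (winmin_q, winmin_qb), and $E_q$ for the diagonal matrix of the eigval(ii,ik). Then

$$
S_{nm} = \sum_{i,j} \overline{(V_q)_{in}}\,\tilde S^{o}_{ij}\,(V_{q+b})_{jm}
\quad\Longrightarrow\quad
S = V_q^{\dagger}\,\tilde S^{o}\,V_{q+b},
\qquad
H_{q,q+b} = V_q^{\dagger}\,E_q\,\tilde S^{o}\,V_{q+b}.
$$

If the intermediate $T = \tilde S^{o} V_{q+b}$ is formed once and reused for both results, the cost is of order $N_q N_{q+b} N_w + N_q N_w^2$ multiply-adds (with $N_w$ = num_wann and $N_q$, $N_{q+b}$ the numbers of states at the two k-points), compared with $N_w^2 N_q N_{q+b}$ for the nested loops.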
Check if an extended version of get_gauge_overlap_matrix with an optional output parameter for H_q_qb can be used here.
Now using the extended routine get_gauge_overlap_matrix with an optional output parameter for H_q_qb.
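To illustrate the idea of a routine with an optional second output, here is a minimal, self-contained sketch. It is not the actual get_gauge_overlap_matrix: the module name, the routine name gauge_overlap, its argument list, and the assumption that the caller passes the window sub-block of S_o and the matching eigenvalues are all hypothetical. It only shows how S and, on request, H_q_qb can be formed from two matrix products that share one intermediate.

```fortran
module gauge_overlap_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Hypothetical stand-in for an overlap routine with an optional
  ! energy-weighted output. Names and argument order are assumptions,
  ! not the interface used in the real code.
  subroutine gauge_overlap(v_q, v_qb, S_o, eig_q, S, H)
    complex(dp), intent(in)  :: v_q(:,:)   ! (num_states at q,   num_wann)
    complex(dp), intent(in)  :: v_qb(:,:)  ! (num_states at q+b, num_wann)
    complex(dp), intent(in)  :: S_o(:,:)   ! window block: (states at q, states at q+b)
    real(dp),    intent(in)  :: eig_q(:)   ! eigenvalues at q inside the window
    complex(dp), intent(out) :: S(:,:)     ! (num_wann, num_wann)
    complex(dp), intent(out), optional :: H(:,:)  ! same shape as S
    complex(dp), allocatable :: T(:,:)
    integer :: i

    ! Shared intermediate: T = S_o * V(q+b)
    T = matmul(S_o, v_qb)

    ! S = V(q)^dagger * T
    S = matmul(conjg(transpose(v_q)), T)

    ! Only do the extra work if the caller asked for H
    if (present(H)) then
      ! Scale row i of T by the eigenvalue of state i at q
      do i = 1, size(eig_q)
        T(i,:) = eig_q(i)*T(i,:)
      end do
      ! H = V(q)^dagger * diag(eig_q) * S_o * V(q+b)
      H = matmul(conjg(transpose(v_q)), T)
    end if
  end subroutine gauge_overlap

end module gauge_overlap_sketch
```

A caller that needs only the overlap would use `call gauge_overlap(v_q, v_qb, S_o, eig_q, S)`, and one that also needs the energy-weighted matrix would add `H=H_q_qb`. The intrinsic matmul keeps the sketch dependency-free; in production code a BLAS routine such as zgemm would be a natural substitute.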
New trace:
New performance analysis:
Routine | Previous | Current | Speedup factor |
---|---|---|---|
berry_main | 1.1e13 | 7.4e12 | ~ 1.5 |
get_morb_R | 3.8e12 | 1.8e11 | ~ 21 |
Time for initialization is down from ~53s to ~3s(!).
In get_CC_R (_getoper.F90, lines 781ff.), the following matrix product is done inefficiently:

An improvement similar to the one made for get_AA_R (see #2) is needed.