ponweist / Wannier90-PRACE

Optimizations for Wannier90 (fork repository - see http://wannier.org for the official version).
GNU General Public License v2.0

Cleanup get_oper.F90 #7

Closed: ponweist closed this issue 10 years ago

ponweist commented 10 years ago

When re-running the test case from #2 to compare performance after the parallelization of kpath (see #5), the initialization phase turned out to be very inefficient again:

[Trace image: trace-iss7]

Testcase parameters

32sm, running on 64 processes:

kpath = T
kpath_task = curv
kpath_num_points = 500
kpath_bands_colour = spin

kslice = F

berry = T
berry_task = ahc
berry_kmesh = 48 48 48

Analysis

The inefficiency arises because, with these test case parameters, get_ahc_R is called instead of the optimized routine get_morb_R (see #3), and get_ahc_R still contains the inefficient loop (get_oper.F90, lines 402ff.):

          ! Wannier-gauge overlap matrix S in the projected subspace
          !
          call get_win_min(ik,winmin_q)
          call get_win_min(nnlist(ik,nn),winmin_qb)
          S=cmplx_0
          do m=1,num_wann
             do n=1,num_wann
                do i=1,num_states(ik)
                   ii=winmin_q+i-1
                   do j=1,num_states(nnlist(ik,nn))
                      jj=winmin_qb+j-1
                      S(n,m)=S(n,m)&
                           +conjg(v_matrix(i,n,ik))*S_o(ii,jj)&
                           *v_matrix(j,m,nnlist(ik,nn))
                   end do
                end do
             end do
          end do
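The loop above computes, for each k-point ik and neighbour nn, the product S = V(q)^H · S_o(window) · V(q+b). The offsets winmin_q and winmin_qb select the sub-block of S_o. As a quadruple loop this costs about num_wann² · num_states(q) · num_states(q+b) complex multiply-adds. Two matrix products cost about num_wann · num_states(q) · (num_states(q+b) + num_wann). The issue does not show how the loop was actually rewritten, so the program below is only a hypothetical standalone sketch of one standard restructuring. The sizes, offsets and random data are invented for the check, and the arrays v_q and v_qb stand in for v_matrix(:,:,ik) and v_matrix(:,:,nnlist(ik,nn)).

```fortran
program overlap_projection
  ! Hypothetical sketch: compare the quadruple loop from get_ahc_R with an
  ! equivalent formulation using two matrix products. All sizes, offsets
  ! and data below are invented for illustration.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  complex(dp), parameter :: cmplx_0 = (0.0_dp, 0.0_dp)
  integer, parameter :: num_wann = 4
  integer, parameter :: ns_q = 6, ns_qb = 5           ! num_states at q and q+b
  integer, parameter :: winmin_q = 2, winmin_qb = 3   ! window offsets into S_o
  integer, parameter :: nb = 10                       ! leading size of S_o
  complex(dp) :: v_q(ns_q, num_wann), v_qb(ns_qb, num_wann)
  complex(dp) :: S_o(nb, nb), S_loop(num_wann, num_wann), S_mat(num_wann, num_wann)
  real(dp) :: re(nb, nb), im(nb, nb)
  real(dp) :: rq(ns_q, num_wann, 2), rqb(ns_qb, num_wann, 2)
  real(dp) :: err
  integer :: m, n, i, j, ii, jj

  ! Fill the inputs with random complex numbers
  call random_number(re)
  call random_number(im)
  S_o = cmplx(re, im, kind=dp)
  call random_number(rq)
  call random_number(rqb)
  v_q = cmplx(rq(:, :, 1), rq(:, :, 2), kind=dp)
  v_qb = cmplx(rqb(:, :, 1), rqb(:, :, 2), kind=dp)

  ! Original formulation: O(num_wann**2 * ns_q * ns_qb)
  S_loop = cmplx_0
  do m = 1, num_wann
     do n = 1, num_wann
        do i = 1, ns_q
           ii = winmin_q + i - 1
           do j = 1, ns_qb
              jj = winmin_qb + j - 1
              S_loop(n, m) = S_loop(n, m) &
                   + conjg(v_q(i, n)) * S_o(ii, jj) * v_qb(j, m)
           end do
        end do
     end do
  end do

  ! Restructured formulation: first S_o(window) * V(q+b), then V(q)^H * result
  S_mat = matmul(conjg(transpose(v_q)), &
       matmul(S_o(winmin_q:winmin_q+ns_q-1, winmin_qb:winmin_qb+ns_qb-1), v_qb))

  err = maxval(abs(S_loop - S_mat))
  if (err > 1.0e-10_dp) then
     print *, 'MISMATCH', err
     error stop 1
  end if
  print *, 'OK'
end program overlap_projection
```

In production code one would more likely call BLAS zgemm, whose transa = 'C' argument applies the conjugate transpose without forming an explicit copy, and which also avoids the temporary created for the windowed slice of S_o.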

TODO

ponweist commented 10 years ago

Trace after commit e6d520de14b856e78af5a4c43c176079b5479c75: [trace image: trace-iss7-fix]

Initialization time is down from 143 s to 8 s.