ponweist / Wannier90-PRACE

Optimizations for Wannier90 (fork repository - see http://wannier.org for the official version).
GNU General Public License v2.0

Parallelize k_slice #8

Closed (ponweist closed this issue 9 years ago)

ponweist commented 10 years ago

The following loop (kslice.F90) needs to be parallelized:

       ! Loop over uniform mesh of k-points on the slice
       !
       do loop_xy=0,product(kslice_2dkmesh)-1
          loop_x=loop_xy/kslice_2dkmesh(2)
          loop_y=loop_xy-loop_x*kslice_2dkmesh(2)          
          ! k1 and k2 are the coefficients of the k-point in the basis
          ! (kslice_b1,kslice_b2)
          k1=loop_x*db1
          k2=loop_y*db2             
          kpt=kslice_corner+k1*kslice_b1+k2*kslice_b2
          ! Convert to (kpt_x,kpt_y), the 2D Cartesian coordinates
          ! with x along x_vec=b1 and y along y_vec
          kpt_x=k1*b1mod+k2*b2mod*cosb1b2
          kpt_y=k2*b2mod*cosyb2
          if(.not.fermi_lines_color) write(coorddataunit,'(2E16.8)') kpt_x,kpt_y

          if(plot_fermi_lines) then
             if(fermi_lines_color) then
                call get_spin_nk(kpt,spn_k)
                do n=1,num_wann
                   if(spn_k(n)>1.0_dp-eps8) then
                      spn_k(n)=1.0_dp-eps8
                   elseif(spn_k(n)<-1.0_dp+eps8) then
                      spn_k(n)=-1.0_dp+eps8
                   endif
                enddo
                call get_eig_deleig(kpt,eig,del_eig,HH,delHH,UU)
                Delta_k=max(b1mod*db1,b2mod*db2)
             else
                call fourier_R_to_k(kpt,HH_R,HH,0)
                call utility_diagonalize(HH,num_wann,eig,UU)
             endif
             do n=1,num_wann
                if(.not.fermi_lines_color) then
                   ! For python
                   write(bandsunit,'(E16.8)') eig(n)
                   ! For gnuplot, using 'grid data' format
                   if(.not.heatmap) then
                      write(bnddataunit(n),'(3E16.8)') kpt_x,kpt_y,eig(n)
                      if(loop_y==kslice_2dkmesh(2)-1 .and. &
                           loop_x/=kslice_2dkmesh(1)-1) write (bnddataunit(n),*) ' '
                   endif
                elseif(kslice_fermi_lines_colour=='spin') then
                   ! vdum = dE/dk projected on the k-slice
                   zhat=zvec/sqrt(dot_product(zvec,zvec))
                   vdum(:)=del_eig(n,:)-dot_product(del_eig(n,:),zhat)*zhat(:)
                   Delta_E=sqrt(dot_product(vdum,vdum))*Delta_k
!                   Delta_E=Delta_E*sqrt(2.0_dp) ! optimize this factor
                   if(abs(eig(n)-kslice_fermi_level)<Delta_E)&
                        write(dataunit,'(3E16.8)') kpt_x,kpt_y,spn_k(n)
                endif
             enddo
          endif

          if(plot_curv) then
             call get_imfgh_k_list(kpt,imf_k_list)
             curv(1)=sum(imf_k_list(:,1,1))
             curv(2)=sum(imf_k_list(:,2,1))
             curv(3)=sum(imf_k_list(:,3,1))
             if(berry_curv_unit=='bohr2') curv=curv/bohr**2   
             ! Print the negative Berry curvature 
             write(zdataunit,'(3E16.8)') -curv(:)
          end if

          if(plot_morb) then
             call get_imfgh_k_list(kpt,imf_k_list,img_k_list,imh_k_list)
             Morb_k=img_k_list(:,:,1)+imh_k_list(:,:,1)&
                   -2.0_dp*fermi_energy_list(1)*imf_k_list(:,:,1)
             Morb_k=-Morb_k/2.0_dp ! differs by -1/2 from Eq.97 LVTS12
             morb(1)=sum(Morb_k(:,1))
             morb(2)=sum(Morb_k(:,2))
             morb(3)=sum(Morb_k(:,3))
             write(zdataunit,'(3E16.8)') morb(:)
          end if

       end do !loop_xy

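Because the loop above runs over a single flattened index loop_xy and every iteration is independent apart from the file writes, one possible way to split it is to hand each process a share of the loop_xy values and let one process write the results back in serial order. The following is only a minimal sketch of that idea, written in Python rather than Fortran to stay self-contained. The round-robin distribution and the helper names (decompose, indices_for_rank, make_record, gather_in_order) are assumptions for illustration and are not taken from the Wannier90 source.

```python
from typing import List, Tuple


def decompose(loop_xy: int, ny: int) -> Tuple[int, int]:
    """Recover (loop_x, loop_y) from the flattened index, as the Fortran loop does."""
    loop_x = loop_xy // ny
    loop_y = loop_xy - loop_x * ny
    return loop_x, loop_y


def indices_for_rank(rank: int, nranks: int, total: int) -> range:
    """Round-robin share of the flattened loop for one rank (assumed scheme)."""
    return range(rank, total, nranks)


def make_record(loop_xy: int, ny: int) -> str:
    """Stand-in for the per-k-point work; here it only formats the 2D indices."""
    loop_x, loop_y = decompose(loop_xy, ny)
    return f"{loop_x} {loop_y}"


def gather_in_order(per_rank: List[List[Tuple[int, str]]]) -> List[str]:
    """Merge (loop_xy, record) pairs from all ranks and restore the serial order."""
    merged = [item for chunk in per_rank for item in chunk]
    merged.sort(key=lambda item: item[0])
    return [record for _, record in merged]


def simulate(nx: int, ny: int, nranks: int) -> List[str]:
    """Emulate nranks processes, each handling its own share of the nx*ny points."""
    total = nx * ny
    per_rank = [
        [(i, make_record(i, ny)) for i in indices_for_rank(r, nranks, total)]
        for r in range(nranks)
    ]
    return gather_in_order(per_rank)
```

Sorting by loop_xy before writing is what keeps the output files identical to a serial run, including the blank separator lines that depend on loop_x and loop_y.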
ponweist commented 10 years ago

Test cases and runtimes as of commit f0ba9074f381cb489fccd0d1d2e3660a01c9ae15:

Testcase A

Parameters:

kpath = F

kslice = T
kslice_task=fermi_lines,curv
kslice_2dkmesh = 100 100
!below is 0.0  0.0  1/8 half of L point
kslice_corner = 0.25  0.0  0.25
kslice_b1 =     1.0  1.0  0.0
kslice_b2 =     0.0  1.0  1.0

berry = F

Runtime: 133s

Testcase B

Parameters:

kpath = F

kslice = T
kslice_task=fermi_lines,morb
kslice_2dkmesh = 100 100
!below is 0.0  0.0  1/8 half of L point
kslice_corner = 0.25  0.0  0.25
kslice_b1 =     1.0  1.0  0.0
kslice_b2 =     0.0  1.0  1.0

berry = F

Runtime: 207s

Testcase C

Parameters:

kpath = F

kslice = T
kslice_task=fermi_lines
kslice_fermi_lines_colour=spin
kslice_2dkmesh = 100 100
!below is 0.0  0.0  1/8 half of L point
kslice_corner = 0.25  0.0  0.25
kslice_b1 =     1.0  1.0  0.0
kslice_b2 =     0.0  1.0  1.0

berry = F

Runtime: 102s

Testcase D

Parameters:

kpath = F

kslice = T
kslice_task=fermi_lines
kslice_fermi_lines_colour=none
kslice_2dkmesh = 100 100
!below is 0.0  0.0  1/8 half of L point
kslice_corner = 0.25  0.0  0.25
kslice_b1 =     1.0  1.0  0.0
kslice_b2 =     0.0  1.0  1.0

berry = F

Runtime: 57s
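All four test cases use kslice_2dkmesh = 100 100, so the loop visits 100*100 = 10000 k-points per run. As a rough guide, the reported runtimes can be turned into an average cost per k-point. This sketch assumes the whole runtime is spent in the k-slice loop, which overestimates the per-point cost because setup and I/O are included. The variable names are illustrative only.

```python
# Mesh size taken from the kslice_2dkmesh setting in the test cases.
mesh = (100, 100)
n_kpoints = mesh[0] * mesh[1]

# Total runtimes in seconds, as reported for each test case.
runtimes_s = {"A": 133.0, "B": 207.0, "C": 102.0, "D": 57.0}

# Average time per k-point in milliseconds (upper bound: includes setup and I/O).
ms_per_kpoint = {case: 1000.0 * t / n_kpoints for case, t in runtimes_s.items()}

for case, ms in ms_per_kpoint.items():
    print(f"Testcase {case}: {ms:.1f} ms per k-point")
```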

ponweist commented 10 years ago

Runtime for Testcase C improved from 102s to 67s after optimizing utility_rotate_diag (commit bbdd774710b20cd658c3d31bd97cbf0f2a975aa1).

ponweist commented 10 years ago

Output files for all four test cases are again identical to those from the reference runs (commit f0ba9074f381cb489fccd0d1d2e3660a01c9ae15).

ponweist commented 10 years ago

Parallelization is now done in the new branch "iss8"; the changes should be merged back into the master branch after #13 is fixed.

Output files for all four test cases are identical to those of commit f0ba9074f381cb489fccd0d1d2e3660a01c9ae15.

Timings:

Testcase   Original   New
A          133s       19.5s
B          207s       21.5s
C          102s       15.2s
D          57s        6.4s

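The speedup for each test case follows directly from the timings above as original runtime divided by new runtime. A small sketch of that arithmetic is given here. The number of processes used for the new runs is not stated in this thread, so parallel efficiency is not computed.

```python
# (original, new) runtimes in seconds, copied from the timings table.
timings_s = {
    "A": (133.0, 19.5),
    "B": (207.0, 21.5),
    "C": (102.0, 15.2),
    "D": (57.0, 6.4),
}

# Speedup = original runtime / new runtime.
speedups = {case: orig / new for case, (orig, new) in timings_s.items()}

for case, s in speedups.items():
    print(f"Testcase {case}: speedup {s:.2f}x")
```

All four cases come out between roughly 6.7x and 9.6x, with Testcase B gaining the most.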
ponweist commented 10 years ago

Trace for testcase C: trace-iss8c-fix