Possible bug in Wannier90 (v3.0.0) - Results depend on number of cpu used for DFT calculation. #291

Closed sponce24 closed 5 years ago

Dear Wannier developers,

There might be a bug related to the number of cores used to produce the DFT quantities. For non-cubic materials (here hexagonal GaN), the wannier90 v3 code produces different results depending on the number of cores pw.x was run on (using wf_collect). Notably, I used "-npool" for k-point parallelization, which is a requirement of EPW.

I first encountered this issue while creating a new test for the QE test-farm. The test can be found in the developer GitLab version of QE in test-suite/epw_mob_polar. The error can be reproduced by simply running the test sequentially or in parallel. At present, I have disabled the check so that it can pass the nightly.

Initially, we thought that this issue might be related to the w90 library version, but it turns out it is also present in wannier90.x.

The bug. We are using the same input files. If we do:

~/program/q-e-sponce24/bin/pw.x < scf.in | tee scf.out
~/program/q-e-sponce24/bin/pw.x < nscf.in | tee nscf.out
~/program/q-e-sponce24/bin/wannier90.x -pp  gan.win
~/program/q-e-sponce24/bin/pw2wannier90.x < pw2wan.in | tee pw2wan.out
~/program/q-e-sponce24/bin/wannier90.x  gan.win

we get:

 Final State

  WF centre and spread    1  ( -3.151998,  1.826535, -0.569805 )     0.91108668
  WF centre and spread    2  (  1.577723,  0.910434, -1.036771 )     1.32709200
  WF centre and spread    3  (  0.002147,  1.820123, -0.617244 )     0.87900792
  WF centre and spread    4  (  1.577140,  0.910566,  0.452981 )     1.38702953
  WF centre and spread    5  ( -0.000022,  1.821499,  3.023786 )     1.38699006
  WF centre and spread    6  (  1.575223,  0.911865,  1.953528 )     0.87903049
  WF centre and spread    7  ( -0.000464,  1.821548,  1.533979 )     1.32710179
  WF centre and spread    8  ( -1.579739,  0.905264,  2.000991 )     0.91106385
  WF centre and spread    9  (  1.573775, -0.915410, -0.569822 )     0.91102685
  WF centre and spread   10  (  3.157858, -1.816396,  2.000958 )     0.91103796
  Sum of centres and spreads (  4.731643,  8.196027,  8.172581 )    10.83046713

Instead if we do:

mpirun -np 2 ~/program/q-e-sponce24/bin/pw.x -npool 2 < scf.in | tee scf.out
mpirun -np 2 ~/program/q-e-sponce24/bin/pw.x -npool 2 < nscf.in | tee nscf.out
~/program/q-e-sponce24/bin/wannier90.x -pp  gan.win
~/program/q-e-sponce24/bin/pw2wannier90.x < pw2wan.in | tee pw2wan.out
~/program/q-e-sponce24/bin/wannier90.x  gan.win

we get:

 Final State
  WF centre and spread    1  ( -3.148672,  1.820818, -0.569788 )     0.91109275
  WF centre and spread    2  (  1.577261,  0.911242, -1.036787 )     1.32709775
  WF centre and spread    3  ( -0.005902,  1.820714, -0.569834 )     0.91099805
  WF centre and spread    4  (  1.576931,  0.911149,  0.452983 )     1.38702478
  WF centre and spread    5  (  0.000228,  1.821001,  3.023781 )     1.38700267
  WF centre and spread    6  (  1.583158,  0.911170,  2.000950 )     0.91101676
  WF centre and spread    7  (  0.000025,  1.820744,  1.533987 )     1.32709183
  WF centre and spread    8  ( -1.583136,  0.911111,  2.000994 )     0.91107889
  WF centre and spread    9  (  1.577292, -0.908173, -0.617246 )     0.87901937
  WF centre and spread   10  (  3.154527, -1.823715,  1.953532 )     0.87902756
  Sum of centres and spreads (  4.731712,  8.196062,  8.172573 )    10.83045040

The differences are relatively small but clearly signal a bug. Such differences can also lead to larger differences in EPW quantities.

If you have any idea what can be responsible for such differences, please let me know. Thanks, Samuel

I would have thought that this is a bug in pw2wannier90, not wannier90 itself.

Comparing the spreads in the initial state is more informative than comparing the final state. Can you show these?

Also, if you run these tests with pwscf compiled with a different compiler / maths libraries, do you get the same results, or are they different again?

Very unusual.

pw2wannier90 neither knows nor cares how many cores were used in the scf and nscf steps, especially when wf_collect is turned on. Also, in both of your examples, you ran pw2wannier90.x and wannier90.x serially.

Of course, since the wavefunctions can have different phases in your two cases, the inputs to pw2wannier90 would be different, thereby leading to different amn and mmn. But I am very surprised that these differences would lead to a meaningful (large) difference in the wannierization.

PS) I guess that you obtained the same wannier90.win in your two cases after performing ~/program/q-e-sponce24/bin/wannier90.x -pp gan.win .

Hello Jonathan and Hyungjun,

Thank you both for your replies.

1) I agree, it can be a bug in pw2wannier90.

Indeed, the initial states are also different (calculation done with gfortran 7.4 + openmpi 1.10.7). For 1 cpu:

 Initial State
  WF centre and spread    1  (  2.004563,  1.116077, -0.046001 )     2.05136658
  WF centre and spread    2  (  1.596951,  0.641313, -0.728929 )     1.87481225
  WF centre and spread    3  (  0.975503,  1.264678, -0.728444 )     1.76263841
  WF centre and spread    4  (  1.703538,  0.802870, -0.045137 )     2.09152259
  WF centre and spread    5  ( -0.126384,  1.929003,  2.525647 )     2.09150232
  WF centre and spread    6  (  0.601839,  1.467211,  1.842267 )     1.76260995
  WF centre and spread    7  ( -0.019685,  2.090567,  1.841889 )     1.87482760
  WF centre and spread    8  ( -0.427321,  1.615967,  2.524786 )     2.05134769
  WF centre and spread    9  (  1.579534,  0.911407, -0.149793 )     8.25291312
  WF centre and spread   10  ( -0.002016,  1.820318,  2.421145 )     8.25340333
  Sum of centres and spreads (  7.886521, 13.659412,  9.457430 )    32.06694384

For 2 cpu:

 Initial State
  WF centre and spread    1  (  2.004420,  1.116143, -0.045956 )     2.05127880
  WF centre and spread    2  (  1.597001,  0.641224, -0.728991 )     1.87475892
  WF centre and spread    3  (  0.975535,  1.264686, -0.728392 )     1.76271424
  WF centre and spread    4  (  1.703512,  0.802841, -0.045182 )     2.09165108
  WF centre and spread    5  ( -0.126364,  1.929003,  2.525622 )     2.09157657
  WF centre and spread    6  (  0.601818,  1.467246,  1.842318 )     1.76268398
  WF centre and spread    7  ( -0.019714,  2.090586,  1.841850 )     1.87476970
  WF centre and spread    8  ( -0.427246,  1.615888,  2.524811 )     2.05130419
  WF centre and spread    9  (  1.579520,  0.911300, -0.149789 )     8.25300755
  WF centre and spread   10  ( -0.002020,  1.820388,  2.421129 )     8.25350185
  Sum of centres and spreads (  7.886461, 13.659306,  9.457420 )    32.06724689

To answer your second question, I did the same calculation on a different machine using intel 17+impi:

1 cpu:

 Initial State
  WF centre and spread    1  (  2.004563,  1.116077, -0.046001 )     2.05136658
  WF centre and spread    2  (  1.596951,  0.641313, -0.728929 )     1.87481225
  WF centre and spread    3  (  0.975503,  1.264678, -0.728444 )     1.76263841
  WF centre and spread    4  (  1.703538,  0.802870, -0.045137 )     2.09152259
  WF centre and spread    5  ( -0.126384,  1.929003,  2.525647 )     2.09150232
  WF centre and spread    6  (  0.601839,  1.467211,  1.842267 )     1.76260995
  WF centre and spread    7  ( -0.019685,  2.090567,  1.841889 )     1.87482760
  WF centre and spread    8  ( -0.427321,  1.615967,  2.524786 )     2.05134769
  WF centre and spread    9  (  1.579534,  0.911407, -0.149793 )     8.25291312
  WF centre and spread   10  ( -0.002016,  1.820318,  2.421145 )     8.25340333
  Sum of centres and spreads (  7.886521, 13.659412,  9.457430 )    32.06694384

2 cpu:

 Initial State
  WF centre and spread    1  (  2.004420,  1.116143, -0.045956 )     2.05127880
  WF centre and spread    2  (  1.597001,  0.641224, -0.728991 )     1.87475892
  WF centre and spread    3  (  0.975535,  1.264686, -0.728392 )     1.76271424
  WF centre and spread    4  (  1.703512,  0.802841, -0.045182 )     2.09165108
  WF centre and spread    5  ( -0.126364,  1.929003,  2.525622 )     2.09157657
  WF centre and spread    6  (  0.601818,  1.467246,  1.842318 )     1.76268398
  WF centre and spread    7  ( -0.019714,  2.090586,  1.841850 )     1.87476970
  WF centre and spread    8  ( -0.427246,  1.615888,  2.524811 )     2.05130419
  WF centre and spread    9  (  1.579520,  0.911300, -0.149789 )     8.25300755
  WF centre and spread   10  ( -0.002020,  1.820388,  2.421129 )     8.25350185
  Sum of centres and spreads (  7.886461, 13.659306,  9.457420 )    32.06724689

In other words, it's not compiler or machine dependent at all (good news) but clearly depends on the number of CPUs, which points toward a bug.

2) I indeed obtain the same .win file after doing the -pp

Best, Samuel

Dear Samuel:

I am very sorry, but if you are referring to the files in q-e/test-suite/epw_mob_polar, could you replace diago_thr_init (currently 1.0e-4) with a much lower value in nscf.in? In nscf calculations, unlike in scf calculations, the iterative diagonalization continues up to the value of diago_thr_init.

I guess that in your two cases there is rather large phase difference due to the underconverged wave functions.

PS) I still think that this behaviour is not related to pw2wannier90. pw2wannier90 just takes inputs such as wave functions and eigenvalues from the nscf calculation, and in general the wave functions obtained in two different situations (different numbers of cores, different parallelisation strategies, etc.) can have different phases. Different inputs to pw2wannier90.x can give different amn and mmn files. But I believe that these minor differences should not usually lead to a meaningful difference in the wannierization step.

Now I guess that the difference just comes from the underconverged wave functions rather than from the phase difference I mentioned above. Since your wave functions would be underconverged, we don't need to invoke a phase difference. And I guess that a phase difference by itself might not lead to a meaningful difference in the wannierization.
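The phase argument above can be illustrated with a toy sketch. This is not Wannier90 or pw2wannier90 code: it assumes a hypothetical one-band model with random states on a closed loop of k-points, and the names `overlaps`, `nk` and `nbasis` are made up for the illustration. It shows that an arbitrary phase at each k-point changes the individual overlaps (the analogue of the mmn entries) while leaving their moduli and their product around the loop unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
nk, nbasis = 8, 5  # hypothetical toy sizes, not taken from the GaN test

# One random normalized "Bloch state" per k-point on a closed 1D loop
u = rng.normal(size=(nk, nbasis)) + 1j * rng.normal(size=(nk, nbasis))
u /= np.linalg.norm(u, axis=1, keepdims=True)


def overlaps(states):
    """Return M_k = <u_k | u_{k+1}> with periodic wrap-around."""
    n = len(states)
    return np.array([np.vdot(states[k], states[(k + 1) % n]) for k in range(n)])


# Same physical states, but each k-point carries an arbitrary phase,
# as could happen when the diagonalization is run on different core counts
phases = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=nk))
u_rephased = phases[:, None] * u

m_a = overlaps(u)
m_b = overlaps(u_rephased)

files_differ = not np.allclose(m_a, m_b)            # raw overlaps change
moduli_match = np.allclose(np.abs(m_a), np.abs(m_b))  # |M_k| is unchanged
loop_match = np.isclose(np.prod(m_a), np.prod(m_b))   # loop product is unchanged

print(files_differ, moduli_match, loop_match)
```

In a real multi-band calculation the freedom is larger (unitary rotations within degenerate subspaces), so this only sketches the simplest case.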

Hello,

The default QE value for diago_thr_init is 1.0e-2 (https://www.quantum-espresso.org/Doc/INPUT_PW.html#diago_thr_init), so I thought that two orders of magnitude lower should be fine.

In addition for nscf, this is automatically lowered by (N elec)/10.

In any case, I redid the calculation with much lower parameters (btw, such small parameters cannot work for more complex materials; they will never reach convergence):

 &electrons
    diagonalization='david'
    mixing_beta=0.7
    conv_thr=1.0d-14
    diago_thr_init = 1.0e-10
    diago_full_acc = .true.

both in scf.in and nscf.in and I get:

1 cpu:

Final State
  WF centre and spread    1  ( -3.148651,  1.820770, -0.569796 )     0.91108773
  WF centre and spread    2  (  1.577281,  0.911171, -1.036768 )     1.32708711
  WF centre and spread    3  ( -0.005887,  1.820744, -0.569834 )     0.91103352
  WF centre and spread    4  (  1.577046,  0.910973,  0.453002 )     1.38699911
  WF centre and spread    5  (  0.000219,  1.820931,  3.023787 )     1.38699679
  WF centre and spread    6  (  1.583151,  0.911161,  2.000951 )     0.91103581
  WF centre and spread    7  ( -0.000016,  1.820733,  1.534017 )     1.32708482
  WF centre and spread    8  ( -1.583143,  0.911134,  2.000989 )     0.91108542
  WF centre and spread    9  (  1.577259, -0.908191, -0.617223 )     0.87902619
  WF centre and spread   10  (  3.154535, -1.823714,  1.953562 )     0.87902852
  Sum of centres and spreads (  4.731795,  8.195713,  8.172686 )    10.83046501

2 cpu:

 Final State
  WF centre and spread    1  ( -3.148666,  1.820824, -0.569797 )     0.91109236
  WF centre and spread    2  (  1.577290,  0.911176, -1.036769 )     1.32708652
  WF centre and spread    3  ( -0.005893,  1.820688, -0.569833 )     0.91102961
  WF centre and spread    4  (  1.577050,  0.910974,  0.453002 )     1.38699814
  WF centre and spread    5  (  0.000214,  1.820930,  3.023787 )     1.38699579
  WF centre and spread    6  (  1.583157,  0.911221,  2.000952 )     0.91103149
  WF centre and spread    7  ( -0.000025,  1.820728,  1.534016 )     1.32708422
  WF centre and spread    8  ( -1.583127,  0.911078,  2.000988 )     0.91109012
  WF centre and spread    9  (  1.577304, -0.908185, -0.617223 )     0.87902740
  WF centre and spread   10  (  3.154488, -1.823722,  1.953563 )     0.87903008
  Sum of centres and spreads (  4.731792,  8.195713,  8.172686 )    10.83046573

So it seems better but still not the same.

Moreover, with 1 core it converges in 300 cycles, while with 2 cores it converges in 200 cycles (I'm printing every 100 cycles, so it might not be that different, but it is certainly not the same number).

It's a bit scary how differently the Wannier centres are located with relatively well-converged WFs.

Edit: I increased ecut from 40 Ry to 60 Ry and used:

    diagonalization='david'
    mixing_beta=0.7
    conv_thr=1.0d-14
    diago_thr_init = 1.0e-14
    diago_full_acc = .true.

With that I get exactly the same initial state for both 1 cpu and 2 cpu (good):

 Initial State
  WF centre and spread    1  (  2.008699,  1.117420, -0.045903 )     2.09460756
  WF centre and spread    2  (  1.587402,  0.636731, -0.753712 )     1.94454794
  WF centre and spread    3  (  0.953027,  1.274372, -0.752727 )     1.81871493
  WF centre and spread    4  (  1.712082,  0.804409, -0.044977 )     2.13544759
  WF centre and spread    5  ( -0.134817,  1.927495,  2.525808 )     2.13544480
  WF centre and spread    6  (  0.624238,  1.457532,  1.818059 )     1.81871446
  WF centre and spread    7  ( -0.010137,  2.095173,  1.817073 )     1.94454491
  WF centre and spread    8  ( -0.431434,  1.614484,  2.524883 )     2.09460514
  WF centre and spread    9  (  1.579325,  0.911197, -0.204959 )     8.59629099
  WF centre and spread   10  ( -0.002060,  1.820707,  2.365826 )     8.59628866
  Sum of centres and spreads (  7.886324, 13.659521,  9.249372 )    33.17920698

but the final state is still significantly different:

1 core:

 Final State
  WF centre and spread    1  ( -3.156012,  1.822651, -0.635723 )     0.86453635
  WF centre and spread    2  (  1.577052,  0.910865, -1.104090 )     1.34507774
  WF centre and spread    3  ( -0.004172,  1.822953, -0.575365 )     0.89135920
  WF centre and spread    4  (  1.581912, -0.912764, -0.575128 )     0.89120731
  WF centre and spread    5  (  3.156076, -1.822574,  1.935062 )     0.86453553
  WF centre and spread    6  (  1.581380,  0.908835,  1.995419 )     0.89136493
  WF centre and spread    7  (  0.000220,  1.821048,  1.466696 )     1.34507528
  WF centre and spread    8  (  0.000813,  1.820761,  3.008186 )     1.42944321
  WF centre and spread    9  (  1.576458,  0.911152,  0.437401 )     1.42944525
  WF centre and spread   10  ( -1.581885,  0.912843,  1.995659 )     0.89120546
  Sum of centres and spreads (  4.731843,  8.195768,  7.948116 )    10.84325027

2 cores:

 Final State
  WF centre and spread    1  ( -3.156492,  1.821751, -0.635724 )     0.86453576
  WF centre and spread    2  (  1.576953,  0.910679, -1.104090 )     1.34507751
  WF centre and spread    3  ( -0.003451,  1.824177, -0.575363 )     0.89135445
  WF centre and spread    4  (  1.576411,  0.911063,  0.437401 )     1.42944538
  WF centre and spread    5  (  0.000839,  1.820799,  3.008186 )     1.42944337
  WF centre and spread    6  (  1.581027,  0.908197,  1.995417 )     0.89136966
  WF centre and spread    7  (  0.000283,  1.821142,  1.466696 )     1.34507543
  WF centre and spread    8  ( -1.581616,  0.913348,  1.995660 )     0.89119937
  WF centre and spread    9  ( -3.150453,  1.818202, -0.575130 )     0.89121621
  WF centre and spread   10  (  3.156361, -1.822137,  1.935062 )     0.86453568
  Sum of centres and spreads ( -0.000137, 10.927221,  7.948115 )    10.84325281

Could you confirm that I can obtain your inputs in q-e/test-suite/epw_mob_polar in gitlab?

Also in all your tests, did you perform serial runs of both pw2wannier90.x and wannier90.x? That is, you just changed the number of cores (1 and 2 cores) only in scf and nscf calculations?

Hello,

No, I used a pure Wannier90 version with slightly different parameters. I sent you all the input files at hyungjun.lee@epfl.ch. Is that your correct email address?

Best wishes, Samuel

I left EPFL last year, but for the moment I can still access this email account. I will check your email now, but I may only get back to you tomorrow; it is late evening here.

Hi, are we sure the results are really different?

Here is a diff I did, after reordering a bit the WFs.

[Image: diff of the two Final State outputs after reordering the WFs]

All WF centres and spreads seem to be the same to at least ~3 significant digits (ok, it's not perfect, but the Wannierisation also has a threshold, and one could probably push it a bit further in your simulations).
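A comparison after reordering can be sketched as follows. This is only an illustration, not the script used for the diff: the regular expression and the helper name `parse_wf_lines` are assumptions, and the input is the pair of 60 Ry Final State blocks posted above. Sorting by spread removes the dependence on the WF ordering.

```python
import re
import numpy as np

# Matches lines such as
#   WF centre and spread    1  ( -3.156012,  1.822651, -0.635723 )     0.86453635
LINE = re.compile(
    r"WF centre and spread\s+\d+\s+\(\s*([-\d.]+),\s*([-\d.]+),\s*([-\d.]+)\s*\)\s+([-\d.]+)"
)


def parse_wf_lines(text):
    """Return an (n, 4) array with x, y, z, spread for each WF line."""
    return np.array([[float(v) for v in m.groups()] for m in LINE.finditer(text)])


ONE_CORE = """
  WF centre and spread    1  ( -3.156012,  1.822651, -0.635723 )     0.86453635
  WF centre and spread    2  (  1.577052,  0.910865, -1.104090 )     1.34507774
  WF centre and spread    3  ( -0.004172,  1.822953, -0.575365 )     0.89135920
  WF centre and spread    4  (  1.581912, -0.912764, -0.575128 )     0.89120731
  WF centre and spread    5  (  3.156076, -1.822574,  1.935062 )     0.86453553
  WF centre and spread    6  (  1.581380,  0.908835,  1.995419 )     0.89136493
  WF centre and spread    7  (  0.000220,  1.821048,  1.466696 )     1.34507528
  WF centre and spread    8  (  0.000813,  1.820761,  3.008186 )     1.42944321
  WF centre and spread    9  (  1.576458,  0.911152,  0.437401 )     1.42944525
  WF centre and spread   10  ( -1.581885,  0.912843,  1.995659 )     0.89120546
"""

TWO_CORES = """
  WF centre and spread    1  ( -3.156492,  1.821751, -0.635724 )     0.86453576
  WF centre and spread    2  (  1.576953,  0.910679, -1.104090 )     1.34507751
  WF centre and spread    3  ( -0.003451,  1.824177, -0.575363 )     0.89135445
  WF centre and spread    4  (  1.576411,  0.911063,  0.437401 )     1.42944538
  WF centre and spread    5  (  0.000839,  1.820799,  3.008186 )     1.42944337
  WF centre and spread    6  (  1.581027,  0.908197,  1.995417 )     0.89136966
  WF centre and spread    7  (  0.000283,  1.821142,  1.466696 )     1.34507543
  WF centre and spread    8  ( -1.581616,  0.913348,  1.995660 )     0.89119937
  WF centre and spread    9  ( -3.150453,  1.818202, -0.575130 )     0.89121621
  WF centre and spread   10  (  3.156361, -1.822137,  1.935062 )     0.86453568
"""

one_core = parse_wf_lines(ONE_CORE)
two_cores = parse_wf_lines(TWO_CORES)

# Compare the spreads independently of the WF ordering
spreads_1 = np.sort(one_core[:, 3])
spreads_2 = np.sort(two_cores[:, 3])
max_spread_diff = np.max(np.abs(spreads_1 - spreads_2))

print(f"largest spread difference after sorting: {max_spread_diff:.2e}")
```

Sorting by spread only pairs up the spreads; the centres still need to be compared modulo lattice translations.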

The only exception is the pair of centres on the lines WF centre and spread 4 ( 1.581912, -0.912764, -0.575128 ) vs WF centre and spread 9 ( -3.150453, 1.818202, -0.575130 ).

However, from the reference output file on the q-e test suite we have:

 celldm(1)=   5.961200  celldm(2)=   0.000000  celldm(3)=   1.629900
     celldm(4)=   0.000000  celldm(5)=   0.000000  celldm(6)=   0.000000

     crystal axes: (cart. coord. in units of alat)
               a(1) = (   1.000000   0.000000   0.000000 )  
               a(2) = (  -0.500000   0.866025   0.000000 )  
               a(3) = (   0.000000   0.000000   1.629900 )

Which means that (looking only at the xy coordinates):

a1 = (5.9612, 0) bohr = (3.15453, 0) ang
a2 = (-2.9806, 5.162548) bohr = (-1.577266, 2.731903) ang

And if you consider (1.581912, -0.912764) - a1 + a2 you get (-3.149884, 1.819139) ang, which is quite close to the centre of WF centre and spread 9 ( -3.150453, 1.818202, -0.575130 ).
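The arithmetic above can be checked with a short sketch. It assumes a bohr-to-angstrom factor of 0.529177 and uses illustrative variable names; the lattice data and the two centres are the ones quoted in this thread. It also solves for the integer lattice combination instead of guessing it.

```python
import numpy as np

BOHR_TO_ANG = 0.529177  # approximate conversion factor (assumption)

alat = 5.9612  # celldm(1) in bohr, from the reference output
# In-plane crystal axes from the reference output, converted to angstrom
a1 = np.array([1.0, 0.0]) * alat * BOHR_TO_ANG
a2 = np.array([-0.5, 0.866025]) * alat * BOHR_TO_ANG

c_1core = np.array([1.581912, -0.912764])  # WF 4, 1-core run (xy only)
c_2core = np.array([-3.150453, 1.818202])  # WF 9, 2-core run (xy only)

# Express the difference of the two centres in units of a1 and a2
lattice = np.column_stack([a1, a2])
frac = np.linalg.solve(lattice, c_2core - c_1core)
n = np.rint(frac).astype(int)  # nearest integer lattice translation

shifted = c_1core + n[0] * a1 + n[1] * a2
residual = np.linalg.norm(shifted - c_2core)

print("lattice translation (n1, n2):", n)
print("shifted 1-core centre (ang):", shifted)
print("remaining distance (ang):", residual)
```

The remaining distance is of the order of 1e-3 ang, comparable to the scatter seen for the other centres.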

Therefore I'm tempted to say that the WFs are the same (within numerical noise, and except for reordering).

Would you agree? If so, can you close the issue? Otherwise, let us know why and we can continue the discussion.

Hello Giovanni,

Yes you can close the issue. The WFs are the same.

In my opinion the numerical noise is much bigger than it should be (only 3 significant digits). I think I tried with a very low threshold for the Wannierization. The only difference is the number of cores. Most quantities typically agree to 10+ significant digits when changing the number of cores on the same machine with the same compiler.

However, if you think it's safe, please disregard this issue.

Best, Samuel
