Closed — sponce24 closed this issue 5 years ago
I would have thought that this is a bug in pw2wannier90, not wannier90 itself.
Comparing the spreads in the initial state is more informative than in the final state. Can you show these?
Also, if you run these tests after compiling pwscf with a different compiler / maths libraries, do you get the same results, or are they different again?
Very unusual.
pw2wannier90 doesn't know (or care) about how many cores were used in the scf and nscf steps, especially when wf_collect is turned on. Also, in both of your examples you ran pw2wannier90.x and wannier90.x serially.
Of course, since the wavefunctions can carry different phases in your two cases, the inputs to pw2wannier90 would differ, thereby leading to different amn and mmn files. But I am very surprised that these differences would lead to a meaningful (large) difference in the wannierization.
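The phase argument above can be illustrated with a toy numerical sketch: multiplying each state by an arbitrary phase (the kind of gauge freedom a different parallelization can introduce) changes the complex overlap matrix elements, i.e. the mmn entries, but not their magnitudes. The vectors and phases below are made up for illustration, not actual pw2wannier90 data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two normalized toy "Bloch states" at neighbouring k-points, standing in
# for the periodic parts u_nk that pw2wannier90 overlaps.
u_k  = rng.normal(size=4) + 1j * rng.normal(size=4)
u_kb = rng.normal(size=4) + 1j * rng.normal(size=4)
u_k  /= np.linalg.norm(u_k)
u_kb /= np.linalg.norm(u_kb)

# Overlap M = <u_k | u_kb>, with and without arbitrary per-state phases.
M = np.vdot(u_k, u_kb)  # np.vdot conjugates its first argument
M_rephased = np.vdot(np.exp(1j * 0.7) * u_k, np.exp(-1j * 2.1) * u_kb)

print(np.isclose(M, M_rephased))            # False: complex elements differ
print(np.isclose(abs(M), abs(M_rephased)))  # True: magnitudes are identical
```

This is only the single-matrix-element picture; in the real calculation the gauge freedom mixes degenerate bands as well, but the same invariance argument applies.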
PS) I guess that you obtained the same .win file in your two cases after running ~/program/q-e-sponce24/bin/wannier90.x -pp gan.win .
Hello Jonathan and Hyungjun,
Thank you both for your replies.
1) I agree, it can be a bug in pw2wannier90.
Indeed, the initial states are also different (calculation done with gfortran 7.4 + openmpi 1.10.7). For 1 cpu:
Initial State
WF centre and spread 1 ( 2.004563, 1.116077, -0.046001 ) 2.05136658
WF centre and spread 2 ( 1.596951, 0.641313, -0.728929 ) 1.87481225
WF centre and spread 3 ( 0.975503, 1.264678, -0.728444 ) 1.76263841
WF centre and spread 4 ( 1.703538, 0.802870, -0.045137 ) 2.09152259
WF centre and spread 5 ( -0.126384, 1.929003, 2.525647 ) 2.09150232
WF centre and spread 6 ( 0.601839, 1.467211, 1.842267 ) 1.76260995
WF centre and spread 7 ( -0.019685, 2.090567, 1.841889 ) 1.87482760
WF centre and spread 8 ( -0.427321, 1.615967, 2.524786 ) 2.05134769
WF centre and spread 9 ( 1.579534, 0.911407, -0.149793 ) 8.25291312
WF centre and spread 10 ( -0.002016, 1.820318, 2.421145 ) 8.25340333
Sum of centres and spreads ( 7.886521, 13.659412, 9.457430 ) 32.06694384
For 2 cpu:
Initial State
WF centre and spread 1 ( 2.004420, 1.116143, -0.045956 ) 2.05127880
WF centre and spread 2 ( 1.597001, 0.641224, -0.728991 ) 1.87475892
WF centre and spread 3 ( 0.975535, 1.264686, -0.728392 ) 1.76271424
WF centre and spread 4 ( 1.703512, 0.802841, -0.045182 ) 2.09165108
WF centre and spread 5 ( -0.126364, 1.929003, 2.525622 ) 2.09157657
WF centre and spread 6 ( 0.601818, 1.467246, 1.842318 ) 1.76268398
WF centre and spread 7 ( -0.019714, 2.090586, 1.841850 ) 1.87476970
WF centre and spread 8 ( -0.427246, 1.615888, 2.524811 ) 2.05130419
WF centre and spread 9 ( 1.579520, 0.911300, -0.149789 ) 8.25300755
WF centre and spread 10 ( -0.002020, 1.820388, 2.421129 ) 8.25350185
Sum of centres and spreads ( 7.886461, 13.659306, 9.457420 ) 32.06724689
To answer your second question, I did the same calculation on a different machine using intel 17+impi:
1 cpu:
Initial State
WF centre and spread 1 ( 2.004563, 1.116077, -0.046001 ) 2.05136658
WF centre and spread 2 ( 1.596951, 0.641313, -0.728929 ) 1.87481225
WF centre and spread 3 ( 0.975503, 1.264678, -0.728444 ) 1.76263841
WF centre and spread 4 ( 1.703538, 0.802870, -0.045137 ) 2.09152259
WF centre and spread 5 ( -0.126384, 1.929003, 2.525647 ) 2.09150232
WF centre and spread 6 ( 0.601839, 1.467211, 1.842267 ) 1.76260995
WF centre and spread 7 ( -0.019685, 2.090567, 1.841889 ) 1.87482760
WF centre and spread 8 ( -0.427321, 1.615967, 2.524786 ) 2.05134769
WF centre and spread 9 ( 1.579534, 0.911407, -0.149793 ) 8.25291312
WF centre and spread 10 ( -0.002016, 1.820318, 2.421145 ) 8.25340333
Sum of centres and spreads ( 7.886521, 13.659412, 9.457430 ) 32.06694384
2 cpu:
Initial State
WF centre and spread 1 ( 2.004420, 1.116143, -0.045956 ) 2.05127880
WF centre and spread 2 ( 1.597001, 0.641224, -0.728991 ) 1.87475892
WF centre and spread 3 ( 0.975535, 1.264686, -0.728392 ) 1.76271424
WF centre and spread 4 ( 1.703512, 0.802841, -0.045182 ) 2.09165108
WF centre and spread 5 ( -0.126364, 1.929003, 2.525622 ) 2.09157657
WF centre and spread 6 ( 0.601818, 1.467246, 1.842318 ) 1.76268398
WF centre and spread 7 ( -0.019714, 2.090586, 1.841850 ) 1.87476970
WF centre and spread 8 ( -0.427246, 1.615888, 2.524811 ) 2.05130419
WF centre and spread 9 ( 1.579520, 0.911300, -0.149789 ) 8.25300755
WF centre and spread 10 ( -0.002020, 1.820388, 2.421129 ) 8.25350185
Sum of centres and spreads ( 7.886461, 13.659306, 9.457420 ) 32.06724689
In other words, it's not compiler- or machine-dependent at all (good news), but it is clearly dependent on the number of CPUs, which points toward a bug.
2) I indeed obtain the same .win file after doing the -pp
Best, Samuel
Dear Samuel:
I am very sorry, but if you are referring to the files in q-e/test-suite/epw_mob_polar,
could you change diago_thr_init (currently 1.0e-4) to a much lower value in nscf.in? Unlike in scf calculations, in nscf calculations the iterative diagonalization only continues down to the value of diago_thr_init.
I guess that in your two cases there is a rather large phase difference due to the under-converged wave functions.
PS) I still think that this behaviour is not related to pw2wannier90. pw2wannier90 just takes inputs such as wave functions and eigenvalues from the nscf calculation, and in general the wave functions obtained in two different situations (different numbers of cores, different parallelization strategies, etc.) can carry different phases. From different inputs, pw2wannier90.x produces different amn and mmn files. But I believe that these minor differences should not usually lead to a meaningful difference in the wannierization step.
Now I suspect the difference simply comes from the under-converged wave functions rather than from the phase difference I mentioned above. Since your wave functions would be under-converged, we don't need to invoke the phase difference; a phase difference by itself should not lead to a meaningful difference in the wannierization.
Hello,
The default QE value for diago_thr_init is 1.0e-2 (https://www.quantum-espresso.org/Doc/INPUT_PW.html#diago_thr_init), so I thought that two orders of magnitude lower should be fine.
In addition, for nscf this is automatically lowered by (N elec)/10.
In any case, I re-did the calculation with much stricter parameters (by the way, such small thresholds cannot work for more complex materials; convergence will never be reached):
&electrons
diagonalization='david'
mixing_beta=0.7
conv_thr=1.0d-14
diago_thr_init = 1.0e-10
diago_full_acc = .true.
in both scf.in and nscf.in, and I get:
1 cpu:
Final State
WF centre and spread 1 ( -3.148651, 1.820770, -0.569796 ) 0.91108773
WF centre and spread 2 ( 1.577281, 0.911171, -1.036768 ) 1.32708711
WF centre and spread 3 ( -0.005887, 1.820744, -0.569834 ) 0.91103352
WF centre and spread 4 ( 1.577046, 0.910973, 0.453002 ) 1.38699911
WF centre and spread 5 ( 0.000219, 1.820931, 3.023787 ) 1.38699679
WF centre and spread 6 ( 1.583151, 0.911161, 2.000951 ) 0.91103581
WF centre and spread 7 ( -0.000016, 1.820733, 1.534017 ) 1.32708482
WF centre and spread 8 ( -1.583143, 0.911134, 2.000989 ) 0.91108542
WF centre and spread 9 ( 1.577259, -0.908191, -0.617223 ) 0.87902619
WF centre and spread 10 ( 3.154535, -1.823714, 1.953562 ) 0.87902852
Sum of centres and spreads ( 4.731795, 8.195713, 8.172686 ) 10.83046501
2 cpu:
Final State
WF centre and spread 1 ( -3.148666, 1.820824, -0.569797 ) 0.91109236
WF centre and spread 2 ( 1.577290, 0.911176, -1.036769 ) 1.32708652
WF centre and spread 3 ( -0.005893, 1.820688, -0.569833 ) 0.91102961
WF centre and spread 4 ( 1.577050, 0.910974, 0.453002 ) 1.38699814
WF centre and spread 5 ( 0.000214, 1.820930, 3.023787 ) 1.38699579
WF centre and spread 6 ( 1.583157, 0.911221, 2.000952 ) 0.91103149
WF centre and spread 7 ( -0.000025, 1.820728, 1.534016 ) 1.32708422
WF centre and spread 8 ( -1.583127, 0.911078, 2.000988 ) 0.91109012
WF centre and spread 9 ( 1.577304, -0.908185, -0.617223 ) 0.87902740
WF centre and spread 10 ( 3.154488, -1.823722, 1.953563 ) 0.87903008
Sum of centres and spreads ( 4.731792, 8.195713, 8.172686 ) 10.83046573
So it seems better but still not the same.
Moreover, in the 1-core case it converges in 300 cycles, while in the 2-core case it converges in 200 cycles (I print every 100 cycles, so the difference might not be that large, but the counts are certainly not the same).
It's a bit scary how differently the Wannier centres are located even with relatively well-converged WFs.
Edit: I increased ecut from 40 Ry to 60 Ry and
diagonalization='david'
mixing_beta=0.7
conv_thr=1.0d-14
diago_thr_init = 1.0e-14
diago_full_acc = .true.
With that I get exactly the same initial state for both 1 cpu and 2 cpu (good):
Initial State
WF centre and spread 1 ( 2.008699, 1.117420, -0.045903 ) 2.09460756
WF centre and spread 2 ( 1.587402, 0.636731, -0.753712 ) 1.94454794
WF centre and spread 3 ( 0.953027, 1.274372, -0.752727 ) 1.81871493
WF centre and spread 4 ( 1.712082, 0.804409, -0.044977 ) 2.13544759
WF centre and spread 5 ( -0.134817, 1.927495, 2.525808 ) 2.13544480
WF centre and spread 6 ( 0.624238, 1.457532, 1.818059 ) 1.81871446
WF centre and spread 7 ( -0.010137, 2.095173, 1.817073 ) 1.94454491
WF centre and spread 8 ( -0.431434, 1.614484, 2.524883 ) 2.09460514
WF centre and spread 9 ( 1.579325, 0.911197, -0.204959 ) 8.59629099
WF centre and spread 10 ( -0.002060, 1.820707, 2.365826 ) 8.59628866
Sum of centres and spreads ( 7.886324, 13.659521, 9.249372 ) 33.17920698
but the final state is still significantly different. 1 core:
Final State
WF centre and spread 1 ( -3.156012, 1.822651, -0.635723 ) 0.86453635
WF centre and spread 2 ( 1.577052, 0.910865, -1.104090 ) 1.34507774
WF centre and spread 3 ( -0.004172, 1.822953, -0.575365 ) 0.89135920
WF centre and spread 4 ( 1.581912, -0.912764, -0.575128 ) 0.89120731
WF centre and spread 5 ( 3.156076, -1.822574, 1.935062 ) 0.86453553
WF centre and spread 6 ( 1.581380, 0.908835, 1.995419 ) 0.89136493
WF centre and spread 7 ( 0.000220, 1.821048, 1.466696 ) 1.34507528
WF centre and spread 8 ( 0.000813, 1.820761, 3.008186 ) 1.42944321
WF centre and spread 9 ( 1.576458, 0.911152, 0.437401 ) 1.42944525
WF centre and spread 10 ( -1.581885, 0.912843, 1.995659 ) 0.89120546
Sum of centres and spreads ( 4.731843, 8.195768, 7.948116 ) 10.84325027
2 cores:
Final State
WF centre and spread 1 ( -3.156492, 1.821751, -0.635724 ) 0.86453576
WF centre and spread 2 ( 1.576953, 0.910679, -1.104090 ) 1.34507751
WF centre and spread 3 ( -0.003451, 1.824177, -0.575363 ) 0.89135445
WF centre and spread 4 ( 1.576411, 0.911063, 0.437401 ) 1.42944538
WF centre and spread 5 ( 0.000839, 1.820799, 3.008186 ) 1.42944337
WF centre and spread 6 ( 1.581027, 0.908197, 1.995417 ) 0.89136966
WF centre and spread 7 ( 0.000283, 1.821142, 1.466696 ) 1.34507543
WF centre and spread 8 ( -1.581616, 0.913348, 1.995660 ) 0.89119937
WF centre and spread 9 ( -3.150453, 1.818202, -0.575130 ) 0.89121621
WF centre and spread 10 ( 3.156361, -1.822137, 1.935062 ) 0.86453568
Sum of centres and spreads ( -0.000137, 10.927221, 7.948115 ) 10.84325281
Could you confirm that I can obtain your inputs from q-e/test-suite/epw_mob_polar on GitLab?
Also, in all your tests did you perform serial runs of both pw2wannier90.x and wannier90.x? That is, did you change the number of cores (1 vs 2) only in the scf and nscf calculations?
Hello,
No, I used a pure Wannier90 version with slightly different parameters. I sent you all the input files at hyungjun.lee@epfl.ch. Is that your correct email address?
Best wishes, Samuel
I left EPFL last year, but for the moment I can still access this email account. I will check your email now, but I may only get back to you tomorrow; it is late evening here.
Hi, are we sure the results are really different?
Here is a diff I did after reordering the WFs a bit.
All WF centres and spreads seem to agree to at least ~3 significant digits (OK, it's not perfect, but the wannierization also has a convergence threshold, and one could probably push it a bit further in your simulations).
The only exception is the centre on the line
WF centre and spread 4 ( 1.581912, -0.912764, -0.575128 )
vs
WF centre and spread 9 ( -3.150453, 1.818202, -0.575130 )
However, from the reference output file on the q-e test suite we have:
celldm(1)= 5.961200 celldm(2)= 0.000000 celldm(3)= 1.629900
celldm(4)= 0.000000 celldm(5)= 0.000000 celldm(6)= 0.000000
crystal axes: (cart. coord. in units of alat)
a(1) = ( 1.000000 0.000000 0.000000 )
a(2) = ( -0.500000 0.866025 0.000000 )
a(3) = ( 0.000000 0.000000 1.629900 )
Which means that (looking only at the xy coordinates):
a1 = (5.9612, 0) bohr = (3.15453, 0) ang
a2 = (-2.9806, 5.162548) bohr = (-1.577266, 2.731903) ang
And if you consider (1.581912, -0.912764) - a1 + a2
you get
(-3.149884, 1.819139) ang
which is quite close to the centre of WF 9: ( -3.150453, 1.818202, -0.575130 ).
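The lattice-translation check above can be reproduced numerically. A minimal sketch, using celldm(1) from the quoted pw.x output and an assumed Bohr-to-Angstrom factor:

```python
import numpy as np

BOHR_TO_ANG = 0.529177  # assumed conversion factor, truncated

alat = 5.9612  # celldm(1) in bohr, from the quoted pw.x output
a1 = alat * np.array([1.0, 0.0]) * BOHR_TO_ANG            # ≈ (3.15453, 0) ang
a2 = alat * np.array([-0.5, 0.866025]) * BOHR_TO_ANG      # ≈ (-1.577266, 2.731903) ang

centre_1cpu = np.array([1.581912, -0.912764])   # WF 4 (xy), 1-core run
centre_2cpu = np.array([-3.150453, 1.818202])   # WF 9 (xy), 2-core run

# Translate the 1-core centre by the lattice vector -a1 + a2.
translated = centre_1cpu - a1 + a2
print(translated)                               # ≈ [-3.14988  1.81914]
print(np.abs(translated - centre_2cpu).max())   # ~1e-3 ang: numerical noise
```

So the two centres differ by a lattice translation plus sub-milliangstrom noise, consistent with the argument above.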
Therefore I'm tempted to say that the WFs are the same (within numerical noise, and except for reordering).
Would you agree? If so, can you close the issue? Otherwise, let us know why and we can continue the discussion.
Hello Giovanni,
Yes you can close the issue. The WFs are the same.
In my opinion the numerical noise is much bigger than it should be (only 3 significant digits of agreement), and I believe I used a very low threshold for the wannierization. The only difference is the number of cores; most quantities typically agree to 10+ significant digits when only the number of cores changes on the same machine with the same compiler.
However, if you think it's safe, please disregard this issue.
Best, Samuel
Dear Wannier developers,
There might be a bug related to the number of cores with which the DFT quantities were produced. For non-cubic materials (here hexagonal GaN), the wannier90 v3 code produces different results depending on the number of cores pw.x was run on (using wf_collect). Notably, I used "-npool" for k-point parallelization, which is a requirement of EPW.
I first encountered this issue while creating a new test for the QE test farm. The test can be found in the developer GitLab version of QE under test-suite/epw_mob_polar. The error can be reproduced by simply running the test sequentially or in parallel. At present, I have disabled the check so that it can pass the nightly tests.
Initially, we thought that this issue might be related to the w90 library version, but it turns out it is also present in wannier90 itself.
The bug: we are using the same input files. If we do:
we get Final State
Instead if we do:
we get:
The differences are relatively small but clearly signal a bug. Such differences can also lead to larger differences in EPW quantities.
If you have any idea what could be responsible for such differences, please let me know. Thanks, Samuel