ColinBundschu closed this issue 1 month ago.
Is this only happening with cold smearing? If so, this is an unavoidable consequence of the unphysical fillings greater than 1 in this scheme, which conflict with the variational algorithm.
I missed that you were doing cold smearing previously.
I was not doing cold smearing previously. The same issue happens with Fermi and Gauss smearing.
331_nlpcm_fermi_02.txt
Here is an example with fluid and Fermi smearing of 0.02.
331_nlpcm_fermi_10.txt
Same calculation, but with Fermi smearing of 0.1.
This is likely an artifact of the variational fillings algorithm as applied to semiconductors/semimetals with little or no DOS at the Fermi level. Specifically, the only preconditioner K that works sensibly here happens to be an indefinite operator, so dagger(g) K g can be negative. Then |grad|_K = sqrt(dagger(g) K g) = NaN.
The CG algorithm still appears to chug along just fine, so this is essentially a logging issue rather than an algorithmic one. Will take a look at how to address this most sensibly.
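A minimal C++ sketch of the failure mode described above, assuming a made-up 2x2 indefinite matrix as a stand-in for the preconditioner K and a real vector in place of the complex gradient g. Neither is JDFTx's actual data structure, and the function names are hypothetical. When the quadratic form comes out negative, std::sqrt returns NaN, which is what ends up in the log.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Real-valued stand-in for dagger(g) K g: computes g^T K g for an n x n
// matrix K stored row-major. (Hypothetical helper, for illustration only.)
double quadraticForm(const std::vector<double>& g, const std::vector<double>& K) {
    const std::size_t n = g.size();
    assert(K.size() == n * n);  // K must be n x n
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            sum += g[i] * K[i * n + j] * g[j];
    return sum;
}

// Naive K-norm |g|_K = sqrt(g^T K g). If K is indefinite the quadratic form
// can be negative, and std::sqrt of a negative double returns NaN.
double naiveKNorm(const std::vector<double>& g, const std::vector<double>& K) {
    return std::sqrt(quadraticForm(g, K));
}
```

With K = diag(1, -4) and g = (1, 1), the form is 1 - 4 = -3, so naiveKNorm returns NaN; with g = (1, 0) it returns 1. The minimizer itself never needs the square root, which is consistent with CG proceeding normally while only the logged value breaks.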
So, on my end, should I stick with Fermi 0.02 and ignore the NaNs?
I've pushed a fix that reports the signed square root, so these cases show up as a negative grad_K instead of NaN. That way you can still see the magnitude of grad_K and use it to gauge convergence, even in the indefinite case. Please check and close the issue if this looks good now.
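One way to read "signed square root" is sign(x) * sqrt(|x|). The sketch below assumes that definition; it is a guess at the reported quantity, not the code from the actual commit, and signedSqrt is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Signed square root: sign(x) * sqrt(|x|). For x >= 0 this equals sqrt(x);
// for x < 0 it returns -sqrt(-x) instead of NaN, so the magnitude survives
// and the negative sign flags that the preconditioner was indefinite.
double signedSqrt(double x) {
    assert(!std::isnan(x));  // a NaN input would still propagate; catch it in debug builds
    return std::copysign(std::sqrt(std::fabs(x)), x);
}
```

For example, signedSqrt(4.0) gives 2.0 and signedSqrt(-4.0) gives -2.0, so a negative grad_K still carries a magnitude that can be compared across iterations. std::copysign is used rather than a branch so that the sign handling stays in one expression.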
grad_K is still NaN, even with the latest changes.
I'm unable to reproduce the issue with the latest code. Can you send me a minimal working example that still shows the NaN grad_K? (Your log files above were reading a previous state, etc., so please make sure it is a clean example that I can easily run from scratch. It would also help if you could find a smaller version of the same system that still shows the issue, to ease debugging.)
I will make a minimal example and post back
This turned out to be an overflow error in the number of dimensions of electronic minimize exceeding 2^32-1 (one variable was incorrectly int instead of size_t). Fixed in the latest commit.
(The minimal example would have been enormous :) due to the nature of the bug.)
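The class of bug described here can be illustrated with a short C++ sketch: a variable count computed as a product of sizes can exceed std::numeric_limits<int>::max() (2^31 - 1 for a 32-bit int), so it should be carried in std::size_t. The dimension formula, function names, and sizes below are hypothetical and only show the magnitudes involved; they are not JDFTx's actual bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Multiply two sizes, asserting (in debug builds) that the product does not
// overflow std::size_t itself.
std::size_t checkedMul(std::size_t a, std::size_t b) {
    assert(a == 0 || b <= std::numeric_limits<std::size_t>::max() / a);
    return a * b;
}

// Hypothetical variable count for a wavefunction minimization:
// nStates x nBands x nBasis coefficients, kept in std::size_t throughout.
std::size_t totalDimension(std::size_t nStates, std::size_t nBands, std::size_t nBasis) {
    return checkedMul(checkedMul(nStates, nBands), nBasis);
}

// True if n could be stored in an int without overflow. Storing a larger
// value in an int variable is the kind of bug described above.
bool fitsInInt(std::size_t n) {
    return n <= static_cast<std::size_t>(std::numeric_limits<int>::max());
}
```

For example, 64 states × 1000 bands × 100000 basis functions gives 6.4 × 10^9 on a platform with 64-bit size_t, well past the int limit, while 4 × 100 × 1000 = 400000 fits comfortably. This also shows why a small test case would not reproduce the bug: only large systems cross the threshold.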
I pulled the latest code, and I am still seeing grad_K come out as NaN.
Very strange: I even tested your exact input file with an nvhpc build on Perlmutter, and grad_K was not NaN. Can you please double-check the same input you sent below and confirm that the code is compiled from the latest git hash (e155c65)?
Regardless of the size and type of smearing, grad_K is still coming back NaN. Note how large I set the smearing, to demonstrate that it is not simply too small: