Closed jjgoings closed 3 years ago
I think it might be my NumPy failing during diagonalizations. Doing pytest --verbose --pyargs numpy
=============================================== short test summary info ================================================
FAILED core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionErr...
FAILED linalg/tests/test_linalg.py::TestSVDHermitian::test_herm_cases - AssertionError: In test case: <LinalgCase: hc...
FAILED linalg/tests/test_linalg.py::TestSVDHermitian::test_generalized_herm_cases - AssertionError: In test case: <Li...
FAILED linalg/tests/test_linalg.py::TestPinvHermitian::test_herm_cases - AssertionError: In test case: <LinalgCase: h...
FAILED linalg/tests/test_linalg.py::TestPinvHermitian::test_generalized_herm_cases - AssertionError: In test case: <L...
FAILED linalg/tests/test_linalg.py::TestEighCases::test_herm_cases - AssertionError: In test case: <LinalgCase: hcsin...
FAILED linalg/tests/test_linalg.py::TestEighCases::test_generalized_herm_cases - AssertionError: In test case: <Linal...
============ 7 failed, 15420 passed, 218 skipped, 20 xfailed, 3 xpassed, 251 warnings in 164.95s (0:02:44) =============
So if my NumPy linalg.eigh
is failing, it makes sense all the tests with SCF (and postSCF) would fail, but the general integral tests (such as testing Boys function and S, V, T integrals in tests/test0012.py
and tests/test0013.py
) pass.
Hey Josh - I don't know why I ended up here (I guess I clicked a few times from the FQE repo) but just two cents: numpy.eigh is a wrapper for dsyevd (divide-and-conquer), which is faster but inherently unstable. For example, the reference implementation states
The divide and conquer algorithm makes very mild assumptions about floating point arithmetic. It will work on machines with a guard digit in add/subtract, or on those binary machines without guard digits which subtract like the Cray X-MP, Cray Y-MP, Cray C-90, or Cray-2. It could conceivably fail on hexadecimal or decimal machines without guard digits, but we know of none.
To test what you are suspecting, you could try directly calling dsyev from scipy
eig, vec, _ = scipy.linalg.lapack.dsyev(data)
I believe the dsyev function uses the Housholder transformation and it is deterministic (and will not fail). I have encountered in some cases where the condition number is really bad and numpy.eigh fails.
@shiozaki Hey Toru - not sure how you got here either, but very glad you did! You are 100% correct. And since scipy's eigh
doesn't default to divide-and-conquer, swapping numpy.linalg.eigh
for scipy.linalg.eigh
is an easy fix for now. Thanks for the insight!
I ran into this same issue (M1, Python 3.9) using np.linalg.eigh
, where switching to scipy.linalg.eigh
fixed it.
Looking deeper into why, I found that on the M1, np.linalg.eigh
is returning non-unit eigenvectors and this throws off subsequent computations. As such, I was able to fix my code and still use np.lingalg.eigh
by explicitly normalizing the eigenvectors before using them.
Just figured I'd comment in case you are partial toward using np.linalg.eigh
and your problem too depends on unit length eigenvectors. I may look deeper into why the heevd
function (what np.linalg.eigh
uses) gives different results on the M1 compared to any non-M1.
Interesting, thanks for your comment @ThomasMerkh!
I was chatting with someone about this, and it may be related to what Toru mentioned about not error handling correctly. If it somehow fails, then it won't normalize properly. Not sure what's up with M1 -- I'm pretty ignorant about what's different. But you seem to be getting correct eigenvalues, so not sure what accounts for this.
Not sure if it is related to https://github.com/numpy/numpy/issues/19472
Compiling as usual, no apparent errors, but the tests fail miserably on Apple M1 with Python 3.9.
SCF does not converge or converges to very wrong answers, e.g. the
sample-input.py
case of STO-3G water: