sandialabs / pyGSTi

A python implementation of Gate Set Tomography
http://www.pygsti.info
Apache License 2.0

Nondeterministic Unit Test Failures #369

Closed: coreyostrove closed this 7 months ago

coreyostrove commented 8 months ago

Following the merge of PR #350 into develop, we started getting a failure on one of the notebook regression tests. The failure was apparently caused by one of the lines added to the germ-selection tutorial GST-FiducialAndGermSelection.ipynb in that update. At first glance the failure appeared to be restricted to Windows on Python 3.8 and 3.10, but subsequent re-runs of the workflow revealed that it is apparently non-deterministic. See notebook regression workflow run 53 for more on this: across its multiple attempts, on attempts 2-4 only Python 3.8 on Windows fails, always in the last cell of GST-FiducialAndGermSelection.ipynb and always with the same ValueError raised in the vicinity of line 1125 of germselection.py.

The exception is raised during the construction of a projector that should be real-valued but apparently fails to be, and by quite a lot: as I learned later in testing, this is not a matter of being just over the testing threshold. Given that the regression tests take forever to run, and that much of the standard pytest debugging output doesn't get produced when using nbval for notebook testing, I turned this cell into a standard unit test. This gave some additional output, but no real insight into the nondeterminism. It is worth noting that when the exception is raised, the values of the large imaginary projector components appear (based on some spot checking) to be the same across failed runs, independent of Python version.
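For context, the failure mode amounts to a realness check like the following sketch (the helper name assert_real_projector and the tolerance are mine for illustration; this is not the actual code near line 1125 of germselection.py):

import numpy as np

def assert_real_projector(proj, tol=1e-7):
    """Return the real part of proj, raising if it is not real to within tol."""
    max_imag = np.max(np.abs(proj.imag))
    if max_imag > tol:
        # In the failing runs the offending imaginary components are far
        # above any reasonable threshold, not just slightly over it.
        raise ValueError(f"Projector should be real-valued, but has imaginary "
                         f"components as large as {max_imag:.3e}")
    return proj.real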

Here is a summary of what I've seen so far from these unit-test runs (restricted to Windows to speed things up, since I haven't seen strangeness on any other platform yet):

Run 1166: Python 3.9 failed.
Run 1167: Python 3.10 failed.
Run 1168: Python 3.8 failed.
Run 1169: attempts 1 and 2 passed for mysterious reasons, then Python 3.9 failed on attempt 3.

For completeness: I did make some minor changes to the source code in between these runs, but nothing that should, at least in principle, have made any difference to the testing (changes summarized below for reference).

The output from the third attempt of run #1169 provided a useful clue when it failed. The extra line added there printed the inverse of the matrix of eigenvectors produced by numpy.linalg.eig. That inverse looks ugly (swings of 20+ orders of magnitude in the matrix elements), so it is plausible that it is causing bad numerical-precision issues. This would also be in line with one of the (few) working theories I had for this behavior, based on a section of the numpy documentation for eig that I read earlier this evening (quoted below):

The array eigenvectors may not be of maximum rank, that is, some of the columns may be linearly dependent, although round-off error may obscure that fact. If the eigenvalues are all different, then theoretically the eigenvectors are linearly independent and a can be diagonalized by a similarity transformation using eigenvectors, i.e, inv(eigenvectors) @ a @ eigenvectors is diagonal.

For non-Hermitian normal matrices the SciPy function scipy.linalg.schur is preferred because the matrix eigenvectors is guaranteed to be unitary, which is not the case when using eig. The Schur factorization produces an upper triangular matrix rather than a diagonal matrix, but for normal matrices only the diagonal of the upper triangular matrix is needed, the rest is roundoff error.
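As a toy illustration of the documented claim (my own example, not from the issue), the similarity transformation really does diagonalize a well-behaved matrix with distinct eigenvalues:

import numpy as np

a = np.array([[0.0, 1.0],
              [1.0, 0.0]])                 # eigenvalues +1 and -1
evals, evecs = np.linalg.eig(a)
d = np.linalg.inv(evecs) @ a @ evecs
print(np.allclose(d, np.diag(evals)))      # True in this well-conditioned case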

Given this comment from the documentation, and the latest observation that the inverse of the eigenvector matrix does indeed appear to be poorly behaved, maybe this is what is going on. That leaves the question of why this behavior would have unceremoniously reared its head now. Maybe some change in the lower-level linear algebra package has made the warning in the numpy documentation more likely to be realized? (Though why would that manifest seemingly at random?) A change in the linear algebra package would also be consistent with my being entirely unable to reproduce any of this behavior locally so far (I should have mentioned earlier that I'd tried), since I wouldn't have those changes on my system. @rileyjmurray, you have significant expertise here; does any of this sound plausible/reasonable to you? Numpy claims to use the _geev LAPACK routine for its eig implementation, in case that helps on your end.

@sserita and @rileyjmurray, apologies for the novel, but if either of you has some free cycles it'd be great to get more eyes on this (I'm pretty stuck), especially since we've been holding off on merging some of the other ready PRs while tracking this down. In the meantime I'm going to give the suggestion from the numpy docs a shot and switch to scipy's implementation of the Schur decomposition, which, since it promises a unitary matrix of eigenvectors, should sidestep the need for any non-trivial matrix inversion. Though even if that somehow works, I'll still be confused.

rileyjmurray commented 8 months ago

I love GitHub Novels! I'll start digging into this.


I've looked at the output from the failing Python 3.9 test you cite above. The matrix of eigenvectors is indeed ill-conditioned. I pasted the results into a local Python script and then checked the condition number.

import numpy as np
wrtEvecs = np.array([[ 1.00000000e+00+0.00000000e+00j, -3.74747320e-34-2.70978897e-41j,
  -3.74747320e-34+2.70978897e-41j, -3.22197060e-34+0.00000000e+00j],
 [ 0.00000000e+00+0.00000000e+00j, -7.07106781e-01+0.00000000e+00j,
  -7.07106781e-01-0.00000000e+00j,  5.22945840e-32+0.00000000e+00j],
 [ 0.00000000e+00+0.00000000e+00j, -7.07106781e-01+5.34130835e-24j,
  -7.07106781e-01-5.34130835e-24j,  5.22945840e-32+0.00000000e+00j],
 [ 0.00000000e+00+0.00000000e+00j, -3.84725876e-81-1.01863535e-57j,
  -3.84725876e-81+1.01863535e-57j,  1.00000000e+00+0.00000000e+00j]])

print(np.linalg.cond(wrtEvecs))   # a whopping 1.99e+17   !!!

That matrix cannot be inverted to any reasonable degree of accuracy. You can try, but the inverse that NumPy hands you will be garbage.
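Continuing the snippet above, you can check how far the computed inverse is from being a true inverse. Since the error in a computed inverse scales roughly like cond(A) times machine epsilon (about 2e17 * 2.2e-16 here), the identity residual need not be small at all:

inv_evecs = np.linalg.inv(wrtEvecs)   # reuses wrtEvecs from the snippet above
print(np.linalg.norm(inv_evecs @ wrtEvecs - np.eye(4)))   # not reliably near zero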

It's hard to say exactly why this happened. However, it's definitely within the realm of "weird things that happen in floating-point arithmetic." To me, a clue is that the reported eigenvalues have nonzero but absolutely tiny imaginary parts:

wrtEvals  = np.array([1.+0.00000000e+00j, -1.+9.95238383e-41j, -1.-9.95238383e-41j, 1.+0.00000000e+00j])

In standard double-precision arithmetic the smallest relative difference that can be resolved is machine epsilon, about 2.2e-16. With the imaginary parts of these eigenvalues many, many orders of magnitude smaller than that, it's clear they're just noise.
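(For reference, machine epsilon for doubles can be queried directly:)

import numpy as np
print(np.finfo(np.float64).eps)   # 2.220446049250313e-16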

Rather than focusing on non-determinism I'd focus on trying to select an alternative linear algebra algorithm to np.linalg.eig (which calls LAPACK geev, as you pointed out). If the matrix wrt is known to be Hermitian then you can call np.linalg.eigh. Otherwise if it's known to be normal (i.e. if wrt @ wrt.T.conj() - wrt.T.conj() @ wrt is close to the zero matrix) then you should call scipy.linalg.schur and extract the diagonal of the triangular matrix.
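To make the Schur route concrete, here is a minimal sketch (my illustration, not pyGSTi code; the stand-in matrix is arbitrary) of recovering an eigendecomposition of a normal matrix without any non-trivial matrix inversion:

import numpy as np
import scipy.linalg as la

# Stand-in normal matrix; in the real code this would be wrt.
wrt = np.diag([1.0, -1.0, -1.0, 1.0]).astype(complex)

# For a normal matrix the complex Schur form T is diagonal up to roundoff
# and Z is unitary, so inv(Z) is just Z.conj().T.
T, Z = la.schur(wrt, output='complex')
evals = np.diag(T)
print(np.allclose(Z.conj().T @ wrt @ Z, np.diag(evals)))   # True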

rileyjmurray commented 8 months ago

@coreyostrove: can we be sure that the wrt matrix is indeed a normal operator? If so then you or I can implement this fix in a pretty definitive way. If it's only sometimes a normal operator then we can at least implement a check for normality and use Schur decomposition when that check passes (only resorting to np.linalg.eig when absolutely necessary).

coreyostrove commented 8 months ago

Thanks for the reply, @rileyjmurray. In the context of germ selection the wrt matrix should be normal, but not necessarily Hermitian (so eigh wouldn't work). In particular, the wrt matrices should correspond to the process matrices of unitary operations. Such a process matrix is itself unitary: given a unitary U, the corresponding process matrix (in the standard basis) is \bar{U} \otimes U (up to the choice of vectorization convention), and changes of basis shouldn't make a difference here so long as they are themselves unitary (I don't think we typically encounter non-unitary basis transformations in practice).
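A quick numerical sanity check of that claim (a toy example of mine, using the column-stacking convention):

import numpy as np

theta = 0.3
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=complex)

P = np.kron(U.conj(), U)   # process matrix of U in the standard basis
print(np.allclose(P @ P.conj().T, np.eye(4)))        # True: P is unitary
print(np.allclose(P @ P.conj().T, P.conj().T @ P))   # and hence normal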

That said, I can imagine future scenarios where we might want to use this function to construct twirling superduperoperators for non-unitary process matrices, in which case the matrices could sometimes be non-normal and we'd need to fall back on eig.

I've implemented the switch to scipy.linalg.schur in commit 38ca39d, and this seems to have worked (all of the tests are passing). I've also inspected the output germ sets from the germ-selection tutorial notebook, and they all still look sensible. Given my comment in the previous paragraph, I think it'd be worth adding a check on the commutator of wrt as you suggested (this should be sufficiently cheap) and then branching to either the Schur decomposition or the more general eigenvalue decomposition depending on the result. I'll implement this change and then create a PR with these bugfixes/changes.
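For concreteness, the branching could look something like this sketch (a hypothetical helper of mine with an illustrative tolerance, not the code in commit 38ca39d):

import numpy as np
import scipy.linalg as la

def eigendecomp(wrt, tol=1e-10):
    """Schur-based eigendecomposition when wrt is numerically normal,
    falling back to the general-purpose routine otherwise."""
    commutator = wrt @ wrt.conj().T - wrt.conj().T @ wrt
    if np.linalg.norm(commutator) <= tol * max(1.0, np.linalg.norm(wrt) ** 2):
        T, Z = la.schur(wrt, output='complex')
        return np.diag(T), Z        # Z is unitary, T diagonal up to roundoff
    return np.linalg.eig(wrt)       # non-normal: eigenvectors may be ill-conditioned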

rileyjmurray commented 8 months ago

Amazing!


sserita commented 7 months ago

Closed with release 0.9.12.