kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels
Other
310 stars 98 forks source link

SerialEigendecomposition outputs all zeros #905

Closed brian-kelley closed 3 years ago

brian-kelley commented 3 years ago

SerialEigendecomposition is outputting all zeros (observed in the unit test, and an external project that uses it). This was introduced somwhere between early 2020 and release 3.3.1, since this external project only started seeing the bug when updating its snapshot to 3.3.1.

Example output (just printfs inside the unit test functor):

Finished eigendecomp, m = 5
er: 0.000000 0.000000 0.000000 0.000000 0.000000 
ei: 0.000000 0.000000 0.000000 0.000000 0.000000 
UL:
0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 

UR:
0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 

The test doesn't check the output (#904 ), so this wouldn't have been caught by testing.

brian-kelley commented 3 years ago

@vqd8a I came back to look at this and KokkosKernels doesn't contain any implementation of eigendecomposition, it just calls host LAPACK if it's enabled and otherwise silently does nothing. I'll open a PR to at least throw an exception in this case.

brian-kelley commented 3 years ago

Same with the TeamVector version. Until we make our own impl that probably shouldn't exist at all since no TPLs implement a team/block-level version of that yet.

brian-kelley commented 3 years ago

@kyungjoo-kim For the native eigendecomposition impl that was commented out June 2020, what was the failure you saw that made you not want it to be used? This external app has been using it and it's apparently been correct for those inputs. It's really being used as an SVD, so the input A is of the form AA^T (SPD, all real eigenvalues). Maybe your impl is always correct in this case?

kyungjoo-kim commented 3 years ago

@brian-kelley The implementation is not numerically stable. In particular, when the system has a null rank or duplicated eigen values. Maybe your problem is relatively easier. However, you choose an inefficient algorithm. You problem is a symmetric problem. Then, the Hessenberg reduction results in tridiagonal matrix (since it is symmetric, you need just two vectors). The implementation in the KokkosKernels is for unsymmetric problems and it uses the upper Hessenberg form. Also, the code always produce eigen vectors together and this is quite expensive updates if you just need eigen values (singular values).

The reason that I commented it out is because the code is not stable. I don't want to spread the incomplete implementation to users. I did not know that this code is used by another project. The eigen value algorithm is an iterative algorithm and the convergence is detected when the subdiagonals become numerically zeros (substantially smaller than the diagonal part -- eigen values -- so if eigen values are close to zero, the convergence check can be a bit tricky and the code can just stagnate reasonablly close to zero but it does not satisfy the predefined threshold). LAPACK implmentation are very refined and it protects the algorithm from round-off errors (over/under flows). The implementation that I did is for the chemistry project that I am involved and it is too difficult to handle all edge cases (I tried to overcome the numerical issues by using different zero threshold). Your problem seems to be relatively easy as it does not have complex eigen values and it might be a full rank system.

However, SVD is a lot easier and efficient. The eigen solver for unsymmetric problems need to consider complex arithmetics using real values. The SVD does not need such tricks. Also, if what you want is to compute reduced basis, then SVD is the right way to go.

brian-kelley commented 3 years ago

Thanks Kyungjoo. Maybe next week we can talk about implementing a serial SVD in terms of the other batched algos we have.

brian-kelley commented 3 years ago

@vqd8a I'm closing this since it's the same as #873. But I will leave #904 open (the unit test actually verifying the eigendecomposition) since that can be worked on independently of the fallback implementation itself.