scipy / scipy

SciPy library main repository
https://scipy.org
BSD 3-Clause "New" or "Revised" License
13.08k stars 5.19k forks source link

ENH: `linalg.cholesky`: document that user is responsible for symmetry check #15012

Closed jecampagne closed 2 months ago

jecampagne commented 3 years ago

Is your feature request related to a problem? Please describe.

Here a snippet that shows the behaviour of scipy.linalg.cholesky which is a bit strange to my point of view

import numpy as np
import scipy

# a 21x21 matrix as a result of a Stochastic Variational Inference optimisation

cov_mtx = np.array([[ 2.94881366e-02,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 5.06586850e-03,  9.36292545e-02,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [-1.36204486e-03, -5.76236857e-04,  5.61120132e-02,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [-9.86475135e-04,  5.84301279e-05,  1.49339373e-03,
         2.02517656e-02,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [-9.70847465e-04,  1.01796366e-04, -1.91880281e-03,
         7.15371049e-03,  1.75530137e-02,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [-1.04522499e-03, -9.57936828e-04, -5.22594912e-03,
         8.45534973e-03,  5.95143550e-03,  1.89908684e-02,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [-9.34743279e-04, -1.15420101e-03, -6.34441898e-03,
         1.02271820e-02,  6.38034975e-03,  5.02061098e-03,
         2.20252857e-02,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [-1.38808792e-03, -1.68906956e-03, -9.07781049e-03,
         1.06555185e-02,  8.91562879e-03,  5.90900396e-03,
         5.06122776e-03,  2.42384712e-02,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 5.48607014e-03, -8.98681878e-05, -1.14970691e-03,
         7.18976890e-05,  2.11196390e-04, -1.91612042e-04,
        -5.69895521e-04, -3.65185117e-04,  1.24535114e-02,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 2.59827661e-03, -1.31582003e-04, -6.58200577e-04,
         1.48970152e-04, -3.91381444e-04, -4.00698325e-04,
        -3.77680096e-04,  1.11862746e-04,  2.18708619e-04,
         1.07535241e-02,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 3.57993574e-03, -4.03650994e-04, -1.41726792e-03,
         3.29283594e-04, -5.05896641e-04, -1.03943338e-03,
         1.18017384e-03, -3.35517894e-04, -4.01761592e-04,
         7.00944847e-04,  1.26503152e-02,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 2.82711224e-03,  1.15854785e-03, -9.68853874e-04,
        -5.79304647e-04,  3.01335925e-04,  2.60539045e-04,
         1.46376306e-03,  7.34105324e-04,  7.26705853e-04,
        -3.58141894e-04,  5.53821664e-04,  1.69079826e-02,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [-1.98785690e-02,  8.99134290e-05,  2.08801703e-03,
        -4.82763395e-03,  1.66961129e-02, -3.34119937e-02,
         3.85012698e-03,  1.37558520e-03, -1.27947402e-01,
        -2.28987495e-02,  8.21587519e-03,  2.49074627e-02,
         1.89878248e-01,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [-4.94668456e-03,  7.63959837e-03, -4.65494284e-02,
        -6.03455981e-03, -7.69611943e-03, -1.06105933e-02,
        -1.12723549e-02, -5.02414580e-03,  2.36500949e-04,
        -5.76834281e-03, -3.42089267e-03, -2.16095175e-03,
        -9.35370506e-04,  8.18526072e-02,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 5.36201505e-03,  5.86418550e-04, -5.07274283e-03,
        -1.25792770e-03, -4.10997074e-03,  2.96542188e-03,
         1.73831734e-03,  1.09334861e-03, -4.05095193e-02,
         3.99805189e-04, -2.04056188e-04,  2.03327304e-03,
        -4.23544160e-03,  1.32823865e-03,  3.40768872e-02,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [-2.20649184e-03, -1.35638832e-03, -4.98629992e-03,
         4.80272838e-04, -2.27082659e-03,  5.40581058e-04,
        -4.18126267e-05, -1.15619171e-03,  3.50855847e-03,
        -3.41982777e-02,  1.60981987e-03,  2.05503026e-03,
        -2.29761698e-03, -2.15157474e-04,  1.02099423e-03,
         2.66228686e-02,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 6.37869243e-04, -2.69353192e-03, -8.26823583e-03,
         2.58919855e-03,  3.13527514e-04, -1.10300185e-03,
         1.64869764e-03,  1.21864219e-03, -7.31707130e-04,
         4.09616320e-04, -1.66828018e-02,  1.83101868e-03,
        -3.98810112e-05, -1.04186819e-03, -2.34233372e-04,
         4.96540609e-04,  1.76176100e-02,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [-8.42103546e-04, -1.59227919e-03, -7.29955766e-03,
         2.41766861e-03,  3.73263463e-04, -5.17232220e-04,
        -8.36916808e-05,  1.89187565e-05, -6.43920191e-04,
         4.69738710e-04,  1.53689775e-05, -9.41970171e-03,
        -6.31174921e-04,  3.65579370e-04, -7.93234010e-04,
         2.23520050e-03, -1.66213044e-04,  1.35184361e-02,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [-8.61826215e-03,  1.08922027e-02, -4.58113800e-02,
        -4.68836440e-02,  3.89313020e-04,  1.71379278e-02,
         2.51106148e-02,  3.06694551e-02,  1.29354685e-02,
        -6.05223758e-04,  3.83346094e-03, -2.75744652e-03,
        -1.69380022e-03, -3.98399290e-02, -3.59029370e-03,
        -5.94276478e-03, -7.80543861e-03, -1.47630441e-02,
         9.09363645e-02,  0.00000000e+00,  0.00000000e+00],
       [-1.47458754e-04, -4.11655200e-04, -4.54193905e-03,
        -6.30715479e-03, -3.32516594e-03, -2.01466462e-03,
        -2.26116268e-03, -7.93095443e-04,  1.30032710e-03,
        -6.45144336e-04, -1.01995281e-03, -1.68059965e-04,
        -5.47279705e-05, -2.85222094e-03, -6.46646766e-04,
        -6.44948378e-04, -1.33983063e-03, -2.67590829e-03,
        -1.27281051e-03,  2.08569368e-02,  0.00000000e+00],
       [-1.19868272e-03, -1.44074162e-04, -1.11446937e-02,
        -1.37308118e-02, -1.02134739e-02, -9.57308145e-03,
        -9.62345402e-03, -6.25575561e-03,  6.70128614e-03,
         4.18519278e-03,  2.39682164e-03,  3.25732711e-03,
        -4.68314227e-04, -8.65179140e-03, -5.56607570e-04,
         2.16140664e-03,  6.32600460e-03,  1.73952495e-03,
        -5.27277764e-03, -1.04218200e-01,  6.93402162e-02]])

Performing a Cholesky decomposition

scipy.linalg.cholesky(cov_mtx)

return a matrix wo any complaining.

But, the matrix eigen decomposition clearly shows that the matrix is not positive definite, namely

scipy.linalg.eigvalsh(cov_mtx)

gives

array([-0.07783337, -0.06924233, -0.03830554, -0.01789377, -0.0059395 ,
        0.00183724,  0.00523341,  0.01097751,  0.01314316,  0.01835902,
        0.02327265,  0.03061616,  0.03177506,  0.04952865,  0.05483383,
        0.05734573,  0.09309918,  0.12094973,  0.15347983,  0.15643385,
        0.26808384])

showing negative eigen-values.

Now, what is strange, is that scipy.linalg.cholesky with default args does not rise any Error, while

scipy.linalg.cholesky(cov_mtx, lower=True)

rise an exception leading to the right diagnostics

LinAlgError: 13-th leading minor of the array is not positive definite

Notice, that

CholeskyL, rc = scipy.linalg.lapack.dpotrf(cov_mtx)

leads to rc=0 so also wo mentionning the "not positive definite" status of the matrix.

Describe the solution you'd like.

So, my questions and suggestion: why scipy.linalg.cholesky does not include (on user request) a check of positiveness of the matrix?

As several library wraps the scipy implementation I imagine that it could be difficult to change the API. Now, I guess many people encounter the same problem. I must say that the fact that the decomposition returns a diagonal matrix wo any warning/error-rising makes difficult to debug the other packages, and it is my wish to understand a strange behavior in a Statistical Lib. using Variational Inference that I discover this problem.

Describe alternatives you've considered.

No response

Additional context (e.g. screenshots, GIFs)

No response

ilayn commented 3 years ago

Just by eye-balling, I think your matrix is not symmetric.

ilayn commented 3 years ago

With the new version 1.8 of SciPy, efficient ishermitian and issymmetric functions will be public, and the long term goal is to handle these issues more structurally. However note that using symmetricity exploiting functions necessitate that we only consider one triangular part of the array and such that memory can be saved hence this leaves the burden of data source control on the user.

jecampagne commented 3 years ago

You may be right concerning the symmetric character of the matrix but as I have suggested, this matrix is an intermediate result of a statistical library. So as end-user of this machinery, it is a behavior of a convergence of a loss in a SVI algorithm that I have suspected something strange and I've found this suspicious matrix. So, I have no entry to check this matrix during the process.

May be it would be worth to get lower=True be default which rise the Error...

ilayn commented 3 years ago

What I mean is that lower or upper is not something we can understand at the outset. The user has to provide the lower or upper triangle that the function should consider.

The suspicion should be discussed with the originator of the data or if there is a particular reason the third party software should provide the correct part. We don't have any ways to detect the intention.

If we swap the default then other people who expect the opposite triangle would start complaining. The ideal case is that the supplied matrix is symmetric/Hermitian.

melissawm commented 2 years ago

Maybe the solution could be to improve the documentation to scipy.linalg.cholesky and include a clarification about the user being responsible for this check?