non-positive semi definite covariance matrix

dar326 commented 2 years ago

Hello,

Thank you for creating this library and making it available to the community. I have been working through Statistical Rethinking (2nd Edition) using PyMC3. In a desire to stay as true to the original R code base as possible, I have used your quap function wherever it is used in the original textbook R code. This has been working well for me overall but I have recently encountered an issue when calling quap for a particular model in the book. This is a simple linear model that includes index variables to jointly model male and female heights from the Howell dataset (attached). The code to create the model is as follows:

d = pd.read_csv("Data/Howell1.csv"), delimiter=";")
sex = d["male"].values

with pm.Model() as m5_8:
    sigma = pm.Uniform("sigma", 0, 50)
    mu = pm.Normal("mu", 178, 20, shape=2)
    height = pm.Normal("height", mu[sex], sigma, observed=d["height"])

    idata_5_8, _ = quap([sigma, mu])

When approximating the posterior using quap, an error is generated from scipy during the construction of the underlying multivariate normal distribution that approximates the posterior. The error derives from the covariance matrix not being positive semi-definitive with the exact message being:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [20], in <cell line: 3>()
      5 mu = pm.Normal("mu", 178, 20, shape=2)
      6 height = pm.Normal("height", mu[sex], sigma, observed=d["height"])
----> 8 idata_5_8, _ = quap([sigma, mu])

File ~/opt/anaconda3/envs/stat-rethink2-pymc/lib/python3.9/site-packages/quap/quap.py:38, in quap(vars, n_samples)
     36 cov = np.linalg.inv(H)
     37 mean = np.concatenate([np.atleast_1d(map[v.name]) for v in vars])
---> 38 posterior = scipy.stats.multivariate_normal(mean=mean, cov=cov)
     39 draws = posterior.rvs(n_samples)[np.newaxis, ...]
     40 if draws.ndim == 2:

File ~/opt/anaconda3/envs/stat-rethink2-pymc/lib/python3.9/site-packages/scipy/stats/_multivariate.py:360, in multivariate_normal_gen.__call__(self, mean, cov, allow_singular, seed)
    355 def __call__(self, mean=None, cov=1, allow_singular=False, seed=None):
    356     """Create a frozen multivariate normal distribution.
    357 
    358     See `multivariate_normal_frozen` for more information.
    359     """
--> 360     return multivariate_normal_frozen(mean, cov,
    361                                       allow_singular=allow_singular,
    362                                       seed=seed)

File ~/opt/anaconda3/envs/stat-rethink2-pymc/lib/python3.9/site-packages/scipy/stats/_multivariate.py:730, in multivariate_normal_frozen.__init__(self, mean, cov, allow_singular, seed, maxpts, abseps, releps)
    727 self._dist = multivariate_normal_gen(seed)
    728 self.dim, self.mean, self.cov = self._dist._process_parameters(
    729                                                     None, mean, cov)
--> 730 self.cov_info = _PSD(self.cov, allow_singular=allow_singular)
    731 if not maxpts:
    732     maxpts = 1000000 * self.dim

File ~/opt/anaconda3/envs/stat-rethink2-pymc/lib/python3.9/site-packages/scipy/stats/_multivariate.py:162, in _PSD.__init__(self, M, cond, rcond, lower, check_finite, allow_singular)
    160 eps = _eigvalsh_to_eps(s, cond, rcond)
    161 if np.min(s) < -eps:
--> 162     raise ValueError('the input matrix must be positive semidefinite')
    163 d = s[s > eps]
    164 if len(d) < len(s) and not allow_singular:

ValueError: the input matrix must be positive semidefinite

I have confirmed that with the same dataset, the R code works as indicated in the textbook. I was wondering if you have any insight into what might be the difference in the implementation of quap from pym3-quap that might lead to this error? Alternatively, has this error previously been encountered and, if so, any suggestions on a workaround?

I would appreciate any assistance you might be able to provide on this issue.

Best, Darryl

dar326 commented 2 years ago

I was able to resolve this issue by making the prior on mu more flexible. This was necessary because the observed data for this model included non-adults. Closing this issue.

ivanistheone commented 9 months ago

I ran into this error today and managed to fix it by changing the optimization algorithm used by find_MAP ( BFGS instead of the default L-BFGS-B). See https://github.com/ivanistheone/pymc_Rethinking_3/commit/9dc003f2a3336d774fc29fe008b0e67dcb3c6aaf

I chose BFGS to match the algo used in the rethinking R package, cf. https://github.com/rmcelreath/rethinking/blob/f3ac8de0b4bcfabccc67ed033fb81d1873ec755e/R/map-quap.r#L26C50-L26C54

rasmusbergpalm / pymc3-quap

non-positive semi definite covariance matrix #1