ADVI gives incorrect means for Skew MVN model

skoval commented 3 years ago

I am working on a problem that has a multivariate outcome with negative skew. It seemed like a skew MVN model could be a good choice so I've been exploring that. In some initial simulation testing, I was surprised to find that the posterior means in ADVI are way off. I was aware that mean-field based ADVI can often underestimate the variance of rvs but is usually ok for posterior means. It looks like (sadly, for me) this problem may be one of those exceptions. And I've tested full-rank ADVI and see the same behavior.

Sorry that I don't have much idea of a solution here. But I wanted to share it, in case this example might help with ongoing VI development. Or, in the best case, someone might see how a change in my model specs could address the issue.

Skew MVN

Here is my setup and simulated data:

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
import theano as tt
import time
import os
import seaborn as sns; sns.set_context('notebook')

RANDOM_SEED = 1234567
np.random.seed(RANDOM_SEED)

def alpha_to_delta(alpha, Omega):
    return np.dot(Omega, alpha) / np.sqrt(1 + np.dot(alpha.T, np.dot(Omega, alpha)))

def omega_star(delta, Omega):
    Omega0 = np.hstack(([1], delta))
    Omega1 = np.c_[delta.T, Omega]
    return np.vstack((Omega0, Omega1))

def rskewmvn(n, mu, alpha, Omega):
    delta = alpha_to_delta(alpha, Omega)
    Omega_star = omega_star(delta, Omega)
    d = len(delta) + 1
    Y = np.random.multivariate_normal(mean = np.zeros(d), cov = Omega_star, size = n)
    for x in np.arange(1,d):
        Y[:,x] = np.where(Y[:,0] < 0, -1., 1.) * Y[:,x] + mu[x-1]
    return Y[:,np.arange(1,d)]

dims = 3
mu = np.zeros(3)
alpha = np.array([-10., 0., 0.])
Omega = np.array([[1.0, 0.5, -0.1],[0.5, 1.0, -0.3], [-0.1, -0.3, 2.0]])
Y = rskewmvn(1000, mu, alpha, Omega)
alpha_to_delta(alpha, Omega) * np.sqrt((2 / np.pi)) # True mean

array([-0.79392481, -0.39696241,  0.07939248])

sns.pairplot(pd.DataFrame(Y))

Skew MVN Model

with pm.Model() as skewmvn:
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n = 3, eta=2.0, sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )  
    loc = pm.Normal("loc", 0.0, 1.0, shape=3)
    alpha = pm.Normal("alpha", 0.0, 1.0, shape=3)
    cov = chol.dot(chol.T)
    delta = pm.Deterministic("delta", (tt.tensor.dot(alpha, cov)) / pm.math.sqrt(1 + tt.tensor.dot(tt.tensor.dot(alpha, cov), alpha)))
    Omega = pm.Deterministic("Omega", tt.tensor.slinalg.cholesky(cov - delta[:,None] * delta[None, :]))
    xstar = pm.Normal("xstar", 0.0, 1.0, shape = len(Y))
    obs = pm.MvNormal("obs", mu = loc + delta[None, :] * pm.math.abs_(xstar)[:,None], chol = Omega, observed = Y)

with skewmvn:
    inference = pm.ADVI()
    approx = pm.fit(100000, method=inference, callbacks=[pm.callbacks.CheckParametersConvergence(tolerance=1e-4)])

plt.plot(approx.hist)
plt.ylim([0, 5e4])

The ELBO trace appears to have converged but not to the true target given the mean summary below.

means = approx.bij.rmap(approx.mean.eval())
means['alpha'], means['loc']

(array([-0.03334295, -0.00925781, -0.00126642]),
 array([-0.75964889, -0.38220455,  0.09446099]))

with skewmvn: 
    step = pm.NUTS(scaling=approx.cov.eval(), is_cov=True)
    trace = pm.sample(1000, step=step, tune=1000, random_seed=RANDOM_SEED, return_inferencedata = True)
az.summary(trace, var_names = ["alpha", "loc"])

In this case, the shape parameter for the first dimension is still slightly underestimated, but clearly closer to the target than with VI.

Versions and main components

PyMC3 Version: v3.9.3
Theano Version: 1.0.11
Python Version: 3
Operating system: Linux 2.6
How did you install PyMC3: pip3

junpenglao commented 3 years ago

The likelihood you wrote down is not a skewed MVN - it is still a regular MVN - this could be the reason of the bias?

skoval commented 3 years ago

I was trying to use a conditional specification for the skewed MVN, where a skew normal Y with shape alpha and covariance parameter Omega is conditonally MVN,

w ~ N(0, 1) Y | w ~ MVN(loc + abs(w) delta, Omega - delta delta')

as in Prop 1 from Azzalini and Capitanio here. But maybe I'm missing something?

junpenglao commented 3 years ago

I see, I think you can compare x_star to check if the ADVI estimation makes sense.

junpenglao commented 3 years ago

Looking closer at the Prop 1, it seems that it describes a forward simulation method, for running inference we probability need a close form log_prob function.

skoval commented 3 years ago

Exactly. That's why I went to the conditional approach, which should be an equivalent likelihood, just factorized, p(w) p(y | w)

pymc-devs / pymc