pyro-ppl / pyro

Deep universal probabilistic programming with Python and PyTorch
http://pyro.ai
Apache License 2.0

MCMC posterior distribution of observations returns the wrong values #1771

Closed FedericoV closed 5 years ago

FedericoV commented 5 years ago

Issue Description

Extracting the parameters from an MCMC trace and manually running them through the model gives a different answer than extracting the distribution of the observed variables directly.

Environment

Code Snippet

I have a model with a multivariate likelihood that looks like this:

import torch
import pyro
import pyro.distributions as dist

# target_cols (the list of target column names) is defined elsewhere.
def model(X_train_dict, X_test_dict, y_train):

    mu_train = []
    mu_test = []
    for col_name in target_cols:

        X_train = X_train_dict[col_name]
        X_test = X_test_dict[col_name]
        n = X_train.shape[1]  # number of features for this target

        # Independent standard-normal prior on the regression coefficients.
        beta = pyro.sample('beta_{}'.format(col_name), dist.Normal(loc=torch.zeros(n), scale=torch.ones(n)))
        mu_train.append(X_train @ beta)
        mu_test.append(X_test @ beta)

    mu_train = torch.stack(mu_train, dim=1)
    mu_test = torch.stack(mu_test, dim=1)

    # Likelihood of observing train observations.
    #with pyro.plate("train_obs", size=y_train.shape[0]) as ind:
    n = len(X_train_dict)  # number of targets (dimension of each observation)
    m = 5  # Rank of covariance matrix

    # Priors on the low-rank-plus-diagonal covariance of the likelihood.
    cov_factor = pyro.sample("cov_factor", dist.Normal(torch.zeros((n, m)), torch.ones(1)))
    cov_diag = pyro.sample("cov_diag", dist.HalfCauchy(scale=torch.ones(n)))

    train_obs = pyro.sample("train_obs",
                            dist.LowRankMultivariateNormal(
                                loc=mu_train,
                                cov_factor=cov_factor,
                                cov_diag=cov_diag), obs=y_train)

If I try to sample from the posterior of the model with MCMC like this:

from pyro.infer import EmpiricalMarginal, TracePredictive
from pyro.infer.mcmc import MCMC, NUTS

nuts_kernel = NUTS(model, full_mass=False, step_size=0.1, adapt_step_size=True,
                   jit_compile=True)
mcmc_sampler = MCMC(nuts_kernel, num_samples=500, warmup_steps=1000)

# X_train and X_test are the per-target dicts of design matrices expected by the model.
posterior = mcmc_sampler.run(X_train, X_test, y_train)
posterior_forecaster = TracePredictive(model, posterior, 500)

train_pred = posterior_forecaster.run(X_train, X_test, y_train)
train_posterior_draws = EmpiricalMarginal(train_pred, sites='train_obs')._get_samples_and_weights()[0].detach().cpu().numpy()

and plot the results, I get a completely different answer than if I do this instead:

beta = posterior.marginal(sites=["beta_y0"])
beta = torch.cat(list(beta.support(flatten=True).values()), dim=-1)
y0_train_pred = (X_train['y0'] @ torch.transpose(beta, 0, 1)).numpy()

The former shows a near-perfect fit; the latter shows a very poor fit. I simplified the example somewhat, but hopefully the behaviour is clear.

FedericoV commented 5 years ago

I also get exactly the same behaviour if I extract the posterior distribution of the observed site directly, like this:

obs_marginal = posterior.marginal(sites=["train_obs"])
obs_marginal = torch.cat(list(obs_marginal.support(flatten=True).values()), dim=-1)
obs_marginal = obs_marginal.numpy()

The posterior distribution of an observed variable seems to be essentially the training observations plus observation noise, rather than what the model is actually predicting.
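
As a quick sanity check (a minimal sketch; it reuses obs_marginal and y_train from above and assumes y_train is a CPU tensor), the mean of these draws sits almost exactly on top of the observed values:

import numpy as np

# If "train_obs" were being resampled from the likelihood, these draws would
# scatter around the model's predictions; instead they track y_train itself.
print(np.abs(obs_marginal.mean(axis=0) - y_train.numpy()).max())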

fehiepsi commented 5 years ago

@FedericoV It seems to me that the line train_pred = posterior_forecaster.run(X_train, X_test, y_train) should be train_pred = posterior_forecaster.run(X_train, X_test, y_train=None). By setting y_train=None, we tell the model that train_obs is not an observed node.
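
Concretely, the predictive pass would look like this (a minimal sketch reusing the variable names from the snippets above); with y_train=None the model's obs=y_train becomes obs=None, so pyro.sample draws train_obs from the LowRankMultivariateNormal likelihood instead of clamping it to the training data:

# Resample "train_obs" from the likelihood under each posterior draw.
train_pred = posterior_forecaster.run(X_train, X_test, y_train=None)
train_posterior_draws = EmpiricalMarginal(train_pred, sites='train_obs')._get_samples_and_weights()[0].detach().cpu().numpy()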

FedericoV commented 5 years ago

Aha, that worked perfectly, thank you. Now, I need to investigate why NUTS seems to work much worse than ADVI for this problem...