Bayesian imputation tutorial with discrete covariates

vanAmsterdam commented 4 years ago

Since NumPyro supports enumerating discrete latent variables, imputing missing values for discrete covariates should be possible (which makes NumPyro suitable for many more applied projects!).

Because array shapes are altered under parallel enumeration, it is not obvious how to adapt the continuous imputation example to discrete covariates. An example may be helpful.

fehiepsi commented 4 years ago

I think it would be really nice to have a separate tutorial for this. For those who want to contribute one, here is an approach, which is quite similar to the approach in the Bayesian imputation tutorial.

Mask out the discrete site using .mask(False), so that the site itself adds nothing to the log density.
At an observed index (i.e. where x_obs is not NaN), we mask out all enumerated values which are different from the observed value by setting their log probabilities to -inf. In other words, we won't count enumerated values that are different from observed values.

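import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist

# x_obs is a float array with NaN at the missing entries
x = numpyro.sample('x', dist.Categorical(probs).mask(False))
log_prob = dist.Categorical(probs).log_prob(x)
# mask out enumerated values which differ from x_obs (NaN != NaN, so test with isnan)
log_prob = jnp.where(~jnp.isnan(x_obs) & (x != x_obs), -jnp.inf, log_prob)
numpyro.factor('x_obs', log_prob)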
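
To make the shapes concrete, here is a NumPy-only sketch of what the masking step computes once the enumerated values are summed out. It is not NumPyro code and does not call the enumeration machinery. The helper name `masked_enum_log_prob`, the (N, K) layout of `probs`, and the NaN encoding of missing entries in `x_obs` are assumptions made for illustration.

```python
import numpy as np


def masked_enum_log_prob(probs, x_obs):
    """Per-row log-probability of x after enumeration and masking.

    probs: (N, K) array of category probabilities, one row per data point.
    x_obs: (N,) float array of observed categories, NaN where missing (assumed encoding).
    """
    n, k = probs.shape
    # Enumerated values live on a new leading axis: shape (K, 1),
    # which broadcasts against the N data points.
    x_enum = np.arange(k).reshape(k, 1)
    # log p(x = j) for every enumerated value j and every row: shape (K, N).
    log_prob = np.log(probs).T
    observed = ~np.isnan(x_obs)
    # At observed rows, discard enumerated values that contradict the observation.
    mismatch = observed & (x_enum != x_obs)
    log_prob = np.where(mismatch, -np.inf, log_prob)
    # Sum out the enumerated axis with a stable log-sum-exp.
    m = log_prob.max(axis=0)
    return m + np.log(np.exp(log_prob - m).sum(axis=0))


probs = np.array([[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]])
x_obs = np.array([1.0, np.nan, 0.0])
result = masked_enum_log_prob(probs, x_obs)
```

Observed rows end up with the log probability of the observed value, while the missing row sums to log(1) = 0 for this factor alone. In a full model, the downstream likelihood would also carry the enumerated axis, so a missing row would be marginalized over its possible values rather than contributing nothing.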

If you are interested in working on this, please ping me with any questions you have.

vanAmsterdam commented 4 years ago

I've created a pull request with an initial version (#730). In addition to showcasing this method for automatically enumerating missing covariates, it discusses several forms of missing data and how to handle them.

fehiepsi commented 4 years ago

Thanks for addressing this issue, @vanAmsterdam!

pyro-ppl / numpyro

Bayesian imputation tutorial with discrete covariates #726