vanAmsterdam closed this issue 4 years ago.
I think it would be really nice to have a separate tutorial for this. For anyone who wants to contribute one, here is an approach that is quite similar to the one in the Bayesian imputation tutorial.
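First, we sample `x` from `dist.Categorical(probs).mask(False)`, so the sample site itself contributes nothing to the log density and its log probability is added manually with `numpyro.factor`. Then, for entries where a value is observed (`x_obs` is not NaN), we mask out all enumerated values that differ from the observed value by setting their log probabilities to `-inf`. In other words, we don't count enumerated values that differ from the observed values, while for missing entries every enumerated value keeps its log probability and is summed over by the enumeration.

```python
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist

# inside the model: NaN in x_obs marks a missing value
x = numpyro.sample('x', dist.Categorical(probs).mask(False))
log_prob = dist.Categorical(probs).log_prob(x)
# mask out enumerated values which differ from x_obs where x_obs is observed
# (NaN != NaN is always True, hence jnp.isnan; JAX arrays are immutable, hence jnp.where)
log_prob = jnp.where(~jnp.isnan(x_obs) & (x != x_obs), -jnp.inf, log_prob)
numpyro.factor('x_obs', log_prob)
```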
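To see why the masking yields the right likelihood, here is a NumPy-only sketch (not NumPyro) that performs parallel enumeration by hand. It assumes a hypothetical model `x ~ Categorical(probs)`, `y ~ Normal(mu[x], 1)`; `mu`, `y`, `normal_logpdf`, and all numbers are made up for illustration. Summing over the enumerated axis, roughly what NumPyro's enumeration does with the `factor` term, gives `log p(x_obs) + log p(y | x_obs)` for observed rows and the marginal `log sum_k p(x=k) p(y | x=k)` for missing rows.

```python
import numpy as np

def normal_logpdf(y, mu, sigma=1.0):
    # log density of Normal(mu, sigma) evaluated at y (hypothetical helper)
    return -0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

probs = np.array([0.2, 0.5, 0.3])      # p(x = k), illustrative values
mu = np.array([-1.0, 0.0, 2.0])        # hypothetical per-category means for y
x_obs = np.array([1.0, np.nan, 2.0])   # NaN marks a missing covariate
y = np.array([0.1, 1.5, 2.2])

# Enumeration by hand: x takes every category on a new leading axis, shape (K, 1)
K = len(probs)
x = np.arange(K)[:, None]
log_prob = np.log(probs)[x] + normal_logpdf(y, mu[x])   # broadcasts to (K, N)

# Same mask as above: drop enumerated values that disagree with an observed x_obs
is_observed = ~np.isnan(x_obs)
log_prob = np.where(is_observed & (x != x_obs), -np.inf, log_prob)

# Log-sum-exp over the enumerated axis marginalizes x per data point
marginal = np.logaddexp.reduce(log_prob, axis=0)         # shape (N,)
print(marginal)
```

For observed rows only one term survives the mask, so the sum collapses to the observed category; for the missing row all `K` terms contribute.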
If you're interested, please ping me with any questions you have.
I've created a pull request with an initial version (#730). In addition to showcasing this method of automatically enumerating missing discrete covariates, it discusses several forms of missing data and how to handle them.
Thanks for addressing this issue, @vanAmsterdam!
Since NumPyro supports enumerating discrete latent variables, imputing missing values of discrete covariates should be possible (which would make NumPyro suitable for many more applied projects!).
Because parallel enumeration changes array shapes, it is not obvious how to adapt the continuous imputation example to discrete covariates, so an example would be helpful.