pymc-devs / pymc-examples

Examples of PyMC models, including a library of Jupyter notebooks.
https://www.pymc.io/projects/examples/en/latest/
MIT License
263 stars 218 forks

Missing data and Bayesian Imputation #500

Closed NathanielF closed 1 year ago

NathanielF commented 1 year ago

A notebook on Missing Data methods and Bayesian imputation

Related to https://github.com/pymc-devs/pymc-examples/issues/461

This notebook aims to showcase methods for imputing missing data, primarily using Bayesian methods. We will focus on a dataset of employee satisfaction metrics drawn from the book Applied Missing Data Analysis. We will demonstrate how FIML and Bayesian imputation work using the multivariate normal distribution and how the two approaches differ. We also want to show how to approximate the multivariate distribution using sequential chained-equation methods.
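For readers new to FIML, the core idea can be sketched in a few lines: each row contributes the multivariate normal log-density of only its observed entries. This is an illustrative sketch and not code from the notebook; the function name `fiml_loglik` and the use of NaN to mark missing values are assumptions made for the example.

```python
import numpy as np


def fiml_loglik(data, mu, sigma):
    """Observed-data log-likelihood of a multivariate normal.

    Hypothetical helper: NaN marks a missing entry, and each row uses
    only the sub-vector of mu and sub-matrix of sigma it observes.
    """
    data = np.asarray(data, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    total = 0.0
    for row in data:
        obs = ~np.isnan(row)          # which variables this row observes
        k = int(obs.sum())
        if k == 0:
            continue                  # a fully missing row adds nothing
        diff = row[obs] - mu[obs]
        sub = sigma[np.ix_(obs, obs)]  # covariance of the observed block
        _, logdet = np.linalg.slogdet(sub)
        quad = diff @ np.linalg.solve(sub, diff)
        total += -0.5 * (k * np.log(2 * np.pi) + logdet + quad)
    return total


# Toy data: one complete row and one row missing its second variable.
toy = np.array([[0.0, 0.0], [0.0, np.nan]])
ll = fiml_loglik(toy, mu=np.zeros(2), sigma=np.eye(2))
```

Maximising this quantity over `mu` and `sigma` gives the FIML estimates. A Bayesian treatment instead places priors on those parameters and treats the missing entries as additional unknowns to sample.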

review-notebook-app[bot] commented 1 year ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

NathanielF commented 1 year ago

I think this is ready for review now. It's quite long and covers a number of approaches to imputation.

(i) We discuss the taxonomy of missing-ness: (MCAR), (MAR) and (MNAR). I try to set it up as a prelude to considerations about causal inference.

(ii) FIML and MLE approaches to estimating a multivariate model given missing data.

(iii) Bayesian imputation of missing values using the multivariate Gaussian and the posterior predictive distribution.

(iv) Two examples of imputation using sequential regression equations.

Each of the approaches so far is presented in the Enders book and our estimates match those presented there.

(v) I apply the missing data imputation to a hierarchical model and estimate the values of the missing data informed by the structure of "team" clusters in our employee data set. The model is estimated using the blackjax sampler and shows divergences, but converges nicely with good Rhat numbers... I use the differences in imputation patterns between the hierarchical model and the simpler regression models to argue for why we need to be aware of heterogeneous patterns of imputation, and how this is analogous to concerns about heterogeneous treatment effects in causal inference.

We finish with a wrap-up and a celebration of the flexibility of Bayesian modelling in an enterprise that has to work with confounding and complexity.
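As a rough illustration of the three missing-ness mechanisms in item (i), the simulation below deletes values of one variable under each mechanism and compares the mean of what remains. It is a sketch on synthetic data, not taken from the notebook; the variable names, coefficients and missingness rates are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
x = rng.normal(size=n)             # covariate that is always observed
y = 0.5 * x + rng.normal(size=n)   # variable that will lose values


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# True in each mask means "y is missing for this row".
mcar = rng.random(n) < 0.3                       # MCAR: unrelated to x and y
mar = rng.random(n) < sigmoid(2.0 * x - 1.0)     # MAR: depends on observed x only
mnar = rng.random(n) < sigmoid(2.0 * y - 1.0)    # MNAR: depends on y itself

full_mean = y.mean()
observed_means = {
    "MCAR": y[~mcar].mean(),
    "MAR": y[~mar].mean(),
    "MNAR": y[~mnar].mean(),
}
```

Under MCAR the complete-case mean stays close to the full-data mean. Under MAR and MNAR it is pulled downward, because high values are more likely to be missing. The MAR bias can be corrected by conditioning on `x`, whereas the MNAR bias cannot be corrected from the observed data alone.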

review-notebook-app[bot] commented 1 year ago

View / edit / reply to this conversation on ReviewNB

fonnesbeck commented on 2023-01-24T02:41:15Z ----------------------------------------------------------------

The table looks janky. Does it need to be placed in a code block to enforce monospace?


NathanielF commented on 2023-01-24T10:55:40Z ----------------------------------------------------------------

Fair. It was a bit needless. I've taken another approach, just adding the patterns of missing-ness as a pandas DataFrame:
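The table from the notebook is not reproduced in this thread. As a minimal sketch of how such a pattern table could be built with pandas, the example below counts the distinct combinations of missing columns. The column names and values are invented toy data, not the notebook's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing response.
df = pd.DataFrame(
    {
        "worksat": [3.0, np.nan, 4.0, 5.0, np.nan, 2.0],
        "empower": [28.0, 30.0, np.nan, 25.0, np.nan, 31.0],
        "lmx": [11.0, 9.0, 10.0, 12.0, 8.0, np.nan],
    }
)

# 1 = missing, 0 = observed; each distinct row of indicators is a pattern.
indicators = df.isna().astype(int)
patterns = (
    indicators.groupby(list(indicators.columns))
    .size()
    .reset_index(name="n_rows")
)
print(patterns)
```

Each row of `patterns` is one missing-ness pattern together with the number of records that share it, and it renders as a regular DataFrame rather than a hand-aligned text table.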