PatrickCoyle opened 8 years ago
EDIT 6/2/16 @ 2:49 PM
R function ready to go for logistic EM: http://artax.karlin.mff.cuni.cz/r-help/library/BayesLogit/html/logit.EM.html
Here is a paper on EM algorithm for logistic regression:
https://arxiv.org/pdf/1306.0040.pdf
EDIT 3:06 PM: I don't think the logit.EM function is what we are looking for; it doesn't look like it can handle NAs. See DrowsyDrivers/programs/R/Patrick/logit.EM.test.R for a working example.
EDIT 3:12 PM: Another academic citation for the algorithm: Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1-38.
EDIT 3:14 PM: prelim.cat -> em.cat from the {cat} package might help us. There is also an imp.cat function for imputing the data. I am not sure if this is what we are looking for, since it does not involve regressing on a response -- it seems to just look at predictors with missing data and give posterior mean estimates. http://finzi.psych.upenn.edu/library/cat/html/em.cat.html https://cran.r-project.org/web/packages/cat/cat.pdf
Here is a doc with a good, succinct R code example of EM + regression at the bottom of page 2. If we can find the full lecture notes for this course, which appears to be devoted to the EM algorithm, we might find something even more directly related to our problem: http://www.maths.manchester.ac.uk/~jy/em/EM_notes_week12.pdf
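In the meantime, here is my own minimal base-R sketch of the EM-for-regression flavor (not the notes' exact example): simple linear regression where some responses are missing at random. The E-step fills the missing y values with their expected values under the current fit; the M-step refits by least squares. A known property of this setup is that the fixed point coincides with the complete-case fit, since missing responses carry no extra information about beta.

```r
# Hedged sketch: EM for simple linear regression with missing responses.
# All data below are simulated; nothing here is taken from the linked notes.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
miss <- sample(n, 50)          # indices whose responses go missing
y_obs <- y
y_obs[miss] <- NA

beta <- c(0, 0)                # start from the zero fit
for (it in 1:100) {
  y_fill <- y_obs
  y_fill[miss] <- beta[1] + beta[2] * x[miss]  # E-step: E[y | x, beta]
  beta_new <- coef(lm(y_fill ~ x))             # M-step: least-squares refit
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}
beta  # matches the complete-case fit coef(lm(y_obs ~ x))
```

With responses (rather than predictors) missing, this loop is mostly pedagogical, but the same E-step/M-step skeleton carries over to the harder missing-predictor case.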
Here is a link to the gradient descent algorithm that Lucas shared (he used this in STAT 9190 to handle missing data):
I just saw the folder up on our dropbox sharing folder. Did Lucas also send some description on this code/methods?
No, but he told me that I am free to call him and discuss; I think we can assume that applies to everyone. I have a decent feel for how it works, but it's unclear how he came up with the prediction term in the loss function and the gradient with respect to beta.
The most important thing to understand is the optim() function, which does general-purpose optimization (with gradient-based methods as an option). He uses it to minimize his user-defined j_cost function, supplying the gradient via a user-defined function called grr(). Everything else in the script is pretty much just preprocessing and train/test splitting. His X matrix is a recoded predictor matrix with zero in place of NA, and his R matrix is a binary matrix indicating whether the corresponding element of X was originally NA. Clever stuff.
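To make the pattern concrete, here is a hypothetical, self-contained sketch of that setup -- not Lucas's actual loss or gradient (his j_cost has an extra prediction term), just a plain squared-error stand-in on the zero-filled X, with its gradient, handed to base R's optim():

```r
# Hypothetical stand-in for the script described above. The names j_cost, grr,
# X, and R mirror the description; the loss itself is my simplification.
set.seed(2)
n <- 100; p <- 3
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(1, -2, 0.5)
y <- drop(X %*% beta_true) + rnorm(n, sd = 0.1)

R <- matrix(rbinom(n * p, 1, 0.1), n, p)  # 1 where X was "originally NA"
X[R == 1] <- 0                            # recode missing entries to zero

# Squared-error loss on the zero-filled predictor matrix
j_cost <- function(beta) sum((y - X %*% beta)^2) / (2 * n)
# Gradient of j_cost with respect to beta
grr <- function(beta) drop(-t(X) %*% (y - X %*% beta)) / n

fit <- optim(par = rep(0, p), fn = j_cost, gr = grr, method = "BFGS")
fit$par  # attenuated relative to beta_true because of the zero-fill
```

Zero-filling alone biases the estimates toward zero, which is presumably why the real loss carries a prediction term for the entries flagged in R.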
At the moment, we are just using na.exclude and whatever method is built into svyglm to handle missing data. There are a number of ways we can handle these without using imputation (or without imputing further!). These include:
http://gallery.rcpp.org/articles/EM-algorithm-example/
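For reference, here is my own base-R sketch of a textbook EM application -- a two-component Gaussian mixture fit to simulated data. This is not the gallery's code; the point of the linked post is moving this style of loop to C++ via Rcpp for speed.

```r
# Two-component Gaussian mixture EM in base R, on simulated data.
set.seed(3)
x <- c(rnorm(150, mean = -2), rnorm(100, mean = 3))
mu <- c(-1, 1); s <- c(1, 1); p1 <- 0.5   # crude starting values
for (it in 1:200) {
  d1 <- p1 * dnorm(x, mu[1], s[1])
  d2 <- (1 - p1) * dnorm(x, mu[2], s[2])
  g <- d1 / (d1 + d2)                     # E-step: responsibilities
  mu <- c(sum(g * x) / sum(g),            # M-step: weighted means,
          sum((1 - g) * x) / sum(1 - g))
  s <- sqrt(c(sum(g * (x - mu[1])^2) / sum(g),         # weighted sds,
              sum((1 - g) * (x - mu[2])^2) / sum(1 - g)))
  p1 <- mean(g)                           # and mixing weight
}
round(c(mu = mu, sd = s, p1 = p1), 2)
```

Here the "missing data" are the latent component labels, so the E-step computes soft memberships rather than filling in NAs, but the alternating structure is the same one we'd use for missing predictors.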
EDIT: I also stumbled across this list of the 32 "most important" algorithms with brief descriptions. Thought it was cool and helpful:
http://www.koutschan.de/misc/algorithms.php