nooreendabbish / Traffic

JSM 2016 GSS Data Challenge

Boosting prediction in the presence of missing data (EM, gradient descent) #12

Open PatrickCoyle opened 8 years ago

PatrickCoyle commented 8 years ago

At the moment, we are just using na.exclude and whatever method is inherent to svyglm to handle missing data. There are a number of ways we could handle missingness without using imputation (or without imputing further!). These include:

  1. EM algorithm. This can be done in glm by using a probit link (transform via the normal CDF) instead of logit; see the sketch after this list. Not sure if it can be done with logit.

http://gallery.rcpp.org/articles/EM-algorithm-example/

  2. Lucas used a gradient descent algorithm to boost prediction for a table with missing data in Deep's course. I've asked him for a copy to see if we can use it here.
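
For reference, here is a bare-bones plain-R sketch of the probit EM idea from the Rcpp gallery link above: treat the latent normal variable as the "missing data", take its conditional mean in the E-step, and do least squares in the M-step. The data and variable names below are made up purely for illustration.

```r
# Minimal probit-EM sketch (latent-variable EM, as in the linked Rcpp
# gallery post). Simulated data and names are illustrative only.
set.seed(1)
n <- 500
X <- cbind(1, rnorm(n))
y <- rbinom(n, 1, pnorm(X %*% c(-0.5, 1)))

beta <- rep(0, ncol(X))
for (it in 1:100) {
  eta <- drop(X %*% beta)
  # E-step: posterior mean of the latent normal z given y and current beta
  z <- ifelse(y == 1,
              eta + dnorm(eta) / pnorm(eta),
              eta - dnorm(eta) / pnorm(-eta))
  # M-step: ordinary least squares of z on X (latent variance is 1)
  beta_new <- drop(solve(crossprod(X), crossprod(X, z)))
  if (max(abs(beta_new - beta)) < 1e-8) { beta <- beta_new; break }
  beta <- beta_new
}
beta
# sanity check: glm(y ~ X - 1, family = binomial(link = "probit"))
```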

EDIT: I also stumbled across this list of the 32 "most important" algorithms with brief descriptions. Thought it was cool and helpful:

http://www.koutschan.de/misc/algorithms.php

PatrickCoyle commented 8 years ago

EDIT 6/2/16 @ 2:49 PM

R function ready to go for logistic EM: http://artax.karlin.mff.cuni.cz/r-help/library/BayesLogit/html/logit.EM.html
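
A minimal usage sketch, with the caveat that the (y, X) argument order and the $beta component are assumptions read off that help page, not something I've verified:

```r
# Hedged sketch of calling BayesLogit::logit.EM -- argument order and the
# $beta component are assumed from the linked help page. Note logit.EM
# appears in older BayesLogit releases, not necessarily the current one.
library(BayesLogit)
set.seed(1)
X <- cbind(1, rnorm(100))
y <- rbinom(100, 1, plogis(X %*% c(-0.5, 1)))
fit <- logit.EM(y, X)
fit$beta   # EM estimate of the logistic regression coefficients
```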

Here is a paper on EM algorithm for logistic regression:

https://arxiv.org/pdf/1306.0040.pdf

EDIT 3:06 PM: I don't think the logit.EM function is what we are looking for. It doesn't look like it can handle NA's. See DrowsyDrivers/programs/R/Patrick/logit.EM.test.R for a working example.

EDIT 3:12 PM: Another academic citation for the algorithm: Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1-38.

EDIT 3:14 PM: prelim.cat -> em.cat from the {cat} package might help us. There is also an imp.cat function for imputing the data. I am not sure if this is what we are looking for, since it does not involve regressing on a response -- it seems to just look at predictors with missing data and give posterior mean estimates. http://finzi.psych.upenn.edu/library/cat/html/em.cat.html https://cran.r-project.org/web/packages/cat/cat.pdf
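
A rough sketch of that {cat} workflow, assuming categorical predictors coded as small positive integers (the fake x matrix below just stands in for our data):

```r
# Illustrative sketch of the {cat} workflow: prelim.cat -> em.cat -> imp.cat.
library(cat)
set.seed(1)
x <- matrix(sample(1:3, 200, replace = TRUE), ncol = 2)  # categories coded 1,2,3
x[sample(length(x), 20)] <- NA                           # sprinkle in missingness

s <- prelim.cat(x)            # preprocess / summarize missingness patterns
thetahat <- em.cat(s)         # EM estimate of the cell probabilities
rngseed(1234)                 # required before the simulation-based steps
ximp <- imp.cat(s, thetahat)  # impute the NAs under the fitted model
```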

Here is a doc with a good succinct R code example of EM+regression at the bottom of page 2. If we can find the full lecture notes for this course, which appears to be fully committed to the EM algorithm, then we might find something even more directly related to our problem: http://www.maths.manchester.ac.uk/~jy/em/EM_notes_week12.pdf
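
In the meantime, here is a minimal sketch of the same general EM+regression idea (not taken from those notes), assuming a single normal predictor with values missing at random: the E-step fills in the conditional mean and second moment of each missing x given y, and the M-step refits the regression from the expected sufficient statistics.

```r
# Illustrative EM for y = b0 + b1*x + e when some x are missing at random;
# assumes x ~ N(mu, tau2) and e ~ N(0, sig2). Simulated data for illustration.
set.seed(1)
n <- 300
x <- rnorm(n, 2, 1); y <- 1 + 0.5 * x + rnorm(n, 0, 0.3)
x[sample(n, 100)] <- NA
obs <- !is.na(x)

fit0 <- lm(y ~ x, subset = obs)                 # complete-case starting values
b0 <- unname(coef(fit0)[1]); b1 <- unname(coef(fit0)[2])
sig2 <- summary(fit0)$sigma^2
mu <- mean(x[obs]); tau2 <- var(x[obs])

for (it in 1:200) {
  # E-step: conditional mean/variance of each missing x given y and parameters
  v   <- 1 / (1 / tau2 + b1^2 / sig2)
  m   <- v * (mu / tau2 + b1 * (y - b0) / sig2)
  Ex  <- ifelse(obs, x, m)
  Ex2 <- ifelse(obs, x^2, m^2 + v)
  # M-step: update regression and x-model parameters from expected statistics
  Sxy  <- sum(Ex * y) - n * mean(Ex) * mean(y)
  Sxx  <- sum(Ex2)    - n * mean(Ex)^2
  b1   <- Sxy / Sxx
  b0   <- mean(y) - b1 * mean(Ex)
  sig2 <- mean((y - b0 - b1 * Ex)^2 + b1^2 * ifelse(obs, 0, v))
  mu   <- mean(Ex); tau2 <- mean(Ex2) - mu^2
}
c(b0 = b0, b1 = b1)
```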

PatrickCoyle commented 8 years ago

Here is a link to the gradient descent algorithm that Lucas shared (he used this in STAT 9190 to handle missing data):

https://www.dropbox.com/home/DrowsyDrivers/programs/R/Lucas

chenchen715 commented 8 years ago

I just saw the folder in our shared Dropbox folder. Did Lucas also send any description of the code/methods?

PatrickCoyle commented 8 years ago

No but he told me that I am free to call him and discuss. I think we can assume that applies to everyone. I have a decent feel for how it works, but it's unclear how he came up with the prediction term in the loss function and the gradient with respect to beta.

The most important thing to understand is the optim() function, which does the optimization (with a gradient-based method as an option). He uses it to minimize his user-defined j_cost function, supplying the gradient through a user-defined function called grr(). Everything else in the script is pretty much just preprocessing and train/test splitting. His X matrix is a recoded predictor matrix with zeros in place of NAs. His R matrix is a binary matrix indicating whether the corresponding element of X was originally an NA. Clever stuff.
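
Since the actual j_cost/grr definitions aren't reproduced here (and the exact prediction term is the part that's unclear), here is only a rough sketch of one plausible structure: a logistic loss in which the zero-filled X and the indicator matrix R both enter as predictors, minimized with optim(). Treat every name and modeling choice below as an assumption, not Lucas's actual code.

```r
# Illustrative sketch only -- not Lucas's actual j_cost/grr. Assumes a
# logistic loss where the zero-filled predictors and the NA indicators
# each get their own coefficients.
set.seed(1)
n <- 200; p <- 4
X_full <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(X_full %*% c(1, -1, 0.5, 0)))
X <- X_full
X[sample(length(X), 50)] <- NA           # introduce missing values

R  <- 1 * is.na(X)                       # 1 where X was NA
X0 <- X; X0[is.na(X0)] <- 0              # zero-fill the NAs
Z  <- cbind(1, X0, R)                    # intercept + predictors + indicators

j_cost <- function(beta, Z, y) {         # negative log-likelihood (logistic)
  eta <- drop(Z %*% beta)
  -sum(y * eta - log1p(exp(eta)))
}
grr <- function(beta, Z, y) {            # its gradient with respect to beta
  eta <- drop(Z %*% beta)
  drop(t(Z) %*% (plogis(eta) - y))
}

# "BFGS" or "CG" -- plain gradient descent isn't an optim() method,
# so this is a stand-in for whatever method the original script uses.
fit <- optim(par = rep(0, ncol(Z)), fn = j_cost, gr = grr,
             Z = Z, y = y, method = "BFGS")
fit$par                                  # fitted coefficients
```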