Closed lindechen closed 5 years ago
@lindechen I just realized I didn't see you partition the data into train and test sets; was this done elsewhere? I know you have a CV step in there, but I wonder how we are going to compare the models (logit vs. XGBoost). If you want, and if it wasn't done, I can partition the data once the PR from #12 is in.
@jcochanc No, there wasn't a train/test set and if you could create one, that would be great!
For my part, I can then perform CV on the training set and regress on the test set. That way we can compare the base model(s) with XGBoost.
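To make the plan concrete, here is a minimal base-R sketch of the workflow discussed above: split once, cross-validate on the training set only, then score a single time on the held-out test set. Everything in it is an assumption for illustration: the simulated data frame, the predictor names `age` and `index`, the 80/20 ratio, the seed, and the 5 folds are not taken from the project.

```r
# Hypothetical data: the real analysis data set and column names may differ.
set.seed(2019)  # arbitrary seed so the split is reproducible
n <- 200
dat <- data.frame(
  grad  = rnorm(n),          # stand-in for the continuous outcome
  age   = rnorm(n, 40, 10),  # hypothetical demographic predictor
  index = runif(n)           # stand-in for the housing quality index
)

# 80/20 train/test split (the ratio is an assumption).
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# 5-fold cross-validation on the training set only.
k <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(train)))
cv_rmse <- sapply(seq_len(k), function(i) {
  fit  <- lm(grad ~ age + index, data = train[fold != i, ])
  pred <- predict(fit, newdata = train[fold == i, ])
  sqrt(mean((train$grad[fold == i] - pred)^2))
})

# Refit on the full training set and score once on the held-out test set.
final_fit <- lm(grad ~ age + index, data = train)
test_rmse <- sqrt(mean((test$grad - predict(final_fit, newdata = test))^2))
```

Because the test rows never enter the CV loop, the same `test` data frame can be reused to score XGBoost, which keeps the comparison with the base models on equal footing.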
Okay, I'll make an issue for it and then have it done as soon as the PR for Carolina is approved.
Great!
I submitted updated `lasso.R` and `logit.R` files that:
`educ_cat` on the test set with and without the alternative index.
Please review this PR.
The base_model folder contains two files: `lasso.R` and `logit.R`. There are two outcome variables being examined: `grad` - a continuous variable (see codebook) and `educ_cat` - a categorical variable with 4 levels.
`lasso.R` - performs LASSO-penalized regression on demographic variables chosen by @CarolinaVelasco for variable selection.
`logit.R` - performs linear regression of `grad` on 1) the demographic variables selected in `lasso.R` and 2) the housing quality index created by @CarolinaVelasco; performs multinomial logistic regression of `educ_cat` on 1) the selected demographic variables and 2) the housing quality index.
The 6 components used to derive the index are not used.
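For reviewers who want a quick picture of the two-step flow described above, here is a rough sketch on simulated data. It is not the code in `lasso.R` or `logit.R`: the use of `glmnet` and `nnet::multinom`, the predictor names `demo1` to `demo5`, the choice of `lambda.1se`, and running the LASSO against `grad` are all assumptions made for illustration.

```r
library(glmnet)  # LASSO; assumed here, the package used in lasso.R is not stated
library(nnet)    # multinom() for multinomial logistic regression

set.seed(1)
n <- 300
# Hypothetical predictors; the real demographic variables are not listed here.
X <- matrix(rnorm(n * 5), ncol = 5,
            dimnames = list(NULL, paste0("demo", 1:5)))
index <- runif(n)  # stand-in for the housing quality index
grad  <- 2 * X[, 1] - X[, 2] + index + rnorm(n)
# Four-level stand-in for educ_cat, built from quartiles of grad.
educ_cat <- cut(grad, breaks = quantile(grad, 0:4 / 4),
                include.lowest = TRUE, labels = paste0("level", 1:4))

# Step 1: LASSO (alpha = 1) with a cross-validated penalty; keep the
# predictors whose coefficients are non-zero.
cv_fit   <- cv.glmnet(X, grad, alpha = 1)
beta     <- as.matrix(coef(cv_fit, s = "lambda.1se"))
selected <- setdiff(rownames(beta)[beta[, 1] != 0], "(Intercept)")

# Step 2: refit unpenalized models on the selected variables plus the index.
dat <- data.frame(X, index = index, grad = grad, educ_cat = educ_cat)
rhs <- paste(c(selected, "index"), collapse = " + ")
lin_fit   <- lm(as.formula(paste("grad ~", rhs)), data = dat)
multi_fit <- multinom(as.formula(paste("educ_cat ~", rhs)),
                      data = dat, trace = FALSE)
```

The index is added after selection rather than passed through the LASSO, so it always stays in both models regardless of the penalty.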
Please review this pull request. Thank you.