jcochanc / AHS

The American Housing Survey (AHS) has data on housing characteristics and on the demographic characteristics of householders. Our goal is to use the tools of PHP 2650 to develop a study question and examine the data.

Feature/logistic regression #13

Closed lindechen closed 5 years ago

lindechen commented 5 years ago

The base_model folder contains two files: lasso.R and logit.R. There are two outcome variables being examined: grad - a continuous variable (see codebook) and educ_cat - a categorical variable with 4 levels.

The 6 components used to derive the index are not used.

Please review this pull request. Thank you.

jcochanc commented 5 years ago

@lindechen I just realized I didn't see you partition the data into train and test sets; was this done elsewhere? I know you have a CV step in there, but I wonder how we are going to compare the models (logit vs. XGBoost). If you want, and it wasn't done, I can partition the data once the PR from #12 is in.

lindechen commented 5 years ago

@jcochanc No, there wasn't a train/test set and if you could create one, that would be great!

For my part, I can then perform CV on the training set and evaluate the fitted model on the test set. That way we can compare the base model(s) with XGBoost.
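The workflow discussed here (hold out a test set once, then run CV only inside the training set) can be sketched as follows. This is a minimal illustration, not the code in lasso.R or logit.R: the repository's scripts are in R, and the function names `train_test_split` and `k_folds`, the 80/20 split, the seed, and the 5 folds are all assumptions chosen for the example.

```python
import random


def train_test_split(n_rows, test_fraction=0.2, seed=2650):
    """Shuffle row indices once and split them into (train, test) lists.

    The split fraction and seed are illustrative assumptions.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_folds(train_indices, k=5):
    """Yield (fit, validate) index lists for k-fold CV within the training set."""
    # Deal the training indices into k roughly equal folds.
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        validate = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate


# Hypothetical data set of 100 rows.
train_idx, test_idx = train_test_split(n_rows=100)
cv_splits = list(k_folds(train_idx, k=5))
```

Because the test indices never enter any fold, every candidate model (the logit/lasso base models and XGBoost) would be tuned on the same training rows and compared on the same untouched test rows.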

jcochanc commented 5 years ago

Okay, I’ll make an issue for it and then have it done as soon as the PR is approved for Carolina.

lindechen commented 5 years ago

Great!

lindechen commented 5 years ago

I submitted updated lasso.R and logit.R files that:

Please review this PR.