[x] Make sure to split into 80% train / 20% test before imputation (see the split sketch after this list)
[x] Update the script to use data closer to the real data: clean/jobs_formod.csv (see Slack message)
[x] I'd avoid OneHotEncoder and instead use pd.get_dummies. Definitely avoid hand-coding the dummies; pd.get_dummies should work if you feed it a list of columns (see the get_dummies sketch after this list)
[ ] I'd separate into two scripts: (1) a preprocessing script that writes four objects: training feature matrix, test feature matrix, training labels, and test labels; and (2) a modeling/evaluation script, which should store the models in a list and iterate over them (see the two-script sketch after this list)
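
For the split-before-imputation item, a minimal sketch of the intended order, assuming the data comes from clean/jobs_formod.csv; the numeric column names ("age", "income") are placeholders, not the real columns:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer

df = pd.read_csv("clean/jobs_formod.csv")

# 80/20 split first, so test rows never influence the imputation statistics
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
train_df, test_df = train_df.copy(), test_df.copy()

# Fit the imputer on the training split only, then apply it to both splits
numeric_cols = ["age", "income"]  # placeholder column names
imputer = SimpleImputer(strategy="median")
train_df[numeric_cols] = imputer.fit_transform(train_df[numeric_cols])
test_df[numeric_cols] = imputer.transform(test_df[numeric_cols])
```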
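
For the pd.get_dummies item, a sketch of encoding an explicit list of categorical columns; the function name and column names are hypothetical, and the reindex step is one way to keep the train and test matrices aligned when a category appears in only one split:

```python
import pandas as pd

def encode_categoricals(train_df: pd.DataFrame, test_df: pd.DataFrame, cat_cols: list[str]):
    """One-hot encode cat_cols with pd.get_dummies and align test columns to train."""
    train_enc = pd.get_dummies(train_df, columns=cat_cols, dummy_na=True)
    test_enc = pd.get_dummies(test_df, columns=cat_cols, dummy_na=True)
    # Categories seen only in train become all-zero columns in test;
    # categories seen only in test are dropped so the matrices match.
    test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)
    return train_enc, test_enc
```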
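
For the two-script split, an illustrative layout, assuming the four objects get written as CSVs under clean/ and that the particular models listed are placeholders:

```python
# preprocess.py would end by writing the four objects, e.g.:
#   X_train.to_csv("clean/X_train.csv", index=False)
#   X_test.to_csv("clean/X_test.csv", index=False)
#   y_train.to_csv("clean/y_train.csv", index=False)
#   y_test.to_csv("clean/y_test.csv", index=False)

# model.py: read the four objects, then iterate over a list of models
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X_train = pd.read_csv("clean/X_train.csv")
X_test = pd.read_csv("clean/X_test.csv")
y_train = pd.read_csv("clean/y_train.csv").squeeze("columns")
y_test = pd.read_csv("clean/y_test.csv").squeeze("columns")

models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=200, random_state=42),
]

# Fit and evaluate each model in turn
for model in models:
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(type(model).__name__, accuracy_score(y_test, preds))
```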
selected_cat_large (the threshold should make the final unique values in the col around 100)
outcome variables[1]