Modeling notes - Githubissues

rebeccajohnson88 / qss20_s21_proj

Repo for DOL Summer Data Challenge on equity in H-2A oversight

Creative Commons Zero v1.0 Universal


Modeling notes #22

Closed rebeccajohnson88 closed 3 years ago

rebeccajohnson88 commented 3 years ago

I think you may have already updated this, but we don't want to assume that every boolean column is an outcome to predict. Instead, I'd start an explicit list of outcome variables, like:

outcomes = ['outcome_XX', 'outcome_YY']

df_y = pre_df.select_dtypes(bool)
print("Outcome variables to predict are: " + str(df_y.columns.values))
y1 = list(df_y.iloc[:, 0])
y2 = list(df_y.iloc[:, 1])
# remove them from the preMatrix ... because that would be too easy!
pre_df = pre_df.select_dtypes(exclude=['bool'])
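To make the explicit-list idea concrete, here is a minimal sketch of selecting outcomes by name rather than by dtype. The toy `pre_df`, the `is_flag_feature` and `n_workers` columns, and the `X_df` name are assumptions for illustration only; `outcome_XX` and `outcome_YY` are the placeholder names from the list above.

```python
import pandas as pd

# Hypothetical stand-in for pre_df; the real frame comes from the project data.
pre_df = pd.DataFrame({
    "outcome_XX": [True, False, True],
    "outcome_YY": [False, False, True],
    "is_flag_feature": [True, True, False],  # boolean, but meant as a predictor
    "n_workers": [10, 25, 7],
})

# Explicit list of outcome variables (placeholder names).
outcomes = ["outcome_XX", "outcome_YY"]

# Fail early if a listed outcome is missing from the data.
missing = [c for c in outcomes if c not in pre_df.columns]
if missing:
    raise KeyError(f"Outcome columns not found in pre_df: {missing}")

df_y = pre_df[outcomes]
print("Outcome variables to predict are: " + str(list(df_y.columns)))
y1 = df_y["outcome_XX"].tolist()
y2 = df_y["outcome_YY"].tolist()

# Drop only the named outcomes, so boolean predictors stay in the matrix.
X_df = pre_df.drop(columns=outcomes)
```

Selecting by name keeps a boolean predictor such as `is_flag_feature` in the feature matrix, whereas a dtype-based split would treat it as an outcome.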

For predictors, I don't think the capital-letter heuristic will work once we add in more data, so I'd: (1) read in the CSV file with the merged job disclosure data, and (2) use that file's list of columns as the initial set of features to consider.
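The two steps for predictors could be sketched roughly as follows. This is only an illustration: the in-memory CSV stands in for the merged job disclosure file (whose actual path is not given here), and the column names `CASE_NUMBER`, `EMPLOYER_STATE`, `n_workers`, and `derived_col` are hypothetical.

```python
import io
import pandas as pd

# Stand-in for the merged job disclosure CSV; in practice, pass the file path.
disclosure_csv = io.StringIO(
    "CASE_NUMBER,EMPLOYER_STATE,n_workers\n"
    "H-1,TX,10\n"
)

# (1) Read only the header row to get the disclosure column names.
disclosure_cols = pd.read_csv(disclosure_csv, nrows=0).columns.tolist()

# Hypothetical modeling frame that also holds a derived column and an outcome.
pre_df = pd.DataFrame({
    "CASE_NUMBER": ["H-1"],
    "EMPLOYER_STATE": ["TX"],
    "n_workers": [10],
    "derived_col": [0.5],
    "outcome_XX": [True],
})
outcomes = ["outcome_XX"]

# (2) Use the disclosure columns as the initial candidate features,
# skipping any that are absent from pre_df or listed as outcomes.
features = [c for c in disclosure_cols
            if c in pre_df.columns and c not in outcomes]
X_df = pre_df[features]
```

Reading with `nrows=0` avoids loading the whole file just to get its column names, and the candidate list can then be pruned by hand (for example, dropping identifiers like a case number) before modeling.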