rebeccajohnson88 / qss20_s21_proj

Repo for DOL Summer Data Challenge on equity in H-2A oversight
Creative Commons Zero v1.0 Universal

Startup notes #2

Closed by rebeccajohnson88 3 years ago

rebeccajohnson88 commented 3 years ago

Geolocation data:

Main challenge: getting tract identifiers onto the violations data if they don't already exist

Ways to make scope more manageable: (1) focus on states within the TRLA catchment area ('TX', 'MS', 'LA', 'KY', 'AL', 'TN'), so there are fewer addresses to geocode and fewer sets of tracts to pull; (2) increase the geographic unit of analysis to zip code or county, though that is less interesting. If time allows, could also add the housing locations data (in this repo).
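A rough sketch of the two scoping options, in case it helps whoever picks this up. The field names `state` and `tract_geoid`, the helper names, and the toy rows are all made up for illustration; the real violations data may be laid out differently. The county roll-up relies on the standard 11-digit tract GEOID layout (2-digit state FIPS + 3-digit county + 6-digit tract).

```python
# TRLA catchment states, as listed in the note above
TRLA_STATES = {"TX", "MS", "LA", "KY", "AL", "TN"}


def filter_to_trla(rows, state_key="state"):
    """Keep only rows whose state abbreviation is in the TRLA catchment area.

    `state_key` is a hypothetical column name; adjust to the real data.
    """
    return [
        r for r in rows
        if str(r.get(state_key, "")).strip().upper() in TRLA_STATES
    ]


def county_fips_from_tract(tract_geoid):
    """Roll an 11-digit census tract GEOID up to its 5-digit county FIPS."""
    geoid = str(tract_geoid).strip()
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {tract_geoid!r}")
    # First 2 digits = state FIPS, next 3 = county, last 6 = tract
    return geoid[:5]


# Toy rows with placeholder identifiers (not real violations records)
violations = [
    {"case_id": "a", "state": "TX", "tract_geoid": "48453001100"},
    {"case_id": "b", "state": "CA", "tract_geoid": "06037101110"},
    {"case_id": "c", "state": "la", "tract_geoid": "22071001747"},
]

trla_rows = filter_to_trla(violations)
counties = [county_fips_from_tract(r["tract_geoid"]) for r in trla_rows]
```

Keeping the GEOID as a string matters here: reading it as an integer drops leading zeros for states with FIPS codes below 10.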

DOL staffing:

Main challenge: getting the OPM workforce data into a usable format and investigating the granularity of the staffing data

Text as data:

Main challenge: the data is already acquired, so data acquisition is straightforward. The main goal is to use the text-as-data content from problem set two to find interesting patterns, expand beyond that if relevant, and merge the jobs and violations data.
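One possible starting point for the jobs/violations merge is sketched below. It assumes both datasets have an employer name field (the key `employer_name` is hypothetical) and that exact matching on a lightly normalized name is good enough for a first pass. The suffix list and toy rows are made up, and fuzzy matching may turn out to be necessary.

```python
import re
from collections import defaultdict

# Common business suffixes to ignore when comparing names (illustrative list)
SUFFIXES = {"inc", "llc", "co", "corp", "ltd"}


def normalize_name(name):
    """Lowercase, strip punctuation, and drop business suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    tokens = [t for t in cleaned.split() if t not in SUFFIXES]
    return " ".join(tokens)


def merge_on_name(jobs, violations, key="employer_name"):
    """Attach a violation count to each job row via the normalized name."""
    by_name = defaultdict(list)
    for v in violations:
        by_name[normalize_name(v[key])].append(v)
    merged = []
    for j in jobs:
        matches = by_name.get(normalize_name(j[key]), [])
        merged.append({**j, "n_violations": len(matches)})
    return merged


# Toy rows, not real employers
jobs = [
    {"employer_name": "Smith Farms, LLC"},
    {"employer_name": "Green Acres Inc."},
]
violations = [
    {"employer_name": "SMITH FARMS LLC"},
    {"employer_name": "smith farms"},
]

merged = merge_on_name(jobs, violations)
```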

ML/stats:

Main challenge: writing ML code in sklearn or other packages, since this is not explicitly covered in course content
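For anyone new to sklearn, the basic fit/predict/evaluate loop looks roughly like the sketch below. It uses synthetic data from `make_classification` purely as a stand-in, and logistic regression is an arbitrary choice of model; the real features, outcome, and model would come from the project data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 5 numeric features, binary outcome
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 25% of rows to evaluate on data the model has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a simple baseline classifier
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict on the held-out rows and compute accuracy
preds = model.predict(X_test)
acc = accuracy_score(y_test, preds)
```

Most sklearn estimators share this `fit` / `predict` interface, so swapping in a different model usually only changes the line that constructs `model`.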