Hey All,
OK, did some more work in the repo. Namely:
Removed files whose origin I can't find, especially intermediate files. Want to make sure we only have datasets in place that we understand.
Split our code files into three folders: 10_prep_data, 20_merge_data, and 30_analyze_data. The first is for importing and cleaning data from individual sources, the second is for connecting them (including calculating distances), and the third is for analysis. Will discuss more in the meeting tomorrow, but decomposition is, in my view, the secret to sanity.
Changed polling place source folder names. Folder and file names should fully communicate the nature of the data: the old names didn't tell us which data was CPI and which was SafeGraph, for example.
@jgy4 worked on re-writing 105_NearestPolling_01.py to be 20_calc_dist_to_nearest_pp_elecday.py. But running into an issue I don't quite understand yet: the N explodes when I do the sjoin_nearest... Need to wrestle with that, but it may also be related to our issues with the first big merge that @PranavM98 and @dapoade need to address. Got it! Sometimes multiple polling places are the same distance away, so rows get duplicated. Fixed now (at least mostly).
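For anyone who hits the same exploding N: geopandas' `sjoin_nearest` documentation notes that the result can contain multiple output rows for a single input row when several right-hand geometries are equidistant. In geopandas, a common fix is to drop repeated left-index values, e.g. `joined[~joined.index.duplicated(keep="first")]`. The sketch below mimics that behavior with plain Python so it runs without geopandas. The names `nearest_all`, `keep_one_per_point`, and the toy points are hypothetical and are not the code in 20_calc_dist_to_nearest_pp_elecday.py.

```python
import math


def nearest_all(points, targets):
    """Return (point_id, target_id, dist) for EVERY target tied for nearest.

    Like a nearest spatial join, one input point yields several rows when
    multiple targets are exactly equidistant. This is where the N blows up.
    """
    rows = []
    for pid, (px, py) in points.items():
        dists = {tid: math.hypot(px - tx, py - ty) for tid, (tx, ty) in targets.items()}
        best = min(dists.values())
        # Sort by target id so tied matches come out in a reproducible order.
        for tid, d in sorted(dists.items()):
            if d == best:  # exact ties only; near-ties are not merged here
                rows.append((pid, tid, d))
    return rows


def keep_one_per_point(rows):
    """Keep only the first row per point_id (lowest target id among ties)."""
    kept = {}
    for pid, tid, d in rows:
        kept.setdefault(pid, (pid, tid, d))
    return list(kept.values())


if __name__ == "__main__":
    # Hypothetical toy data: h1 sits exactly between polling places A and B.
    homes = {"h1": (0, 0), "h2": (10, 0)}
    polling_places = {"A": (-1, 0), "B": (1, 0), "C": (12, 0)}
    rows = nearest_all(homes, polling_places)
    print(len(rows))  # 3 rows for 2 homes
    print(keep_one_per_point(rows))  # [('h1', 'A', 1.0), ('h2', 'C', 2.0)]
```

Which tied polling place gets kept does not change the distance-to-nearest value, because the tied rows share the same distance. It does matter if any polling place attributes are carried into later merges, so the tie-break rule should be stated explicitly. Note also that exact-equality ties can miss floating-point near-ties.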