jgy4 closed 2 years ago
GREAT! Comments in the morning.
As for report outline, I like it a lot.
As for centroid times, I wouldn't fixate on those. It's one of the things where we know we're gonna update our data, so there's no reason to invest time into running all that code until we've made the changes we know we're going to make to the input data.
I think the MOST important thing is pinning down the polling data we want to use. I was thinking about this last night, and I think we should follow up with Vaughan on just getting on the phone with the Ballot Ready data person to ask her take on which data to trust. My HOPE is we can just use theirs, but I think hearing it from the horse's mouth is going to be the best strategy.
Basically I don't trust SafeGraph much -- they're in the business of generating really good estimates of things like foot traffic, but small errors aren't really a problem for that type of analysis in the way missing a couple polling places can be a problem for this type of analysis. Maybe they did do a good job, but we aren't sure, and our validation analyses haven't gotten us an answer yet.
I DO trust CPI, but I also know that they only have one person working on this and they don't have all 50 states.
Ballot Ready and VIP are organizations that are basically dedicated to collating and providing this data to people (e.g. I think VIP runs the Google and Facebook "where's your polling place" Election Day tools). Adriane and I have tried to approach them in the past to get their data and they've always said no, so getting it is really exciting, and we may just be able to use their data and not worry about anything else.
While we wait on that, I'd prioritize:
That make sense?
Also re: urban rural differences:
As you do analysis, always write your code bearing in mind that we're gonna keep changing the data going into the analyses, so never "just analyze" the data -- write code to generate tables so we can just update the data being read in, run it, and get updated results!
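As a rough illustration of that workflow (a minimal sketch, not our actual pipeline): the column names `region` and `distance_km` and the toy CSV are made up for the example. The point is only that the table comes out of a function, so swapping the input file and rerunning regenerates it.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, median


def load_rows(csv_file):
    """Read rows from an open CSV file; swap the file, rerun, get new tables."""
    return list(csv.DictReader(csv_file))


def distance_table(rows):
    """Summarize distance to the nearest polling place by region.

    `rows` is an iterable of dicts with the hypothetical keys
    "region" and "distance_km"; blank distances are skipped.
    """
    by_region = defaultdict(list)
    for row in rows:
        if row["distance_km"] not in ("", None):
            by_region[row["region"]].append(float(row["distance_km"]))
    return [
        {
            "region": region,
            "n": len(values),
            "mean_km": round(mean(values), 2),
            "median_km": round(median(values), 2),
        }
        for region, values in sorted(by_region.items())
    ]


# Toy stand-in for the real input file (hypothetical data).
sample = io.StringIO(
    "college,region,distance_km\n"
    "A,South,0.4\n"
    "B,South,1.6\n"
    "C,Midwest,2.0\n"
    "D,Midwest,\n"
)
table = distance_table(load_rows(sample))
for line in table:
    print(line)
```

When the polling place data changes, only the file passed to `load_rows` changes; the table code stays the same.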
Regarding the Missing Values in the 2020 Early Distances: Interestingly, the Google API was able to compute the distances between the college and polling places for the values that were missed. However, the distances by walking, car, and transit are slightly different. Could we impute the missing distances with the distances calculated by the Google API?
The Google distance API also has 56 missing values; here is the list of missing values by state:
The good part is that, except for NJ, these states are in the "Other" region. We do not require these states/territories for our analysis.
I'd been trying to do some digging with Pranav about the null values for early voting distances, but you can see Pranav was able to address most of it using the Google API. @nickeubank I know you prefer GeoPandas, so do you think it'd be worth continuing that analysis? And do you have any additional documentation for sjoin_nearest?
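If we do impute, one option is to record where each value came from, so imputed rows can be reported or dropped later. This is a sketch under assumptions: the keys `college` and `distance_km` and the `fallback` lookup are hypothetical, and no Google API call is made here. Note that a travel-mode distance is not the same quantity as a straight-line distance, which is one more reason to keep the source flag.

```python
def fill_missing_distances(records, fallback):
    """Fill missing distances from a fallback source and flag the origin.

    records:  list of dicts with hypothetical keys "college" and
              "distance_km" (None means missing).
    fallback: dict mapping college -> distance from another source
              (for example, values already fetched from a routing API).
    Returns new dicts; the input records are not modified.
    """
    filled = []
    for rec in records:
        out = dict(rec)
        if rec["distance_km"] is not None:
            out["distance_source"] = "original"
        elif rec["college"] in fallback:
            out["distance_km"] = fallback[rec["college"]]
            out["distance_source"] = "fallback"
        else:
            out["distance_source"] = "missing"
        filled.append(out)
    return filled


# Hypothetical toy data.
records = [
    {"college": "A", "distance_km": 1.2},
    {"college": "B", "distance_km": None},
    {"college": "C", "distance_km": None},
]
filled = fill_missing_distances(records, fallback={"B": 0.9})
```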
The walking distances are distance to the nearest polling place, where "nearest" was determined using sjoin_nearest, right? sjoin_nearest is identifying the nearest polling place; the Google API is just getting us a distance by travel modality rather than straight-line distance, right?
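To make the distinction concrete, here is a plain-Python stand-in for the "find the nearest polling place by straight-line distance" step. It is not GeoPandas code and not how sjoin_nearest is implemented; the coordinates and names are made up, and it uses a great-circle (haversine) distance on latitude/longitude as an assumption.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_polling_place(campus, polling_places):
    """Return (name, distance_km) for the polling place closest to `campus`.

    campus:         (lat, lon) tuple
    polling_places: dict mapping name -> (lat, lon)
    """
    return min(
        (
            (name, haversine_km(campus[0], campus[1], lat, lon))
            for name, (lat, lon) in polling_places.items()
        ),
        key=lambda pair: pair[1],
    )


# Hypothetical coordinates.
campus = (36.0, -79.0)
places = {"near": (36.01, -79.0), "far": (36.5, -79.0)}
name, dist_km = nearest_polling_place(campus, places)
print(name, round(dist_km, 2))
```

A routing API would then take the matched pair and return a travel-mode distance for it; the matching itself is still done on straight-line distance.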
Correct. But isn't the travel modality more digestible / of greater interest, because ultimately we are concerned with level of access, and straight-line distance might not be as representative?
If we believed that it was really precise, then yes. My concern is that it's giving an illusion of precision that is unrealistic -- measuring distances from the centroid of a campus / edge of a polygon is an inherently imprecise endeavor meant to estimate an approximate distance for the average student; pretending we're measuring walking times to the minute seems... a little contrived? Don't get me wrong, we should include them, but I think straight distance is a little more transparent, and crucially it's also a lot easier to deal with computationally, which makes life easier as we iterate on our data.
Put differently: I'm not sure that the difference between straight-line-distance and travel-modality distance is greater than the inherent uncertainty of what we're measuring.
(I also like "distance from polygon edge" as a metric more than from centroid -- hard to argue election administration officials should be doing more than putting a PP on every campus, which gets a "0" in distance from polygon, but non-zeros with a walking distance metric)
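A small sketch of why the two metrics differ, assuming planar coordinates in meters (i.e. already projected) and a made-up rectangular campus. The helper names are hypothetical; with GeoPandas/shapely the same quantities would come from geometry distance methods rather than hand-rolled code.

```python
from math import hypot


def point_in_polygon(pt, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_segment_distance(pt, a, b):
    """Shortest distance from point `pt` to segment a-b."""
    px, py = pt
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp to the segment
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polygon(pt, polygon):
    """0 if the point is inside the polygon, else distance to nearest edge."""
    if point_in_polygon(pt, polygon):
        return 0.0
    n = len(polygon)
    return min(
        point_segment_distance(pt, polygon[i], polygon[(i + 1) % n])
        for i in range(n)
    )


# Hypothetical 1000 m x 600 m campus; its centroid is (500, 300).
campus = [(0, 0), (1000, 0), (1000, 600), (0, 600)]
centroid = (500, 300)
on_campus_pp = (900, 500)
off_campus_pp = (1300, 300)

for pp in (on_campus_pp, off_campus_pp):
    edge = distance_to_polygon(pp, campus)
    cent = hypot(pp[0] - centroid[0], pp[1] - centroid[1])
    print(pp, "edge:", round(edge, 1), "centroid:", round(cent, 1))
```

The on-campus polling place scores 0 from the polygon but several hundred meters from the centroid, which is the behavior the polygon-edge metric is meant to capture.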
Hi @nickeubank @adrianefresh
We had a discussion after our meeting with you all and Vaughan to sort through priorities, tasks, and our final report. We'd love your feedback on these!
Tasks From Today (i.e. requests from Vaughan)
Tasks for Nov. 30th Meeting & Dec. 3rd Final Report
Proposed Final Report Outline