Moving my comments from the email thread with @kuriwaki to here. After going through the code in the PR, I tracked down the 7 counties whose classification changed. The county in Colorado is an issue that was already flagged. The rest are cases where we are now able to pick up write-in candidates that have D/R labels. They received only a small number of votes, so they are not throwing our totals off by much. Bottom line: the new code correctly labels these counties as “candidate missing” rather than “<1% mismatch”.
I ultimately think these results should be released. While they are currently labeled "candidate missing", my assumption is that their votes are recorded as part of the WRITEIN total. In the case of STEVE ZORN, the issue with that candidate changed from what was reported in https://github.com/kuriwaki/cvr_harvard-mit_scripts/issues/196 -- they were originally labeled differently but now do not show up at all in Adams County (at least in the version of compare.xlsx I am using to evaluate this PR).
@mreece13 -- do you have thoughts on dealing with these qualified write-in candidates? They are causing these 7 counties to be labeled "candidate missing", which means the counties won't be released even though only a small number of votes is involved.
Do any of you have a sample ballot where a "qualified write-in" exists? Their names are not printed on the ballot (much less their "party"), is that right?
If they are not printed on the ballot with D/R, I would say the dataset should not give them a party. Or even give them party = WRITE-IN, which was discussed elsewhere (#206). In our paper, for example, I isolate contested contests by the presence of a party = D or R candidate. So if a race is contested by a real Dem and a "W-I (R)" candidate, giving the W-I (R) candidate a party would lead to classifying the race as contested when I would prefer to call it uncontested.
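To make the concern concrete, here is a minimal sketch of how the party label on a write-in flips the contested flag. It is written in Python for illustration and is not the paper's or the repo's actual code: the function name `is_contested` and the rule "contested means at least one D and at least one R candidate" are my assumptions.

```python
# Assumed rule (not confirmed by the thread): a contest counts as contested
# only if at least one D and at least one R candidate appear in it.
MAJOR_PARTIES = {"D", "R"}


def is_contested(parties):
    """Return True if every major party has a candidate among `parties`."""
    return MAJOR_PARTIES <= set(parties)


# A real Democrat plus a qualified write-in recorded with party "R".
print(is_contested(["D", "R"]))         # True
# The same contest with the write-in recorded as party "WRITE-IN".
print(is_contested(["D", "WRITE-IN"]))  # False
```

Under that assumed rule, recording the write-in as WRITE-IN keeps the race uncontested, which is the outcome preferred above.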
I know right away that Adams County had a sample ballot: https://assets01.aws.connect.clarityelections.com/Assets/Connect/RootPublish/adams-co.connect.clarityelections.com/Election%20Results/2020/2020%20General/Sample-Ballot_Adams-County_2020-General.pdf. We can see that Steve Zorn is not there - it's only the results that list them with a party designation. So agreed, it is a little funny to list them with a party in our dataset then.
Great; very helpful sample ballot. I think the code in question in this PR is working, so I have migrated the discussion to #324, and we can continue the discussion on what to do there. And I'll merge this branch. For now, the code is classifying these cases as "do not release".
I was also wondering why the 14 counties moved to unclassified (instead of "candidate missing") under this code. I found that they were counties where there was an R/D/L candidate in our CVR data, but no such data in the MEDSL Baltz et al. returns. I modified the logic to make this explicit. None of these seem to be counties we were going to release anyway, but it is something to look into.
> read_excel("combined/compare.xlsx", sheet = 2) |> filter(color2_c == "unclassified") |> select(state, county_name, color2_c)
# A tibble: 14 × 3
   state                county_name color2_c
   <chr>                <chr>       <chr>
 1 DISTRICT OF COLUMBIA STATEWIDE   unclassified
 2 FLORIDA              DUVAL       unclassified
 3 GEORGIA              DOUGLAS     unclassified
 4 IDAHO                BONNER      unclassified
 5 NEVADA               LINCOLN     unclassified
 6 NEVADA               NYE         unclassified
 7 NEW JERSEY           CAMDEN      unclassified
 8 NEW JERSEY           ESSEX       unclassified
 9 NEW JERSEY           HUDSON      unclassified
10 OHIO                 HANCOCK     unclassified
11 TENNESSEE            LOUDON      unclassified
12 TENNESSEE            PICKETT     unclassified
13 TENNESSEE            SEVIER      unclassified
14 TENNESSEE            WILLIAMSON  unclassified
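For anyone reviewing, the distinction I mean can be sketched roughly as follows. This is an illustrative Python sketch, not the logic in the PR: the function `classify_county`, the dict layout, and the labels "no returns data" and "comparable" are hypothetical; only "candidate missing" is a label actually used in this thread.

```python
# Parties treated as major for this check, following the R/D/L wording above.
MAJOR = {"R", "D", "L"}


def classify_county(cvr, returns):
    """Classify one county from two vote tables.

    `cvr` and `returns` are dicts mapping (candidate, party) -> votes.
    `returns` may be empty when the official returns have no rows at all.
    """
    cvr_major = {key for key in cvr if key[1] in MAJOR}
    returns_major = {key for key in returns if key[1] in MAJOR}

    # Major-party candidates in the CVR data but nothing comparable in the
    # returns: flag this explicitly instead of letting it fall through.
    if cvr_major and not returns_major:
        return "no returns data"  # hypothetical label
    # A major-party candidate in the returns that the CVR data lacks.
    if returns_major - cvr_major:
        return "candidate missing"
    return "comparable"  # hypothetical label


print(classify_county({("A", "D"): 10}, {}))
print(classify_county({("A", "D"): 10}, {("A", "D"): 10, ("Z", "R"): 2}))
```

The point of the first branch is that a county with nothing to compare against gets its own explicit category rather than ending up there by default.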
I am reviewing my code that classifies the discrepancies in counties. This has been trickier than it first seemed because of missing values. The fix below reshuffles the classifications of about 20 counties, including some counties we had been calling "release". It would be better to have someone else review this carefully.
Maybe @jloffredo2, if you have time in the next few days?