kuriwaki / cvr_harvard-mit_scripts


Review classification of discrepancy #319

Closed kuriwaki closed 2 months ago

kuriwaki commented 2 months ago

I am reviewing my code that classifies discrepancies by county. This has been trickier than it first seemed because of missing values. The fix below reshuffles the classifications of about 20 counties, including some counties we had been calling "release". A careful review by someone else seems warranted.

Maybe @jloffredo2, if you have time in the next few days?

jloffredo2 commented 2 months ago

Moving my comments from the email thread with @kuriwaki to here. After going through the code in the PR, I tracked down 7 counties whose classification changed. The county in Colorado is an issue that was already flagged. The rest changed because we are now able to pick up write-in candidates that have D/R labels. They received only a small number of votes, so they do not throw our totals off by much. Bottom line: the new code correctly labels these counties as “candidate missing” rather than “<1% mismatch”.
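To illustrate why missing values matter for this labeling, here is a minimal sketch on toy data. It is not the code in the PR: the column names, the "match" label, and the rule that any missing candidate total takes precedence over the percentage check are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical candidate-level comparison; none of these rows are real data.
comp <- tibble(
  county        = c("A", "A", "B", "B"),
  candidate     = c("X", "Y", "X", "Z"),
  votes_cvr     = c(500, 498, 700, 12),
  votes_returns = c(500, 500, 700, NA)  # Z has no row in the official returns
)

county_class <- comp |>
  group_by(county) |>
  summarize(
    # Flag counties where a candidate is absent from either source.
    any_missing  = any(is.na(votes_cvr) | is.na(votes_returns)),
    # Largest relative gap among candidates present in both sources.
    max_rel_diff = max(abs(votes_cvr - votes_returns) / votes_returns, na.rm = TRUE),
    .groups = "drop"
  ) |>
  mutate(class = case_when(
    any_missing         ~ "candidate missing",  # checked first (assumed precedence)
    max_rel_diff == 0   ~ "match",              # hypothetical label
    max_rel_diff < 0.01 ~ "<1% mismatch",
    TRUE                ~ "unclassified"
  ))
```

In this toy example, county B would look like a perfect match if the missing row were silently dropped by `na.rm = TRUE`; checking `any_missing` first is what moves it to "candidate missing".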

I ultimately think these results should be released. While they are currently labeled "candidate missing", my assumption is that their votes are recorded as part of the WRITEIN total. In the case of STEVE ZORN, the issue with that candidate changed from what was reported in https://github.com/kuriwaki/cvr_harvard-mit_scripts/issues/196 -- they were originally labeled differently but now do not show up at all in Adams County (at least in the version of compare.xlsx I am using to evaluate this PR).

@mreece13, do you have thoughts on how to deal with the qualified write-in candidates that are causing these 7 counties to be labeled "candidate missing", and thus not released, even though they received only a small number of votes?

kuriwaki commented 2 months ago

Do any of you have a sample ballot where a "qualified write-in" exists? Their names are not printed on the ballot (much less their "party"), is that right?

If they are not printed on the ballot with D/R, I would say the dataset should not give them a party. We could even give them party = WRITE-IN, which was discussed elsewhere (#206). In our paper, for example, I isolate contested contests by the presence of party = D and party = R candidates. So if a race is between a real Dem and a "W-I (R)" candidate, giving the W-I (R) candidate a party would classify the race as contested when I would prefer to call it uncontested.
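A minimal sketch of the concern, on toy data. The party codes ("DEM", "REP", "WRITE-IN"), the `writein` flag, and the rule that a contest counts as contested when it has both a DEM and a REP candidate are assumptions for illustration, not the paper's actual code.

```r
library(dplyr)

# Hypothetical contests; HD-2 has a qualified write-in reported with an R label.
cands <- tibble(
  contest   = c("HD-1", "HD-1", "HD-2", "HD-2"),
  candidate = c("A", "B", "C", "D"),
  party     = c("DEM", "REP", "DEM", "REP"),
  writein   = c(FALSE, FALSE, FALSE, TRUE)
)

# Assumed rule: contested means both a DEM and a REP candidate are present.
classify_contested <- function(df) {
  df |>
    group_by(contest) |>
    summarize(contested = all(c("DEM", "REP") %in% party), .groups = "drop")
}

# Party labels taken as reported in the results.
as_reported <- classify_contested(cands)

# Same data, with write-ins given party = WRITE-IN before classifying.
relabeled <- cands |>
  mutate(party = if_else(writein, "WRITE-IN", party)) |>
  classify_contested()
```

With the labels as reported, HD-2 comes out contested; after relabeling the write-in, it comes out uncontested, which is the behavior I would prefer.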

jloffredo2 commented 2 months ago

I know right away that Adams County had a sample ballot: https://assets01.aws.connect.clarityelections.com/Assets/Connect/RootPublish/adams-co.connect.clarityelections.com/Election%20Results/2020/2020%20General/Sample-Ballot_Adams-County_2020-General.pdf. We can see that Steve Zorn is not there - it's only the results that list them with a party designation. So agreed, it is a little funny to list them with a party in our dataset then.

kuriwaki commented 2 months ago

Great; very helpful sample ballot. I think the code in question in this PR is working, so I have migrated the discussion to #324, and we can continue the discussion on what to do there. And I'll merge this branch. For now, the code is classifying these cases as "do not release".

kuriwaki commented 2 months ago

I was also wondering why 14 counties moved to "unclassified" (instead of "candidate missing") under this code. I found that these were counties where there was an R/D/L candidate in our cvr data, but no such candidate in the MEDSL Baltz et al. returns. I modified the logic to make this explicit. None of these seem to be counties we were going to release anyway, but it is something to look into.

> read_excel("combined/compare.xlsx", sheet = 2) |> filter(color2_c == "unclassified") |> select(state, county_name, color2_c)
# A tibble: 14 × 3
   state                county_name color2_c    
   <chr>                <chr>       <chr>       
 1 DISTRICT OF COLUMBIA STATEWIDE   unclassified
 2 FLORIDA              DUVAL       unclassified
 3 GEORGIA              DOUGLAS     unclassified
 4 IDAHO                BONNER      unclassified
 5 NEVADA               LINCOLN     unclassified
 6 NEVADA               NYE         unclassified
 7 NEW JERSEY           CAMDEN      unclassified
 8 NEW JERSEY           ESSEX       unclassified
 9 NEW JERSEY           HUDSON      unclassified
10 OHIO                 HANCOCK     unclassified
11 TENNESSEE            LOUDON      unclassified
12 TENNESSEE            PICKETT     unclassified
13 TENNESSEE            SEVIER      unclassified
14 TENNESSEE            WILLIAMSON  unclassified
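For reference, the kind of check described above (counties with an R/D/L candidate in the cvr data but none in the returns) can be sketched with an anti-join. This is only an illustration on toy data: the data frames, county names, and party codes are hypothetical, and it is not the logic actually merged in the PR.

```r
library(dplyr)

# Hypothetical inputs, one row per county-candidate.
cvr <- tibble(
  state       = c("STATE 1", "STATE 1", "STATE 2"),
  county_name = c("COUNTY A", "COUNTY B", "COUNTY C"),
  party       = c("REP", "DEM", "LIB")
)
returns <- tibble(
  state       = "STATE 1",
  county_name = "COUNTY B",
  party       = "DEM"
)

major <- c("DEM", "REP", "LIB")  # assumed codes for D/R/L

# Counties with a D/R/L candidate in the official returns.
in_returns <- returns |>
  filter(party %in% major) |>
  distinct(state, county_name)

# Counties with a D/R/L candidate in the cvr data but none in the returns.
no_returns <- cvr |>
  filter(party %in% major) |>
  distinct(state, county_name) |>
  anti_join(in_returns, by = c("state", "county_name")) |>
  mutate(color2_c = "unclassified")
```

Here `no_returns` keeps COUNTY A and COUNTY C, the two toy counties that have no counterpart in `returns`.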