kuriwaki / cvr_harvard-mit_scripts


Standardizing party designation for write-ins #324

Closed kuriwaki closed 2 months ago

kuriwaki commented 2 months ago

Making an issue so we can resolve it before release. There are a few counties where this seems to be causing unnecessary red flags.

Qualified write-ins seem to be candidates whose name and party are not printed on the ballot, but who are sometimes recorded with a party when a voter writes in their name. (see https://github.com/kuriwaki/cvr_harvard-mit_scripts/pull/319#issuecomment-2206828384)

I see four options for these:

  1. Keep "qualified writeins"' party_detailed as D/R
  2. Make "qualified writeins"' party_detailed == "OTHER", just like a generic write-in (candidate == "WRITEIN")
  3. Change ALL writein candidates to party_detailed == NA (EDIT 7/6: or party_detailed == "WRITEIN"). Currently almost all unqualified write-ins in medsl (Reece) have party_detailed == "OTHER"
  4. Make a new binary 1/0 column called writein like the Baltz et al. MEDSL format

I lean towards option 3, and if we have time, 4. The Baltz et al. dataset seems to do only 4, while retaining the party of the qualified write-in. For example, Steve Zorn of CO-07, noted in #196, is listed the following way in the @sbaltzmit precinct sql file. This is rational, but I'm not sure if we can ensure this for all our CVRs at this point.

> ret_all |> filter(candidate == "STEVE ZORN") |> count(candidate, writein, party_detailed)
# A tibble: 1 × 4
  candidate  writein party_detailed     n
  <chr>      <lgl>   <chr>          <int>
1 STEVE ZORN TRUE    DEMOCRAT         396
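
A minimal sketch of what option 4 could look like on our side, assuming we keep a hand-maintained list of known qualified write-ins. The `qualified_writeins` vector and the toy rows are hypothetical, and this is not how the Baltz et al. file was actually built:

```r
library(dplyr)

# Toy rows standing in for CVR records (illustrative values, not real data)
cvr <- tibble::tibble(
  candidate      = c("WRITEIN", "STEVE ZORN", "JANE DOE"),
  party_detailed = c("OTHER", "DEMOCRAT", "REPUBLICAN")
)

# Hypothetical, hand-maintained list of known qualified write-ins
qualified_writeins <- c("STEVE ZORN")

cvr_flagged <- cvr |>
  mutate(
    # Option 4: add a logical write-in flag and leave party_detailed untouched
    writein = candidate == "WRITEIN" | candidate %in% qualified_writeins
  )
```

The flag only covers names we already know about, which is the part I am not sure we can guarantee for every CVR.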

Relevant issues:

kuriwaki commented 2 months ago

Once we decide what to do with this, my recommendation is to then change Howie Hawkins' party designation to a write-in (strip the Green designation) in

this is according to my assessment of #298

mreece13 commented 2 months ago

I think the correction to the classification script was the correct choice; we cannot add anything more to the CVR data that is not there. I can change Howie Hawkins to a WRITEIN party in those jurisdictions, which I think makes sense as well. Pending build.

kuriwaki commented 2 months ago

I think we want to make more changes than just Hawkins. Currently the candidate == "WRITEIN" records are listed as party_detailed == "OTHER" in CVR_parquet/medsl. So I think we want to change them all to party_detailed == "WRITEIN", and change the designation of qualified write-ins like Steve Zorn to party_detailed == "WRITEIN" (or, as Baltz et al. does, make a writein column).

library(tidyverse)
library(arrow)

# current data -- should be party = WRITEIN
open_dataset("release") |> 
  filter(candidate == "STEVE ZORN") |> 
  count(state, candidate, party_detailed) |> 
  collect()
#> # A tibble: 2 × 4
#>   state    candidate  party_detailed     n
#>   <chr>    <chr>      <chr>          <int>
#> 1 COLORADO STEVE ZORN INDEPENDENT       17
#> 2 COLORADO STEVE ZORN DEMOCRAT          16

# what about other writeins?
open_dataset("release") |> 
  filter(candidate == "WRITEIN") |> 
  count(state, candidate, party_detailed) |> 
  collect()
#> # A tibble: 15 × 4
#>    state      candidate party_detailed     n
#>    <chr>      <chr>     <chr>          <int>
#>  1 ARIZONA    WRITEIN   OTHER          29145
#>  2 CALIFORNIA WRITEIN   OTHER           5615
#>  3 COLORADO   WRITEIN   OTHER           1275
#>  4 FLORIDA    WRITEIN   OTHER           1905
#>  5 FLORIDA    WRITEIN   NONPARTISAN      698
#>  6 GEORGIA    WRITEIN   OTHER          73150
#>  7 ILLINOIS   WRITEIN   OTHER            121
#>  8 MARYLAND   WRITEIN   OTHER           2474
#>  9 MICHIGAN   WRITEIN   OTHER            316
#> 10 NEW JERSEY WRITEIN   OTHER           3961
#> 11 OHIO       WRITEIN   OTHER           8322
#> 12 OREGON     WRITEIN   OTHER          14046
#> 13 TEXAS      WRITEIN   OTHER           1426
#> 14 WISCONSIN  WRITEIN   OTHER           8907
#> 15 IOWA       WRITEIN   OTHER            184

# need to fix Florida

# what about Baltz data
open_dataset("returns/by-county/") |> 
  filter(candidate == "WRITEIN") |> 
  count(candidate, party_detailed) |> 
  collect()
#> # A tibble: 1 × 3
#>   candidate party_detailed     n
#>   <chr>     <chr>          <int>
#> 1 WRITEIN   WRITEIN        20850

# what about Steve Zorn in Baltz data?
open_dataset("returns/by-county/") |> 
  filter(candidate == "STEVE ZORN") |> 
  count(candidate, writein, party_detailed) |> 
  collect()
#> # A tibble: 1 × 4
#>   candidate  writein party_detailed     n
#>   <chr>        <dbl> <chr>          <int>
#> 1 STEVE ZORN       1 DEMOCRAT           2

Created on 2024-07-05 with reprex v2.1.0

kuriwaki commented 2 months ago

One correction to my reprex above. The non-qualified write-ins are actually listed as NA in the Baltz data (except in DC, and state leg offices for KS + SC, for some reason). I was just overwriting them as WRITEIN.

kuriwaki commented 2 months ago

@mreece13 you can check out my commit in https://github.com/kuriwaki/cvr_harvard-mit_scripts/commit/b030c4d6279f0df315da27783c542d0cdb3bf572 that will overwrite the party of anyone who is candidate == "WRITEIN" (let's call them unqualified writeins) to party = WRITEIN.
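
For reference, a rough sketch of the kind of overwrite described here. This is not the contents of the commit, just an illustration on toy rows with assumed values:

```r
library(dplyr)

# Toy rows (illustrative values, not real data); the NA row mimics the Baltz coding
cvr <- tibble::tibble(
  state          = c("FLORIDA", "FLORIDA", "COLORADO", "IOWA"),
  candidate      = c("WRITEIN", "WRITEIN", "STEVE ZORN", "WRITEIN"),
  party_detailed = c("OTHER", "NONPARTISAN", "DEMOCRAT", NA)
)

cvr_std <- cvr |>
  mutate(
    # Unqualified write-ins get party_detailed = "WRITEIN"; all other rows are unchanged
    party_detailed = if_else(candidate == "WRITEIN", "WRITEIN", party_detailed)
  )
```

Named candidates such as qualified write-ins keep whatever party they already had, so this only standardizes the generic candidate == "WRITEIN" rows.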

mreece13 commented 2 months ago

That looks good to me. Most of the MEDSL data should also reflect this change now (I am currently syncing a new version of the data to Dropbox). I have only re-built the counties we were considering releasing so some of them are missing it still.

kuriwaki commented 2 months ago

Ok, so unqualified write-ins seem standardized now. Great! I'll cautiously close this momentarily with the pull request.

Qualified write-ins are hard to mark as party_detailed == "WRITEIN" because we cannot easily detect them, especially if we have accidentally given them parties. I guess that's something to note. By the way, CVRs probably lose write-ins completely if they are in CSV format from the ESS DS200 (#123).
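
One possible screening heuristic, offered only as a sketch: named candidates who show up with more than one party_detailed in the same state (as Steve Zorn did in the reprex above) could be flagged for manual review. This assumes inconsistent party labels are a useful signal, which is unverified, and it would miss qualified write-ins who were consistently given a single party. The rows are toy data:

```r
library(dplyr)

# Toy rows (illustrative values, not real data)
cvr <- tibble::tibble(
  state          = c("COLORADO", "COLORADO", "COLORADO", "COLORADO"),
  candidate      = c("STEVE ZORN", "STEVE ZORN", "JANE DOE", "JANE DOE"),
  party_detailed = c("INDEPENDENT", "DEMOCRAT", "REPUBLICAN", "REPUBLICAN")
)

suspects <- cvr |>
  filter(candidate != "WRITEIN") |>
  group_by(state, candidate) |>
  # Count how many distinct party labels each named candidate carries
  summarize(n_parties = n_distinct(party_detailed), .groups = "drop") |>
  filter(n_parties > 1)
```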

Remaining issues on this thread like Kenosha and Maryland seem more like things to change on the returns side, not the CVR side (#328). I will transfer those issues there.

I also verified Howie Hawkins