kuriwaki / cvr_harvard-mit_scripts


[CA] Merced #320

Closed aconevska closed 2 months ago

aconevska commented 2 months ago

Recommendation: Use Harvard for now, but check whether MEDSL is removing or missing Precinct 515, "515 Los Banos #6". (Its vote counts almost exactly match MEDSL's missing votes.)

Harvard has exactly the same numbers as the official Merced County count. For president, so does MEDSL. In the US House race MEDSL is off by just 420 votes, and in the State House race by 418. When I quickly clean the raw "cvr.csv" file for Merced just to remove NAs and under/over votes - i.e. minimal processing - I get exactly the official totals, which indicates it's likely something benign. (This is not a fragmented county or one that requires multiple steps to clean.)
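For reference, a minimal sketch of that kind of quick check. The column names (`precinct`, `office`, `candidate`) and the non-vote markers are assumptions for illustration; the real "cvr.csv" may be laid out differently, and the sample rows are made up.

```python
import csv
import io
from collections import Counter

# Hypothetical markers for rows that are not real votes; adjust to the actual file.
NON_VOTES = {"", "NA", "UNDERVOTE", "OVERVOTE"}


def tally(rows):
    """Count votes per (office, candidate), skipping NAs and under/over votes."""
    totals = Counter()
    for row in rows:
        candidate = (row.get("candidate") or "").strip()
        if candidate.upper() in NON_VOTES:
            continue
        totals[(row["office"], candidate)] += 1
    return totals


# Made-up sample standing in for the raw file.
sample = io.StringIO(
    "precinct,office,candidate\n"
    "101,US HOUSE,Smith\n"
    "101,US HOUSE,undervote\n"
    "515,US HOUSE,Smith\n"
    "515,US HOUSE,NA\n"
    "515,US HOUSE,Jones\n"
)
totals = tally(csv.DictReader(sample))
for (office, candidate), votes in sorted(totals.items()):
    print(office, candidate, votes)
```

The resulting per-candidate totals can then be compared by hand against the official county canvass.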

One straightforward explanation would be that MEDSL is missing the precinct "515 Los Banos #6". This precinct has a total of 423 votes for the US House, just 3 more than MEDSL's shortfall of 420. For the State House it has 418 votes, exactly the number MEDSL is short.
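A rough sketch of how one could search for such a precinct: look for any precinct whose per-office totals fall within a small tolerance of the shortfall in every office. The function name, the tolerance, and the two "hypothetical" precincts are illustrative assumptions; only the 515 figures and the shortfalls come from the numbers above.

```python
def candidate_precincts(precinct_totals, shortfall, tolerance=5):
    """Return precincts whose totals are within `tolerance` of the shortfall in every office."""
    matches = []
    for precinct, by_office in precinct_totals.items():
        if all(
            abs(by_office.get(office, 0) - gap) <= tolerance
            for office, gap in shortfall.items()
        ):
            matches.append(precinct)
    return matches


# Votes MEDSL is short of the official count, per office (from this issue).
shortfall = {"US HOUSE": 420, "STATE HOUSE": 418}

precinct_totals = {
    "515 Los Banos #6": {"US HOUSE": 423, "STATE HOUSE": 418},
    # The two entries below are made up purely for illustration.
    "hypothetical precinct A": {"US HOUSE": 610, "STATE HOUSE": 598},
    "hypothetical precinct B": {"US HOUSE": 131, "STATE HOUSE": 127},
}

matches = candidate_precincts(precinct_totals, shortfall)
print(matches)
```

A match across two offices at once is suggestive rather than conclusive, so the precinct should still be checked directly in the MEDSL output.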

mreece13 commented 2 months ago

Merced, CA is definitely a fragmented county, so I'm a bit confused about how the Harvard team has cleaned it. Regardless, I've fixed this issue for Merced: I was attempting to be conservative with the pagination script, and it was not worth it. This may also resolve some other fragmented counties whose totals MEDSL was slightly miscounting. Pending a further build.

aconevska commented 2 months ago

Oh interesting, I didn't realise it was fragmented actually. Though fragmentation generally doesn't matter for pres vote count and often not for pres + US House, in my experience for CA. (i.e. the fragmentation begins further down the ballot, splitting records usually after the US House.)

Harvard's (Jim's) fragmentation code is pretty basic, but I'm happy to send you a written description of the process (I wrote it all down ages ago anyway). I think it was fairly easy to identify split records in Merced because it's relatively small.

mreece13 commented 2 months ago

Yeah, it would be interesting to see in writing what process y'all are using. Merced has the helpful feature that they label page 2 in the Ballot Style column. The MEDSL process is described in the paper draft, or you can find it in the parse_pagination.py script under the build-medsl directory.

mreece13 commented 2 months ago

Closing this for now, MEDSL now matches Harvard's totals.