Open kuriwaki opened 6 months ago
Hmm, this is going to be exceptionally hard to address. It does not appear consistently even within Wisconsin. I checked all of the counties, and it does not seem to be occurring in every county where there are NAs (i.e., some of them are undervotes). Perhaps we can proceed on a case by case basis and write some code that can at least detect when this occurs in the President race. I can also think of a potential solution for the other contests, but it will likely not be a flawless system.
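A minimal sketch of what that detection step could look like, assuming wide-format records with one row per ballot. The column names `county` and `president`, the function name, and the toy rows are all hypothetical, not the project's actual schema. It only counts empty or NA President cells per county, so it flags candidates for a case by case look rather than deciding what the NA means.

```python
from collections import Counter

def na_president_by_county(rows, contest_col="president", county_col="county"):
    """Return {county: (n_missing, n_ballots)} for counties with any empty/NA cell."""
    missing = Counter()
    total = Counter()
    for row in rows:
        county = row[county_col]
        total[county] += 1
        # Treat None, empty strings, and the literal string "NA" as missing.
        value = (row.get(contest_col) or "").strip()
        if value == "" or value.upper() == "NA":
            missing[county] += 1
    return {c: (missing[c], total[c]) for c in total if missing[c] > 0}

# Toy records (hypothetical); real CVRs could be read with csv.DictReader.
ballots = [
    {"county": "Dane", "president": "CANDIDATE A"},
    {"county": "Dane", "president": ""},
    {"county": "Dane", "president": "NA"},
    {"county": "Bay", "president": "CANDIDATE B"},
]
flagged = na_president_by_county(ballots)
```

Counties that come back flagged would still need to be compared against the certified write-in totals by hand.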
I am also a bit suspicious that the NAs don't exactly add up to the total reported write-in votes, but perhaps by re-adding the missing border precincts we would get to the correct number.
Agree all around. I might rank that ("case by case basis and write some code that can at least detect when this occurs in the President race") somewhat highly, because I do think some users of the data are going to be interested in analyzing third-party voters (given the attention to RFK in 2024, as in the Herron and Lewis CVR article, "Did Ralph Nader Spoil a Gore Presidency?").
By the way, I think one way this kind of NA gets produced in the data is when the actual CVR is an Excel file with JPEG images for the write-ins. Here is an example from Bay County, FL. This is what the Excel file looks like (note the "Mark Rogers" write-in for Congress):
In the CSV version, the cells that held the JPEG images are blank.
With a few spot checks, I think this is specific to ES&S DS200 machines. Noted in the paper.
Dane County's (WI) raw CVR has the following values for President:
This is what both Jim's and Mason's final database gives after cleaning. Notice there is no red NA; all other values are the same.
The 1,137 "NAs" got dropped here, as if the office was not available on that ballot. However, that is implausible given that this is US President. In fact, the official county certification reports 1,146 write-in votes, of which 808 were "SCATTERING" and about 216 were for Hawkins, the Green Party candidate who failed to get on the ballot.
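For reference, the arithmetic behind that comparison can be written out as a quick check. The figures are the ones quoted above; the variable names are only illustrative, and the Hawkins count is approximate ("about 216"), so the last number is approximate too.

```python
dropped_nas = 1137          # empty President cells dropped from the cleaned data
certified_write_ins = 1146  # write-in total in the official county certification
scattering = 808            # certified write-ins reported as "SCATTERING"
hawkins = 216               # approximate count of certified write-ins for Hawkins

# Certified write-ins not matched by an NA in the CVR.
unexplained_gap = certified_write_ins - dropped_nas

# Certified write-ins that are neither "SCATTERING" nor Hawkins (approximate).
other_write_ins = certified_write_ins - scattering - hawkins

print(unexplained_gap, other_write_ins)
```

The gap of 9 ballots is small relative to the 1,137 NAs, which fits the reading that nearly all of these empty cells are write-ins rather than missing contests.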
So, it seems like in some cases these empty cell values should be "WRITE-IN", and there should be a corresponding vote entry in the long data. However, we don't have a great method to determine whether an empty cell is a write-in or whether the contest was not on the ballot (e.g. a split/fragmented paginated ballot).
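One possible heuristic, sketched under strong assumptions: if other ballots of the same ballot style have a recorded value for the contest, the contest was presumably on that style, so an empty cell is more likely a write-in than an absent contest. The `ballot_style` column, the labels, and the function name are hypothetical. This would not separate write-ins from undervotes if a vendor also leaves undervotes blank, and it does not handle multi-page ballots whose pages are stored as separate records.

```python
from collections import defaultdict

def classify_empty_cells(rows, contest_col="president", style_col="ballot_style"):
    """Label each row as recorded, possible write-in, or contest likely absent."""
    def is_empty(row):
        value = (row.get(contest_col) or "").strip()
        return value == "" or value.upper() == "NA"

    # Note which ballot styles show at least one recorded vote in the contest.
    style_has_contest = defaultdict(bool)
    for row in rows:
        if not is_empty(row):
            style_has_contest[row[style_col]] = True

    labels = []
    for row in rows:
        if not is_empty(row):
            labels.append("recorded")
        elif style_has_contest[row[style_col]]:
            labels.append("possible write-in")
        else:
            labels.append("contest likely absent")
    return labels

# Toy records (hypothetical): style A carries the contest, style B never shows it.
ballots = [
    {"ballot_style": "A", "president": "CANDIDATE A"},
    {"ballot_style": "A", "president": ""},
    {"ballot_style": "B", "president": None},
    {"ballot_style": "B", "president": "NA"},
]
labels = classify_empty_cells(ballots)
```

For President, nearly every ballot style should carry the contest, so the "contest likely absent" label would mostly matter for the other contests.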