Recover undervotes in JSON files

kuriwaki commented 3 weeks ago

In this script (https://gist.github.com/kuriwaki/20ee1774039f86242c285b466f50da7f), I used Marin County to show how we should deal with duplicated votes in JSON files. This takes some careful thinking about what we drop and what we keep, but I managed to get the number of valid votes + undervotes + overvotes to line up exactly*.

*: Marin does not report the number of undervotes or the number of overvotes, but it does report the number of candidates and the total ballots cast. I verified that my deduped version matches that exactly.
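
For readers who want the shape of that check without opening the gist, here is a minimal sketch in Python. It is not the gist's code. The field names (`ballot_id`, `contest`, `marks`), the "keep the first copy" rule, and the sample records are all hypothetical, and the accounting convention (an overvoted ballot contributes `vote_for` overvotes and no valid votes) is an assumption that may differ from what a given county uses.

```python
from collections import Counter

def dedupe(records):
    """Keep the first record seen for each (ballot_id, contest) pair.
    'Keep the first' is an assumed rule, not necessarily the gist's."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["ballot_id"], rec["contest"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def tally(records, vote_for=1):
    """Split each ballot's vote_for slots into valid, undervotes, overvotes."""
    counts = Counter()
    for rec in records:
        n_marks = len(rec["marks"])
        if n_marks > vote_for:
            # Assumed convention: every slot on an overvoted ballot is an overvote.
            counts["overvotes"] += vote_for
        else:
            counts["valid"] += n_marks
            counts["undervotes"] += vote_for - n_marks
    return counts

# Hypothetical records; ballot b1 appears twice, as a duplicated JSON entry would.
records = [
    {"ballot_id": "b1", "contest": "Mayor", "marks": ["A"]},
    {"ballot_id": "b1", "contest": "Mayor", "marks": ["A"]},
    {"ballot_id": "b2", "contest": "Mayor", "marks": []},
    {"ballot_id": "b3", "contest": "Mayor", "marks": ["A", "B"]},
    {"ballot_id": "b4", "contest": "Mayor", "marks": ["B"]},
]

ballots_cast = 4  # stand-in for the total a county reports
vote_for = 1

counts = tally(dedupe(records), vote_for)
total = counts["valid"] + counts["undervotes"] + counts["overvotes"]
print(dict(counts), total == ballots_cast * vote_for)
```

Without the dedupe step, the same tally sums to 5 rather than 4, which is the kind of mismatch against the reported ballots cast that flags duplicated records.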

@mreece13 should we use dominionCVR and something like the script above for all of MIT's json files?

mreece13 commented 3 weeks ago

Yes, I think we should switch to using dominionCVR solely for the preprocessing of JSON files. I've made that switch and tested it on San Bernardino, so I know it works. I just started running the pipeline with that new function. Unfortunately, it'll take some time to complete, but it should slowly work its way through and be ready for release.

kuriwaki commented 3 weeks ago

Great - if you can activate the multicore setting when running dominionCVR::extract_cvr, that should speed things up a lot. It responds to furrr (https://github.com/DavisVaughan/furrr).

mreece13 commented 3 weeks ago

Yeah, I have it running with furrr, which I think probably needs to be added as a soft requirement for dominionCVR. It's very quick; I'm just waiting on the full pipeline to complete.

kuriwaki / cvr_harvard-mit_scripts

Recover undervotes in JSON files #339