kuriwaki closed this 2 weeks ago
Yes, I think we should switch to using dominionCVR solely for the preprocessing of JSON files. I've made that switch and tested it on San Bernardino, so I know it works. I just started running the pipeline with that new function. Unfortunately it'll take some time to complete, but it should work its way through and be ready for release.
Great - if you can manage to activate the multicore setting when running dominionCVR::extract_cvr, that should speed things up a lot. It responds to furrr (https://github.com/DavisVaughan/furrr).
Yeah, I have it running with furrr, which I think probably needs to be added as a soft requirement for dominionCVR. It's very quick; I'm just waiting on the full pipeline to complete.
In this script (https://gist.github.com/kuriwaki/20ee1774039f86242c285b466f50da7f), I used Marin County to show how we should deal with duplicated votes in JSON files. This requires some careful thinking about what we drop and do not drop, but I managed to get the number of valid votes + undervotes + overvotes to line up exactly*.
*: Marin does not report the number of undervotes or the number of overvotes, but it does report the number of candidates and the total ballots cast. I verified that my deduped version matches that exactly.
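For readers who want the reconciliation idea spelled out, here is a minimal sketch in Python. It is not the gist's code and not how dominionCVR works: the record layout, the "keep the first row per ballot and contest" rule, and all names and numbers below are hypothetical, chosen only to illustrate checking a deduplicated tally against an official ballots-cast figure.

```python
from collections import Counter

# Hypothetical records: (ballot_id, contest, choice). Real CVR data has more fields.
records = [
    ("b1", "Mayor", "Alice"),
    ("b2", "Mayor", "Bob"),
    ("b2", "Mayor", "Bob"),        # duplicate of the previous row
    ("b3", "Mayor", "undervote"),
    ("b4", "Mayor", "overvote"),
    ("b4", "Mayor", "overvote"),   # duplicate of the previous row
]
ballots_cast = 4  # hypothetical official total for this contest


def dedupe(rows):
    """Keep the first row seen for each (ballot_id, contest) pair (an assumed rule)."""
    seen = set()
    kept = []
    for ballot_id, contest, choice in rows:
        key = (ballot_id, contest)
        if key not in seen:
            seen.add(key)
            kept.append((ballot_id, contest, choice))
    return kept


def tally(rows):
    """Count valid votes, undervotes, and overvotes."""
    counts = Counter()
    for _, _, choice in rows:
        if choice in ("undervote", "overvote"):
            counts[choice] += 1
        else:
            counts["valid"] += 1
    return counts


deduped = dedupe(records)
counts = tally(deduped)
total = counts["valid"] + counts["undervote"] + counts["overvote"]

# The reconciliation check: the three categories should sum to ballots cast.
print(total == ballots_cast)
```

In real data the hard part is deciding which rows count as duplicates, which is the "what we drop and do not drop" question; the sum check only confirms the result afterwards.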
@mreece13 should we use dominionCVR and something like the gist script above for all of MIT's JSON files?