nk9 opened 7 years ago
This code from the CA data project looks like it will do what I'm asking. It also performs a few of the other checks (such as verifying that every line in a file has the same county), though it doesn't cover many of them. @charles-difazio
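For readers unfamiliar with that kind of check, here is a minimal sketch of a "same county on every line" validation. It is not the CA project's code; it assumes a CSV with a header row and a column named `county` (the column name, the helper `check_single_county`, and the sample rows are all hypothetical, made up for illustration).

```python
import csv
import io
from typing import Iterable, List


def check_single_county(rows: Iterable[dict], column: str = "county") -> List[str]:
    """Return error messages; an empty list means every row names the same county.

    Hypothetical helper: assumes rows come from csv.DictReader and that the
    county lives in a column called "county".
    """
    errors = []
    expected = None
    # The header is line 1, so data rows start at line 2
    # (assumes no quoted fields span multiple lines).
    for line_no, row in enumerate(rows, start=2):
        county = (row.get(column) or "").strip()
        if expected is None:
            expected = county  # first data row defines the expected county
        elif county != expected:
            errors.append(f"line {line_no}: county {county!r} != {expected!r}")
    return errors


# Made-up sample file for illustration only.
sample = """county,precinct,office,candidate,votes
Baker,1,Governor,Jane Doe,120
Baker,2,Governor,Jane Doe,95
Benton,1,Governor,Jane Doe,80
"""
print(check_single_county(csv.DictReader(io.StringIO(sample))))
# ["line 4: county 'Benton' != 'Baker'"]
```

Returning a list of messages rather than raising on the first failure makes it easy to report every bad line in one pass.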
Hey, glad that looks useful!
Feel free to drop it in here for your purposes. I'd also suggest setting up Travis CI so that data changes and updates are validated automatically.
In the medium term, we should probably pull the validation code out into its own library. I've been meaning to generalize it but haven't gotten around to it yet (I'm trying to get CA 2016 to pass all the tests first).
I think you were right that this stage of the process is the place to verify some basic properties of the data with a script. We've found all sorts of issues that would have been much harder to diagnose later on.
One thing we still don't have, though, is a check of the final vote tallies. I see that each election has a CSV file, e.g. `20100518__or__primary.csv`. I guess this data comes from the state? Is there already a script that adds up all the county results by candidate and verifies the totals against this file? And if not, should there be? :-) One thing this might require, though, is normalization of candidate names.
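A rough sketch of what such a tally check could look like, combined with a crude name normalization step. Everything here is an assumption for illustration: the column names `office`, `candidate` and `votes`, the helper names, and the sample rows are hypothetical, not part of any existing script. Both sides are summed by (office, normalized candidate), so the comparison works whether the reference file has one row per candidate or one row per county.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (office, normalized candidate name)


def normalize_name(name: str) -> str:
    """Heuristic key: drop accents and punctuation, lowercase, collapse spaces.

    Illustrative only; real data may also need a manual alias table
    (nicknames, "Last, First" ordering, etc.).
    """
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    no_punct = re.sub(r"[^\w\s]", " ", ascii_only)
    return " ".join(no_punct.lower().split())


def sum_by_candidate(rows: Iterable[dict]) -> Dict[Key, int]:
    """Sum votes per (office, normalized candidate); assumes plain integer vote strings."""
    totals: Dict[Key, int] = defaultdict(int)
    for row in rows:
        key = (row["office"], normalize_name(row["candidate"]))
        totals[key] += int(row["votes"])
    return dict(totals)


def compare_totals(county_rows: Iterable[dict], state_rows: Iterable[dict]) -> List[str]:
    """List every (office, candidate) whose county sum differs from the state figure.

    None in a message means the candidate is missing on that side.
    """
    county_totals = sum_by_candidate(county_rows)
    state_totals = sum_by_candidate(state_rows)
    problems = []
    for key in sorted(set(county_totals) | set(state_totals)):
        got, want = county_totals.get(key), state_totals.get(key)
        if got != want:
            problems.append(
                f"{key[0]} / {key[1]}: counties sum to {got}, state reports {want}"
            )
    return problems


# Made-up rows for illustration only.
counties = [
    {"office": "Governor", "candidate": "Jane Doe", "votes": "120"},
    {"office": "Governor", "candidate": "JANE  DOE", "votes": "80"},
    {"office": "Governor", "candidate": "José Smith", "votes": "150"},
]
state = [
    {"office": "Governor", "candidate": "Jane Doe", "votes": "200"},
    {"office": "Governor", "candidate": "Jose Smith", "votes": "155"},
]
print(compare_totals(counties, state))
# ['Governor / jose smith: counties sum to 150, state reports 155']
```

Note that this normalization does not reorder tokens, so "Doe, Jane" and "Jane Doe" would still be treated as different candidates; an explicit alias mapping is probably the safer route for those cases.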