openelections / openelections-data-ks

Pre-processed results for Kansas elections

Candidate totals—keep? #39

Open nk9 opened 7 years ago

nk9 commented 7 years ago

A few weeks ago in 3adcc35, @karpet removed the TOTAL lines in a few files. I have been using them to do vote checksums (see total-checksum.py in OR). Especially in cases where the data is coming from hand keying, I've found lots of errors with the checksums in the past. But even without that, the checksum can still flag up duplicate or garbage lines, candidate name variations, and a whole bunch of other issues.

I realize I'm new to this state, so I didn't want to step on any toes. But I'm wondering if there's a particular reason to exclude them here?
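
A minimal sketch of the kind of checksum described above. This is not the actual total-checksum.py from the OR repository. It assumes columns named `county`, `precinct`, `office`, `district`, `candidate`, and `votes`, and it assumes total rows are marked by the hypothetical label `TOTAL` in the `precinct` column. The sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Assumption: total rows carry this label in the "precinct" column.
TOTAL_LABEL = "TOTAL"


def checksum_mismatches(rows, total_label=TOTAL_LABEL):
    """Compare summed precinct votes against reported TOTAL rows.

    Returns {(county, office, district, candidate): (summed, reported)}
    for every TOTAL row whose value differs from the precinct sum.
    """
    summed = defaultdict(int)
    reported = {}
    for row in rows:
        key = (row["county"], row["office"], row["district"], row["candidate"])
        votes = int(row["votes"].replace(",", ""))
        if row["precinct"].strip().upper() == total_label:
            reported[key] = votes
        else:
            summed[key] += votes
    return {
        key: (summed.get(key, 0), total)
        for key, total in reported.items()
        if summed.get(key, 0) != total
    }


# Invented sample data: "Jon Roe" is a deliberate name variation.
SAMPLE = """county,precinct,office,district,candidate,votes
Allen,Ward 1,Governor,,Jane Doe,100
Allen,Ward 2,Governor,,Jane Doe,150
Allen,TOTAL,Governor,,Jane Doe,250
Allen,Ward 1,Governor,,John Roe,80
Allen,Ward 2,Governor,,Jon Roe,70
Allen,TOTAL,Governor,,John Roe,150
"""

mismatches = checksum_mismatches(csv.DictReader(io.StringIO(SAMPLE)))
for key, (summed_votes, reported_votes) in mismatches.items():
    print(key, "summed:", summed_votes, "reported:", reported_votes)
```

In this sample the misspelled candidate name leaves the "John Roe" precinct sum short of the reported total, so the row is flagged even though no individual vote count was keyed wrong.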

dwillis commented 7 years ago

Yeah, this is something I should have noticed earlier. We don't really have a policy on whether to include totals or not, but in general I think we do include them if they are available. And I can see @nk9's point about their utility in catching and correcting errors, which is critical. Another option would be to use county-level file totals. Any thoughts, @karpet?
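
As a rough sketch of the county-level alternative, one could sum a precinct file and compare it with the matching county file instead of relying on in-file total rows. The column names (`county`, `office`, `candidate`, `votes`) and the function names here are assumptions for illustration, and the sample rows are invented.

```python
from collections import defaultdict


def sum_votes(rows, key_fields):
    """Sum the "votes" column grouped by the given key fields."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key] += int(row["votes"])
    return dict(totals)


def compare_to_county(precinct_rows, county_rows,
                      key_fields=("county", "office", "candidate")):
    """Return {key: (precinct_sum, county_total)} where the two disagree.

    A value of None means the key is missing from that file entirely.
    """
    from_precincts = sum_votes(precinct_rows, key_fields)
    from_county = sum_votes(county_rows, key_fields)
    diffs = {}
    for key in from_precincts.keys() | from_county.keys():
        precinct_sum = from_precincts.get(key)
        county_total = from_county.get(key)
        if precinct_sum != county_total:
            diffs[key] = (precinct_sum, county_total)
    return diffs


# Invented sample rows, shaped like csv.DictReader output.
precinct_rows = [
    {"county": "Allen", "precinct": "Ward 1", "office": "Governor",
     "candidate": "Jane Doe", "votes": "100"},
    {"county": "Allen", "precinct": "Ward 2", "office": "Governor",
     "candidate": "Jane Doe", "votes": "140"},
]
county_rows = [
    {"county": "Allen", "office": "Governor",
     "candidate": "Jane Doe", "votes": "250"},
]

diffs = compare_to_county(precinct_rows, county_rows)
print(diffs)
```

This keeps the precinct CSV free of total rows, at the cost of needing a county-level file for every precinct file being checked.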

karpet commented 7 years ago

My assumption was that the Totals rows were coming from the source files automatically, somehow, and were introduced via, e.g., parsing of a PDF. Obviously that was a mistaken assumption.

In general I like that a CSV of results doesn't mix flavors of rows, unless we somehow document and are consistent about the conventions we follow. My personal preference would be to store metadata about a CSV in a separate file, or in some explicitly flagged way, so as not to overload the format to store things, and things about the things, in the same way. But if there's already a convention at play here, then I defer to the community, especially since I am so late to the game.

karpet commented 3 years ago

fwiw I preserved the totals coming from the 2018 and 2020 general results. Those are part of SOS source records.