nk9 closed this issue 7 years ago
That's a super good idea. We have, further down our processing pipeline, the concept of validations, but it would be better I think to have simple checks like this at this stage. The order of the columns doesn't necessarily matter too much, but # 2, 3, 5, 6, 7, 8 and 9 are solid checks. In Mississippi, we have been using 'X' to represent precincts that are not a part of specific legislative districts.
OK, great. I'll take a stab at it. What are your thoughts on precincts? There are a bunch of formats currently:
0001 0001 1 Full Name 0001 1 PREC #1 PREC 1 1 Full Name FULL NAME 1 Full Name 1 FULL NAME 1 - Full Name
Should we be enforcing any kind of consistency here?
> In Mississippi, we have been using 'X' to represent precincts that are not a part of specific legislative districts.
That's in the precinct field, though, right? Am I still right to say that this should have "0" for the votes?
{'votes': 'X', 'district': '27', 'candidate': 'PHILIBEN, ANNE N', 'office': 'State Senate', 'county': 'Wasco', 'party': 'DEM', 'precinct': 'PREC 19'}
No, in Mississippi we do have "X" in the votes column for precincts that aren't part of the legislative district. Not in the precinct column. https://github.com/openelections/openelections-data-ms/blob/master/2015/20151103__ms__general__clay__precinct.csv#L368
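For illustration, a check along these lines would flag that kind of row. This is only a sketch: the function name `find_non_integer_votes` is hypothetical, and the field names are taken from the example record quoted above, not from any agreed spec.

```python
def find_non_integer_votes(rows):
    """Return (index, row) pairs whose 'votes' value is not a plain non-negative integer."""
    flagged = []
    for index, row in enumerate(rows):
        value = str(row.get("votes", "")).strip()
        # isdigit() is False for 'X', for empty strings and for signed values like '-1';
        # isascii() additionally rejects non-ASCII digit characters.
        if not (value.isascii() and value.isdigit()):
            flagged.append((index, row))
    return flagged


# The first record is the one quoted in this thread. The second is a made-up
# variant with "0" votes, included only to show a row that passes.
rows = [
    {"votes": "X", "district": "27", "candidate": "PHILIBEN, ANNE N",
     "office": "State Senate", "county": "Wasco", "party": "DEM",
     "precinct": "PREC 19"},
    {"votes": "0", "district": "27", "candidate": "PHILIBEN, ANNE N",
     "office": "State Senate", "county": "Wasco", "party": "DEM",
     "precinct": "PREC 19"},
]

for index, row in find_non_integer_votes(rows):
    print(f"row {index}: votes={row['votes']!r} in {row['county']} / {row['precinct']}")
```

Whether such rows should be rewritten to "0" or dropped is a separate decision; the sketch only reports them.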
Ran the verifier script (thanks!) and yeah, the pseudo-candidate names are definitely an issue in this repo (and likely others). We have a couple of options:
I'd kinda prefer 1.
Considering that changing them in bulk is as easy as making a sed/perl one-liner, let's just do that.
I see the "X for excluded precincts" being used in the Wasco 2002 primary as well. Can't we just remove these lines entirely? The only reason they are there is to pad out the precinct × candidate grid. Other counties, which provide data by precinct, simply don't list contests which didn't appear on the ballot in a given precinct. Or they list only the precincts relevant to a given contest. Like this from Lane County's 2008 primary:
DEM State Representative 7th District

                                G. Nick    Donald              OVER   UNDER
                                McKibbin   Nordin   WRITE-IN   VOTES  VOTES
                                --------   ------   --------   -----  -----
0005  100007  Blue River             169      201          2       0    195
0007  100009  Camas                   38       48          1       0     72
0017  100097  Latham                 142      267          1       0    214
0019  100102  Lowell                 140      149          6       1    133
0022  100107  Mosby                  227      421          9       0    361
0024  100112  Pleasant Hill 2         59       60          1       0     80
0026  100119  Salmon Creek            82       76          3       0     78
0035  100300  Cottage Grove          353      590          1       0    371
0041  100800  Oakridge               145      226          3       0    145
TOTALS                              1355     2038         27       1   1649
- Party is included for "Write-in", etc. lines in primaries, but not for generals/specials
And leave it to Wasco to throw us another curve ball: their abstract xlsx seems to merge the Dem and Rep under/over votes in the 2000 primary. Fortunately not in any of the other elections! Don't think I'm going to accommodate this quirk in the verifier.
@dwillis I have a question about filename formatting. I see these:
20120131__or__special__general__clatsop__house__1__precinct.csv
20111108__or__special__primary__clatsop__house__1__precinct.csv
Which implies that special is just a qualifier for the standard primary/general dichotomy. But then there's also this:
20120131__or__special__washington__house__1__precinct.csv
Where a special is assumed to be a general. Which approach is correct? Or can it really be either?
Also, it would be helpful to get an answer to the "X for district" follow-up I asked on 29 Nov.
@nk9: yeah, that 2012 Washington special should be renamed to include general. Good catch. In terms of the X for district, you can ditch those lines.
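A filename check consistent with that answer could look roughly like the sketch below. The pattern is inferred only from the filenames quoted in this thread, so it is an assumption, not the official naming spec: it treats `special` as an optional qualifier that must be followed by `primary` or `general`, and it does not know about any other election types or about files that are not precinct-level. `FILENAME_RE` and `check_filename` are hypothetical names.

```python
import re

# Inferred from the examples in this thread:
#   <YYYYMMDD>__<state>__[special__]<primary|general>__<county>__[<office>__<district>__]precinct.csv
FILENAME_RE = re.compile(
    r"^(?P<date>\d{8})__"                 # eight digits; not validated as a real date
    r"(?P<state>[a-z]{2})__"
    r"(?P<special>special__)?"            # optional qualifier
    r"(?P<type>primary|general)__"        # always required, even for specials
    r"(?P<county>[a-z_]+?)__"
    r"(?:(?P<office>[a-z_]+?)__(?P<district>\d+)__)?"
    r"precinct\.csv$"
)


def check_filename(name):
    """Return True if name matches the assumed precinct-file pattern."""
    return FILENAME_RE.match(name) is not None


for name in (
    "20120131__or__special__general__clatsop__house__1__precinct.csv",
    "20111108__or__special__primary__clatsop__house__1__precinct.csv",
    "20120131__or__special__washington__house__1__precinct.csv",
):
    print("ok " if check_filename(name) else "BAD", name)
```

With this pattern the Washington file is reported as bad until it is renamed to include `general`.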
We're generating lots of CSV files, but there are some common inconsistencies/mistakes. I'm thinking it would be useful to have a script people could run on newly completed CSV files to make sure they follow the OE format.
Things I thought could be verified:
Has something like this already been created? Other ideas for sanity checks which could be run?
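To make the idea concrete, here is a minimal sketch of what such a script might do. Everything in it is an assumption for discussion: the column set is taken from the example record quoted elsewhere in this thread, the two checks shown (missing columns, empty or whitespace-padded fields) are just examples, and `verify_csv` is a hypothetical name.

```python
import csv
import io

# Assumed from the example record in this thread; the real OE column list may differ.
EXPECTED_COLUMNS = {"county", "precinct", "office", "district",
                    "party", "candidate", "votes"}

# Fields this sketch assumes should never be blank.
NON_EMPTY_COLUMNS = ("county", "precinct", "office", "candidate")


def verify_csv(handle):
    """Yield human-readable problems found in an open CSV file."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        yield "header is missing columns: " + ", ".join(sorted(missing))
        return  # row checks would only produce noise without the columns
    for row in reader:
        line = reader.line_num  # line of the source file this row ended on
        for column in NON_EMPTY_COLUMNS:
            value = row.get(column)
            if value is None or not value.strip():
                yield f"line {line}: empty {column}"
            elif value != value.strip():
                yield f"line {line}: stray whitespace in {column}"


# Made-up sample data, loosely based on the record quoted in this thread.
sample = io.StringIO(
    "county,precinct,office,district,party,candidate,votes\n"
    'Wasco,PREC 19,State Senate,27,DEM,"PHILIBEN, ANNE N",0\n'
    'Wasco,,State Senate,27,DEM,"PHILIBEN, ANNE N",0\n'
)
problems = list(verify_csv(sample))
for problem in problems:
    print(problem)
```

Writing the checks as a generator of messages keeps each one independent, so more can be added later without restructuring the script.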