openelections / openelections-data-wi

Pre-processed election results for Wisconsin elections

Wisconsin 2014 general elections data missing an Assembly district? #10

Closed: epaulson closed this issue 7 years ago

epaulson commented 8 years ago

It looks like Assembly district 99 got dropped from the checked-in data:

In [1]: import pandas as pd

In [2]: df = pd.read_csv("20141104__wi__general_ward.csv")

In [3]: df.columns
Out[3]: 
Index(['county', 'ward', 'office', 'district', 'total votes', 'party',
       'candidate', 'votes'],
      dtype='object')

In [4]: df.loc[df['office'] == 'Assembly']['district'].unique()
Out[4]: 
array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,  22.,
        23.,  24.,  25.,  26.,  27.,  28.,  29.,  30.,  31.,  32.,  33.,
        34.,  35.,  36.,  37.,  38.,  39.,  40.,  41.,  42.,  43.,  44.,
        45.,  46.,  47.,  48.,  49.,  50.,  51.,  52.,  53.,  54.,  55.,
        56.,  57.,  58.,  59.,  60.,  61.,  62.,  63.,  64.,  65.,  66.,
        67.,  68.,  69.,  70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,
        78.,  79.,  80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,
        89.,  90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.])

In [5]: len(df.loc[df['office'] == 'Assembly']['district'].unique())
Out[5]: 98
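
For reference, something like this would list exactly which districts are absent (an untested sketch, assuming the Assembly should cover districts 1 through 99):

import pandas as pd

df = pd.read_csv("20141104__wi__general_ward.csv")
assembly = df.loc[df['office'] == 'Assembly', 'district'].dropna().astype(int)
expected = set(range(1, 100))  # Wisconsin Assembly districts 1-99
missing = sorted(expected - set(assembly))
print(missing)  # with the checked-in CSV this should print [99]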

Grabbing the API metadata from http://openelections.net/api/v1/election/?format=json&limit=0&state__postal=WI and checking the .xlsx that the GAB supplies, I do see AD 99:

"direct_links": [
"http://www.gab.wi.gov/sites/default/files/11.4.2014%20Election%20Results%20-%20all%20offices%20w%20x%20w%20report.xlsx"
],
"end_date": "2014-11-04",
"gov": true,
"house": true,
"id": 1574,

I'm not sure what other elections might be missing data, in case this is an off-by-one error somewhere in the parser...

epaulson commented 8 years ago

This does seem to be a problem with the processed CSV data as checked into GitHub: when I clone the repo and run parse.py, which uses the Excel files in the local_data_cache, I get different results from what is checked in to the yearly data directories:

(openelex27)epaulson:~/development/openelex/openelections-data-wi $ git status
# On branch 10-missing-election
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   2010/20100914__wi__primary_ward.csv
#   modified:   2010/20101102__wi__general_ward.csv
#   modified:   2011/20110503__wi__special_general_ward.csv
#   modified:   2012/20120403__wi__primary_ward.csv
#   modified:   2012/20120605__wi__general-recall_ward.csv
#   modified:   2014/20140812__wi__primary_ward.csv
#   modified:   2014/20141104__wi__general_ward.csv
#   modified:   2015/20150217__wi__special_primary_ward.csv

I'm not sure how you want to keep those up to date with what parse.py produces.
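
One option would be a small consistency check that diffs a freshly generated CSV against the checked-in one, something like this sketch (the regenerated output path is hypothetical, and comparing per-office district sets is just one way to do it):

import pandas as pd

def district_sets(path):
    # Map each office to the set of districts present in a results CSV.
    df = pd.read_csv(path)
    return (df.dropna(subset=['district'])
              .groupby('office')['district']
              .apply(lambda s: set(s.astype(int)))
              .to_dict())

checked_in = district_sets("2014/20141104__wi__general_ward.csv")
regenerated = district_sets("regenerated/20141104__wi__general_ward.csv")  # hypothetical output path

for office, districts in regenerated.items():
    gap = districts - checked_in.get(office, set())
    if gap:
        print(office, "missing from checked-in CSV:", sorted(gap))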

dwillis commented 8 years ago

Thanks for this catch.

nbdavies commented 8 years ago

@davipo has some changes in his fork that fix this. (There were some spreadsheets where the last sheet wasn't being read.)
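
For anyone else who runs into this, reading every sheet explicitly sidesteps that failure mode (a pandas sketch, not the repo's actual parser code; the filename is the GAB workbook from the direct_links above):

import pandas as pd

# sheet_name=None returns a dict of {sheet name: DataFrame},
# so a trailing sheet can't be silently skipped.
xlsx = "11.4.2014 Election Results - all offices w x w report.xlsx"
sheets = pd.read_excel(xlsx, sheet_name=None)
for name, frame in sheets.items():
    print(name, frame.shape)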

nbdavies commented 7 years ago

@epaulson Can we close this one?

epaulson commented 7 years ago

Yeah, I think so. I just spot-checked the same CSV from a fresh checkout and it had 99 Assembly districts instead of 98, so I think it's fixed now.