openelections / openelections-data-wi

Pre-processed election results for Wisconsin elections

District sometimes lost in pre-2010 files with truncated columns #45

Closed: nbdavies closed this issue 5 years ago.

nbdavies commented 6 years ago

This test is currently failing:

    | party | candidate                     | county    | office                                    | district  | ward                                          | votes | total |
    | LIB   | Scattering                    | Marquette | State Senate                              | 14        | TOWN OF BUFFALO Wards 1 & 2                   | 1     | 1     |

It fails because the district column is in fact blank in the CSV output:

    # 20080909__wi__primary__ward.csv
    Marquette,Town Of Buffalo Wards 1 & 2,State Senate,,1,LIB,Scattering,1
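To make the failure concrete, here is a minimal sketch that reads that one row with Python's `csv` module. The field order (county, ward, office, district, then four more fields) is an assumption read off this single row, not the documented header of the output file.

```python
import csv
import io

# The row from 20080909__wi__primary__ward.csv quoted above. The field order
# (county, ward, office, district, ...) is an assumption based on this row.
ROW = "Marquette,Town Of Buffalo Wards 1 & 2,State Senate,,1,LIB,Scattering,1\n"

county, ward, office, district, *rest = next(csv.reader(io.StringIO(ROW)))

# The test expects district "14", but the fourth field is an empty string.
print(f"county={county!r} office={office!r} district={district!r}")
```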

That's because this table sits in the middle of a pre-2010 source file in which some of the columns are cut off. We would normally extract the district from the office name column, but that column isn't present here. We copy the office name over from the previous chunk of the file, which is correct. Carrying over the district number would be incorrect, though: the previous section of the file is for district 12, and the next section is for district 16.
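A rough sketch of the distinction between carrying the office forward and not carrying the district forward, assuming each chunk's office-name cell looks like `STATE SENATE DISTRICT 12` and is `None` when the column was cut off. The regex and the function names `split_office` and `resolve_chunks` are hypothetical; they are not taken from parser.py.

```python
import re
from typing import Optional

# Hypothetical: assumes the district number ends the office-name cell,
# e.g. "STATE SENATE DISTRICT 12". The real source format may differ.
DISTRICT_RE = re.compile(r"DISTRICT\s+(\d+)\s*$", re.IGNORECASE)


def split_office(office_cell: str) -> tuple[str, Optional[str]]:
    """Split an office-name cell into (office, district)."""
    match = DISTRICT_RE.search(office_cell)
    if match is None:
        return office_cell.strip(), None
    return office_cell[: match.start()].strip(), match.group(1)


def resolve_chunks(office_cells: list[Optional[str]]) -> list[tuple[str, Optional[str]]]:
    """Return (office, district) for each chunk of a source file.

    office_cells[i] is the office-name cell of chunk i, or None when that
    column was cut off. The office name is carried forward from the previous
    chunk; the district is not, so it is left as None instead.
    """
    resolved = []
    last_office = ""
    for cell in office_cells:
        if cell is None:
            # Office carried over from the previous chunk; district unknown.
            resolved.append((last_office, None))
        else:
            last_office, district = split_office(cell)
            resolved.append((last_office, district))
    return resolved


print(resolve_chunks(["STATE SENATE DISTRICT 12", None, "STATE SENATE DISTRICT 16"]))
```

For the example above this gives `('STATE SENATE', None)` for the middle chunk rather than a wrong `'12'`, which is why the CSV ends up with a blank district instead of an incorrect one.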

davipo commented 6 years ago

I revised parser.py to print a note when columns are missing in these 2000-2010 files. This happens in 3 data sections of the fall primary election files (once in 2006, twice in 2008).
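One way such a note could be produced is by comparing each section's header against an expected set of columns. This is only a sketch: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names, and the column list is an assumption rather than the actual headers of the 2000-2010 files.

```python
# Hypothetical column names; the real headers in the 2000-2010 files may differ.
EXPECTED_COLUMNS = ["county", "ward", "office", "party", "candidate", "votes"]


def missing_columns(header: list[str], filename: str, section: int) -> list[str]:
    """Print a note and return the expected columns absent from a section's header."""
    present = {name.strip().lower() for name in header}
    missing = [col for col in EXPECTED_COLUMNS if col not in present]
    if missing:
        print(f"NOTE: {filename} section {section}: missing columns {', '.join(missing)}")
    return missing
```

For example, `missing_columns(["County", "Ward", "Party", "Candidate", "Votes"], "example.xls", 3)` prints a note and returns `["office"]`.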

I've added a kludge that sets the district to 14 when the office column is missing, since the office column is missing in only one section: in Libertarian_2008_FallElection_StateSenator_WardbyWard.xls (id 431).
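A kludge like this might be expressed as a per-file override that applies only when the office column is absent. The sketch below is hypothetical (the names `DISTRICT_OVERRIDES` and `district_for_section` are not from the branch); it just shows the idea of keying the fix to the one affected file.

```python
from typing import Optional

# Hypothetical override table keyed by source filename. The office column is
# missing in only one section of this file, and that section is district 14.
DISTRICT_OVERRIDES = {
    "Libertarian_2008_FallElection_StateSenator_WardbyWard.xls": "14",
}


def district_for_section(
    filename: str, office_present: bool, parsed_district: Optional[str]
) -> Optional[str]:
    """Return the district for a section, using the per-file override
    only when the office column (and so the district) is missing."""
    if office_present:
        return parsed_district
    return DISTRICT_OVERRIDES.get(filename)
```

Keying on the filename keeps the override from leaking into other files, but it would silently mislabel a second truncated section if one ever turned up in the same file; keying on (filename, section) would be a stricter variant.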

These changes are in my branch dp-add-district-to-tests.