openelections / openelections-data-wi

Pre-processed election results for Wisconsin elections

Primary election data duplicated in 2012-11-06 general election file #31

Closed davipo closed 6 years ago

davipo commented 7 years ago

The general election file for id 409, Ward%20by%20Ward_11.6.12.xls, includes results for the District 33 special primary election that was held on the same date.

This data is duplicated in the id 410 special election file: Ward%20Results-11.6.12-Sen%2033%20Spec%20Pri.xls

Results file 20121106__wi__general__ward.csv contains all of the data in 20121106__wi__special__primary__ward.csv.
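One way to confirm this kind of overlap is a row-level subset check. The following is only a minimal sketch: the helper name `rows_missing_from`, the column layout, and the sample rows are made up for illustration and are not taken from the actual results files.

```python
import csv
import io

def rows_missing_from(subset_file, superset_file):
    """Return the rows of subset_file that do not appear in superset_file."""
    superset_rows = {tuple(row) for row in csv.reader(superset_file)}
    return [row for row in csv.reader(subset_file) if tuple(row) not in superset_rows]

# Tiny in-memory stand-ins for the two results files (made-up rows and columns).
general = io.StringIO(
    "county,ward,office,district,party,candidate,votes\n"
    "Example County,Ward 1,President,,DEM,Candidate A,10\n"
    "Example County,Ward 1,State Senate,33,REP,Candidate B,7\n"
)
special_primary = io.StringIO(
    "county,ward,office,district,party,candidate,votes\n"
    "Example County,Ward 1,State Senate,33,REP,Candidate B,7\n"
)

# An empty list means every special-primary row also appears in the general file.
missing = rows_missing_from(special_primary, general)
print(len(missing))
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.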

Should the duplicated data be removed from 20121106__wi__general__ward.csv? We might print a warning when primary data is detected in a general election file. (Primary election office names end in a party name.)

nbdavies commented 7 years ago

Good catch. My reasoning would be...

The two elections can't be conflated (an election can either be general or primary), so as far as the elections API is concerned, they need to be separate records.

For election 410, we have a source file that just has that one election included, so that one is okay. But using a source file for election 409 that includes results for election 410 is problematic.

Long story short, the Elections Commission website doesn't have another file available that would replace what we're using now without including the district 33 primary results. Maybe we can add a safeguard that would prevent us from parsing those tabs of the spreadsheet (Sheet198-Sheet201) for the general election results.

Perhaps if we're parsing a general election and the race/office name in the header includes the party, we skip that sheet. We could initially make it throw an error or a message, to make sure it only applies when appropriate, and to see if there are other instances like this we haven't noticed.
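A rough sketch of that safeguard might look like the following. It is not the repository's code: the function names, the `strict` flag, the party list, and the header format in the usage lines are all assumptions made for illustration.

```python
import warnings

# Hypothetical party list; the real set would come from the source spreadsheets.
PARTIES = ("REPUBLICAN", "DEMOCRATIC", "LIBERTARIAN", "CONSTITUTION")

def party_in_office_header(header, parties=PARTIES):
    """Return the party that ends the office header, or None if there is none."""
    text = header.strip().upper()
    for party in parties:
        if text.endswith(party):
            return party
    return None

def should_skip_sheet(election_type, sheet_name, header, strict=False):
    """Decide whether a sheet holds primary results inside a general election file."""
    if election_type != "general":
        return False
    party = party_in_office_header(header)
    if party is None:
        return False
    message = f"Skipping {sheet_name}: office header {header!r} ends in party {party}"
    if strict:
        # Fail loudly at first, so every affected sheet gets reviewed by hand.
        raise ValueError(message)
    warnings.warn(message)
    return True

# Assumed header format, for illustration only.
print(should_skip_sheet("general", "Sheet198", "STATE SENATE - DISTRICT 33 - REPUBLICAN"))
print(should_skip_sheet("general", "Sheet1", "PRESIDENT OF THE UNITED STATES"))
```

Running with `strict=True` first would surface any other files with the same problem; once those are reviewed, the default mode warns and skips the sheet.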

davipo commented 6 years ago

As recommended, I revised parse_sheet() to skip the sheet if a general election office header includes a party. It prints a warning with sheet name, election date & id, office, district, and party.

20121106__wi__general__ward.csv (id 409) is the only file affected. Primary results (for 4 parties) for State Senate district 33 no longer appear in this file.

This avoids editing the input file Ward%20by%20Ward%20Report_1.xlsx manually.