Closed dwillis closed 3 years ago
Thanks as always for y'all's hard work here! A few initial issues I've experienced when using these precinct-level CSVs:
Valencia
to Worth
(space) character` (space) character after
precinct`
Perry County's `New Buffalo` is missing from the CSV; it probably failed to parse from the PDF because its name is right before a page break.
Here's my last set of notes for today. Thank you again for the hard work!
Missing `President` rows for the `CENTER` precinct, although these appear in the PDF source.
Missing `President` rows for `Newberry Township 1`, perhaps because that row is at a page break in the PDF.
Missing `President` rows for the `32.01 - Strattanville Borough` precinct, perhaps because of the PDF's page break as well.
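A check like the following might help catch dropped rows of this kind before manual QA. It is only a sketch: it assumes the CSV uses the `county,precinct,office,...` headers from the issue description, and the function name, county name, and sample rows are made up for illustration.

```python
import csv
import io


def precincts_missing_office(csv_text, office="President"):
    """Return precincts (in file order) that have no rows for `office`."""
    all_precincts = []
    with_office = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        precinct = row["precinct"]
        if precinct not in all_precincts:
            all_precincts.append(precinct)
        if row["office"] == office:
            with_office.add(precinct)
    return [p for p in all_precincts if p not in with_office]


# Made-up sample data: CENTER has a U.S. House row but no President row.
sample = (
    "county,precinct,office,district,party,candidate,votes\n"
    "Example,CENTER,U.S. House,1,DEM,Candidate A,10\n"
    "Example,NORTH,President,,DEM,Candidate B,12\n"
    "Example,NORTH,U.S. House,1,DEM,Candidate A,11\n"
)
missing = precincts_missing_office(sample)
print(missing)
```

A precinct that is absent from the CSV entirely (like the `New Buffalo` case) would not be flagged by this; that needs a comparison against a precinct list taken from the source PDF.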
@mileswwatkins many thanks for these - we'll get on them.
Oh, and Erie County's `40001 - WAYNE TOWNSHIP` has a trailing space in its precinct name.
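Stray whitespace like this is easy to scan for. A minimal sketch, assuming the standard `precinct` column; the function name, the second precinct, and the vote counts are invented for the example:

```python
import csv
import io


def precincts_with_stray_whitespace(csv_text):
    """Return precinct names that have leading or trailing whitespace."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["precinct"]
        if name != name.strip() and name not in flagged:
            flagged.append(name)
    return flagged


# Made-up sample rows; only the first precinct name has a trailing space.
sample = (
    "county,precinct,office,district,party,candidate,votes\n"
    "Erie,40001 - WAYNE TOWNSHIP ,President,,DEM,Candidate A,10\n"
    "Erie,40002 - EXAMPLE,President,,DEM,Candidate A,12\n"
)
flagged = precincts_with_stray_whitespace(sample)
print(flagged)
```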
Blair County's CSV failed to extract the full/distinct precinct names from the ElectionWare PDFs.
E.g., `Altoona Ward 2, Precinct 2` in the PDF appears as only `Altoona Ward 2` in the CSV, and similarly `Blair Township, District 3` becomes `Blair Township`, etc. (The county has lots and lots of these numbered precincts, FWIW.)
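One symptom of collapsed precinct names is that the same precinct/office/candidate combination shows up more than once. A rough sketch of that check, assuming the standard headers; the function name and sample vote counts are made up:

```python
import csv
import io
from collections import Counter


def precincts_with_duplicate_rows(csv_text):
    """Return precincts where the same office/district/party/candidate repeats."""
    keys = Counter(
        (row["precinct"], row["office"], row["district"],
         row["party"], row["candidate"])
        for row in csv.DictReader(io.StringIO(csv_text))
    )
    return sorted({key[0] for key, count in keys.items() if count > 1})


# Made-up sample: two distinct PDF precincts both came out as "Altoona Ward 2".
sample = (
    "county,precinct,office,district,party,candidate,votes\n"
    "Blair,Altoona Ward 2,President,,DEM,Candidate A,100\n"
    "Blair,Altoona Ward 2,President,,DEM,Candidate A,85\n"
    "Blair,Allegheny Township,President,,DEM,Candidate A,40\n"
)
collapsed = precincts_with_duplicate_rows(sample)
print(collapsed)
```

This only flags candidates for review; a duplicate key could also come from a genuinely repeated row.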
@mileswwatkins Ok, I believe all of these issues have been resolved.
Thank you so much, @dwillis! Just finished another round of QA, including comparing county vote totals against AP/Edison, and everything looks great.
The largest discrepancy is that Beaver County is a couple thousand votes short (y'all have an older unofficial-results PDF instead of the final/official results currently on their site), but doesn't affect my use case :)
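For anyone wanting to repeat this kind of county-level comparison, here is a minimal sketch. It assumes the standard headers and that `votes` holds plain integers; the function names, candidate names, and all numbers (including the reference totals) are invented and do not reflect AP/Edison data.

```python
import csv
import io
from collections import defaultdict


def candidate_totals(csv_text, office="President"):
    """Sum precinct-level votes per candidate for one office."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["office"] == office:
            totals[row["candidate"]] += int(row["votes"])
    return dict(totals)


def discrepancies(totals, reference):
    """Return {candidate: csv_total - reference_total} where they differ."""
    return {
        name: totals.get(name, 0) - expected
        for name, expected in reference.items()
        if totals.get(name, 0) != expected
    }


# Made-up precinct rows and a made-up reference total.
sample = (
    "county,precinct,office,district,party,candidate,votes\n"
    "Beaver,Precinct A,President,,DEM,Candidate A,100\n"
    "Beaver,Precinct A,President,,REP,Candidate B,80\n"
    "Beaver,Precinct B,President,,DEM,Candidate A,50\n"
    "Beaver,Precinct B,President,,REP,Candidate B,70\n"
)
totals = candidate_totals(sample)
diff = discrepancies(totals, {"Candidate A": 150, "Candidate B": 160})
print(totals, diff)
```

A negative difference means the CSV is short relative to the reference, which is what an outdated unofficial-results source would tend to produce.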
@mileswwatkins awesome, thanks for letting me know. We've updated Beaver.
Using Tabula, OCR or whatever method you can, parse precinct-level results for the following counties. Original sources are in the sources-pa repository.
The goal is to create a single CSV file for each county, with the following headers:
`county,precinct,office,district,party,candidate,votes`
If the county file also provides a breakdown of votes by method, include that using the following headers:
`early_voting,election_day,provisional,absentee`
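A header check along these lines could be run on each finished file. This is a sketch only: the column lists come from this issue, while the function name and the decision to ignore column order are assumptions.

```python
REQUIRED = ["county", "precinct", "office", "district",
            "party", "candidate", "votes"]
OPTIONAL = ["early_voting", "election_day", "provisional", "absentee"]


def header_problems(header):
    """Return (missing required columns, unrecognized columns)."""
    missing = [col for col in REQUIRED if col not in header]
    unknown = [col for col in header
               if col not in REQUIRED and col not in OPTIONAL]
    return missing, unknown


# A complete header with one optional vote-method column.
ok = header_problems(REQUIRED + ["absentee"])
# A header that lacks "district" and has an unexpected "mail" column.
bad = header_problems(["county", "precinct", "office",
                       "party", "candidate", "votes", "mail"])
print(ok, bad)
```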
Include the following offices:
The CSV files should be named `20201103__pa__general__{county}__precinct.csv`. Here's an example finished file: https://github.com/openelections/openelections-data-pa/blob/master/2020/20200602__pa__primary__elk__precinct.csv.
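The naming pattern can be generated rather than typed by hand. In this sketch the function name is hypothetical, and lowercasing the county and replacing spaces with underscores is an assumption based on the lowercase `elk` in the example file name, not a rule stated in this issue.

```python
def precinct_filename(county):
    """Build the 2020 general-election precinct file name for a county."""
    # Assumption: county slugs are lowercase with underscores for spaces.
    slug = county.strip().lower().replace(" ", "_")
    return f"20201103__pa__general__{slug}__precinct.csv"


name = precinct_filename("Elk")
print(name)
```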