Closed dwillis closed 1 year ago
I believe I've got some code that parses the 1438-page Ottawa results. Now I need to 1) throw out the local race pages, 2) aggregate the PDF data, and then 3) merge it into CSVs.
I'm running some errands and will write up the process afterwards, as I was playing with PyPDF2 and text extraction.
@moonshiner awesome, thanks! if it's easier to keep the local race results in there, go ahead and do that.
The local races are 2/3 of the pages, so I could ignore those as I was running various tests.
I've been putting the pieces in https://github.com/moonshiner/openelections-data-ottowa-mi and documenting the various stages. That's mostly so I can share examples with my folks at work.
I parsed and hand-tweaked CSV examples of the 4 pages of one precinct. Let me know what you think. They are standard tweaks, so they can be scripted.
Like, I have the code locally for handling precincts, etc. I'm just getting it straight in my head (why can't I spell?)
@moonshiner sounds good to me - let me know when you have a CSV we can check out. Might be useful for other counties, too!
Okay, I believe I've pushed a cleaned-up version to https://github.com/moonshiner/openelections-data-ottowa-mi
along with the processed output (per page).
While working through the code, I found it easier to use this process:
Split a PDF into individual pages
Extract the text of each page (retaining the x/y on the page)
Extract results from each page
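As a rough illustration of the last step, here is a minimal sketch that assumes the text of a page has already been extracted as (x, y, text) fragments. The function name `group_into_rows`, the tolerance value, and the sample fragments are hypothetical and are not taken from the repository; the actual code may work differently.

```python
def group_into_rows(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into rows, top of page first.

    A fragment joins the current row when its y is within y_tolerance
    of the row's first fragment; each row is then sorted left to right.
    """
    rows = []  # list of (row_y, [(x, text), ...])
    # Assumption: PDF-style coordinates where y grows upward,
    # so a larger y means higher on the page.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    return [[text for _, text in sorted(cells)] for _, cells in rows]


# Made-up fragments for illustration only.
sample_fragments = [
    (300.0, 700.0, "1234"),
    (50.0, 700.5, "Jane Doe"),
    (50.0, 680.0, "Under Votes"),
    (300.0, 680.2, "17"),
]
sample_rows = group_into_rows(sample_fragments)
```

Keeping the x/y position is what makes this grouping possible: text that shares a baseline becomes one result row even when the extractor emits it out of order.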
The README explains a bit of it. I ran it over the Genesee County results and it appeared to behave.
I have several things that need cleanup, and perhaps we should talk about extra tallies.
Once I give you data that pleases you, I can combine the code (and comment it a bit more).
@moonshiner this looks really great, thank you! If it's easy to do, would change the following values:
Undervotes: -> Under Votes
Overvotes: -> Over Votes
Cast Votes: -> Ballots Cast
And remove other colons.
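The requested renames could be scripted along these lines. This is only a sketch: the `RENAMES` table mirrors the three values listed above, while the function name `normalize_label` and the "Registered Voters:" sample label are hypothetical.

```python
# Mapping of parsed labels (colons already stripped) to the requested values.
RENAMES = {
    "Undervotes": "Under Votes",
    "Overvotes": "Over Votes",
    "Cast Votes": "Ballots Cast",
}


def normalize_label(label):
    """Remove colons, trim whitespace, then apply the requested renames."""
    cleaned = label.replace(":", "").strip()
    return RENAMES.get(cleaned, cleaned)


# "Registered Voters:" is a made-up label used only to show colon removal.
examples = [normalize_label(s) for s in ("Undervotes:", "Cast Votes:", "Registered Voters:")]
```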
Oh yes. Wanted to chat you up first. It's not the same as your example. Also, this made another one work.
Looking at Genesee, which is landscape. I'm hoping the code will be useful down the road. Someone on my team is going to review it to confirm others can make sense of it.
The PDFs which can't be parsed intrigue me.
From Elkins,
Tim
On Jan 12, 2023, at 16:41, Derek Willis wrote:
@moonshiner this looks really great, thank you! If it's easy to do, would change the following values:
Undervotes: -> Under Votes
Overvotes: -> Over Votes
Cast Votes: -> Ballots Cast
And remove other colons.
https://github.com/moonshiner/openelections-data-ottowa-mi
This should have completed tallies for Ottawa and Muskegon in the ParsedData/ directory.
@moonshiner thanks!
Two more to wrap things up:
Keweenaw-MI-totals.csv Iosoco-MI-Totals.csv
Hope that helps!
did some useful procrastinating tonight. hope these work
https://github.com/moonshiner/openelections-data-ottowa-mi/blob/main/Iron-Results.csv
https://github.com/moonshiner/openelections-data-ottowa-mi/blob/main/Makinac-Results.csv
@moonshiner thanks! those look really good, just a few minor things I quickly fixed.
I aim for perfect but sadly fall short.
Also, after dumping the Livingston PDF data, I realized I can modify the code I used on Ottawa to pull out the Livingston County data. And as I have a few meetings today....
If/when I do, I'll share the code, etc. (it may be even hackier, but I'll document the process).
@moonshiner we all fall short of perfection. thanks for your contribution to the project!
https://github.com/moonshiner/openelections-data-ottowa-mi/tree/main/Livingston-MI
Here are the House/StateHouse/StateSenate races for Livingston
I want to double-check them but also want to wrap the others up.
I need to do the Straight Party and SoS. Let me wrap those up and do some data validation and will let you know.
Those look much better. When you see "Election" in the CSV files, those are election day votes (vs. Absentee). I put the Python in there; it is a bit less robust, but that's because the pages came out nearly identical.
Using Tabula, OCR or whatever method you can, parse precinct-level results for the following counties. Original sources are in individual county files in the sources-mi repository.
The goal is to create a single CSV file for each county, with at least the following headers:
county, precinct, office, district, party, candidate, votes
If a county provides a breakdown of voting method (ie, election day vs. absentee), please include those as separate columns.
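A quick check that a county file carries the minimum headers might look like the sketch below. The `REQUIRED` list is copied from the headers above; the function name `missing_headers` and the sample header lines are hypothetical.

```python
import csv
import io

# Minimum headers named in the issue description.
REQUIRED = ["county", "precinct", "office", "district", "party", "candidate", "votes"]


def missing_headers(csv_text):
    """Return the required headers that are absent from the first CSV row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED if name not in present]


# Made-up header lines; extra vote-method columns are allowed.
complete = "county,precinct,office,district,party,candidate,votes,election_day,absentee\n"
incomplete = "county,precinct,candidate,votes\n"
report = (missing_headers(complete), missing_headers(incomplete))
```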
Here's an example of a finished CSV file. The files should go in the 2022 folder in this repository and should have the following filename structure:
20221108__mi__general__{county}__precinct.csv, where county is the lowercase version of the county name, with punctuation removed.
If the county file also provides a breakdown of votes by method, include that using the following headers, where applicable:
early_voting, election_day, provisional, mail
If there are other possible vote types, include them, using a lowercase version of the vote type with underscores instead of spaces for the column name.
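The filename pattern and the column-name rule above can be sketched as follows. The helper names are hypothetical, and turning spaces in a county name into underscores is an assumption: the instructions only say to lowercase the name and remove punctuation.

```python
import string


def county_slug(county):
    """Lowercase the county name and drop punctuation.

    Assumption: any spaces become underscores (not stated in the issue).
    """
    no_punct = county.translate(str.maketrans("", "", string.punctuation))
    return "_".join(no_punct.lower().split())


def precinct_filename(county):
    """Build the filename for a county's precinct-level results."""
    return f"20221108__mi__general__{county_slug(county)}__precinct.csv"


def vote_type_column(vote_type):
    """Lowercase a vote type and replace spaces with underscores."""
    return "_".join(vote_type.lower().split())


ottawa_file = precinct_filename("Ottawa")
absentee_col = vote_type_column("Absentee")
```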
Include the following offices:
If a county provides precinct results for Write-in candidates, they should be grouped in a single row for each precinct and office with a candidate value of Write-ins.
If a county provides Under Votes or Over Votes, those should be recorded in the same way, with a single row per precinct and office with Over Votes and Under Votes as the candidate values.
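One way to group write-in rows is sketched below, assuming each parsed row is a dict keyed by the headers listed earlier. The function name, the `is_write_in` predicate, and the sample rows are hypothetical; how a given county labels its write-in candidates will vary.

```python
from collections import defaultdict


def collapse_write_ins(rows, is_write_in):
    """Merge all write-in rows into one "Write-ins" row per precinct and office."""
    totals = defaultdict(int)
    kept = []
    for row in rows:
        if is_write_in(row["candidate"]):
            key = (row["county"], row["precinct"], row["office"], row["district"])
            totals[key] += int(row["votes"])
        else:
            kept.append(row)
    # Append one combined row for each precinct/office that had write-ins.
    for (county, precinct, office, district), votes in totals.items():
        kept.append({
            "county": county, "precinct": precinct, "office": office,
            "district": district, "party": "", "candidate": "Write-ins",
            "votes": votes,
        })
    return kept


# Made-up rows for illustration only.
base = {"county": "Example", "precinct": "Precinct 1", "office": "Governor", "district": ""}
sample = [
    {**base, "party": "XYZ", "candidate": "Jane Doe", "votes": "100"},
    {**base, "party": "", "candidate": "Write-in: A", "votes": "3"},
    {**base, "party": "", "candidate": "Write-in: B", "votes": "2"},
]
collapsed = collapse_write_ins(sample, lambda name: name.lower().startswith("write-in"))
```

Over Votes and Under Votes can pass through unchanged, since each already appears as a single row per precinct and office once its label is normalized.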