openelections / openelections-data-mi

Converted official precinct results for Michigan elections

2022 General Election Precinct Level Results #56

Closed dwillis closed 1 year ago

dwillis commented 1 year ago

Using Tabula, OCR or whatever method you can, parse precinct-level results for the following counties. Original sources are in individual county files in the sources-mi repository.

The goal is to create a single CSV file for each county, with at least the following headers:

county, precinct, office, district, party, candidate, votes

If a county provides a breakdown of voting method (i.e., election day vs. absentee), please include those as separate columns.

Here's an example of a finished CSV file. The files should go in the 2022 folder in this repository and should have the following filename structure:

20221108__mi__general__{county}__precinct.csv, where county is the lower case version of the county name, with punctuation removed.
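A minimal sketch of building that filename from a county name. The function name `precinct_filename` is hypothetical, and joining multi-word county names with underscores is an assumption; the issue only specifies lower case with punctuation removed.

```python
import re


def precinct_filename(county: str, date: str = "20221108",
                      state: str = "mi", election: str = "general") -> str:
    """Build a precinct-results filename for one county."""
    # Lowercase the name and drop anything that is not a letter, digit, or space.
    slug = re.sub(r"[^a-z0-9 ]", "", county.lower())
    # Assumption: spaces in multi-word county names become underscores.
    slug = "_".join(slug.split())
    return f"{date}__{state}__{election}__{slug}__precinct.csv"


print(precinct_filename("Ottawa"))
print(precinct_filename("St. Clair"))
```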

If the county file also provides a breakdown of votes by method, include that using the following headers, where applicable:

early_voting, election_day, provisional, mail

If there are other possible vote types, include them, using a lowercase version of the vote type with underscores instead of spaces for the column name.
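As a rough illustration, a vote-type label from a county report could be turned into a column name like this. The helper name `vote_type_column` is hypothetical, and treating hyphens and other punctuation the same as spaces is an assumption beyond what the issue states.

```python
import re


def vote_type_column(label: str) -> str:
    """Turn a vote-type label such as 'Election Day' into a column name."""
    # Lowercase, then collapse every run of non-alphanumeric characters
    # (spaces, and by assumption hyphens and other punctuation) into one underscore.
    cleaned = re.sub(r"[^a-z0-9]+", "_", label.strip().lower())
    return cleaned.strip("_")


for label in ["Election Day", "Early Voting", "Absentee"]:
    print(label, "->", vote_type_column(label))
```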

Include the following offices:

If a county provides precinct results for Write-in candidates, they should be grouped in a single row for each precinct and office with a candidate value of Write-ins.

If a county provides Under Votes or Over Votes, those should be recorded in the same way, with a single row per precinct and office with Over Votes and Under Votes as the candidate values.
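One possible way to collapse individual write-in candidates into a single row per precinct and office is sketched below. The function names are hypothetical, and detecting write-ins by looking for "write-in" in the candidate name is an assumption; real county files may mark them differently.

```python
def looks_like_write_in(row):
    # Assumption: write-in candidates carry "write-in" somewhere in their name.
    return "write-in" in row["candidate"].lower()


def collapse_write_ins(rows, is_write_in=looks_like_write_in):
    """Merge all write-in rows for a precinct/office into one 'Write-ins' row."""
    merged_by_key = {}
    out = []
    for row in rows:
        if not is_write_in(row):
            out.append(dict(row, votes=int(row["votes"])))
            continue
        key = (row["county"], row["precinct"], row["office"], row["district"])
        if key not in merged_by_key:
            # The first write-in seen for this precinct/office creates the merged row.
            merged = dict(row, party="", candidate="Write-ins", votes=0)
            merged_by_key[key] = merged
            out.append(merged)
        merged_by_key[key]["votes"] += int(row["votes"])
    return out
```

Each input row is a dict keyed by the headers listed above (for example, as produced by `csv.DictReader`), so the result can be written back out with `csv.DictWriter`.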

moonshiner commented 1 year ago

I believe I've got some code that parses the 1438-page Ottawa results. It does; now I need to 1) throw out the local race pages, 2) aggregate the PDF data, and 3) merge it into CSVs.

Running some errands; I'll write up the process, as I was playing with PyPDF2 and text extraction.

dwillis commented 1 year ago

@moonshiner awesome, thanks! if it's easier to keep the local race results in there, go ahead and do that.

moonshiner commented 1 year ago

The local races are 2/3 of the pages, so I could ignore those as I was running various tests.

I've been putting the pieces in https://github.com/moonshiner/openelections-data-ottowa-mi, documenting the various stages as I go. That's more for me to share with my folks at work as an example.

I parsed and hand-tweaked CSV examples of the 4 pages of one precinct. Let me know what you think. They are standard tweaks, so they can be scripted.


moonshiner commented 1 year ago

Like I have the code locally on handling precincts, etc. Just getting it straight in my head (why can't I spell?)

dwillis commented 1 year ago

@moonshiner sounds good to me - let me know when you have a CSV we can check out. Might be useful for other counties, too!

moonshiner commented 1 year ago

Okay, I believe I've pushed a cleaned-up version to https://github.com/moonshiner/openelections-data-ottowa-mi

and the processed output (per page)

I found it easier in working through the code to use this process

Split a PDF into individual pages

Extract the text of each page (retaining the x/y on the page)

Extract results from each page
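The three steps above might look roughly like the sketch below. It is not the code from the repository: it assumes the third-party `pypdf` package (the successor to PyPDF2) and its `visitor_text` hook, and the helper names and the `y_tolerance` value are made up for illustration.

```python
def page_fragments(pdf_path, page_index):
    """Collect (x, y, text) fragments from one page of a PDF."""
    from pypdf import PdfReader  # imported here so the row grouping below works without it

    fragments = []

    def visitor(text, cm, tm, font_dict, font_size):
        # tm is the text matrix; tm[4] and tm[5] hold the x and y position.
        if text.strip():
            fragments.append((tm[4], tm[5], text.strip()))

    PdfReader(pdf_path).pages[page_index].extract_text(visitor_text=visitor)
    return fragments


def fragments_to_rows(fragments, y_tolerance=2.0):
    """Group fragments with nearly equal y into rows, top to bottom, left to right."""
    rows = []
    # Assumption: y grows upward on the page, so larger y means nearer the top.
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        if rows and abs(rows[-1]["y"] - y) <= y_tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    return [[text for _, text in sorted(row["cells"])] for row in rows]
```

Keeping the position-based grouping separate from the PDF reading means it can be checked against hand-written fragments before running it over a 1438-page file.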

The README explains a bit of it. I ran it over the Genesee County results and it appeared to behave.

I have several things that need cleanup, and perhaps we should talk about the extra tallies.

Once I give you data that pleases you, I can combine the code (and comment it a bit more).

dwillis commented 1 year ago

@moonshiner this looks really great, thank you! If it's easy to do, I would change the following values:

Undervotes: -> Under Votes
Overvotes: -> Over Votes
Cast Votes: -> Ballots Cast

And remove other colons.
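A small sketch of how those label changes could be scripted. The function name `clean_label` is hypothetical, and it assumes the labels appear in the candidate column exactly as written above.

```python
# The three renames requested in this comment.
RENAMES = {
    "Undervotes:": "Under Votes",
    "Overvotes:": "Over Votes",
    "Cast Votes:": "Ballots Cast",
}


def clean_label(label: str) -> str:
    """Apply the requested renames, then strip any remaining colons."""
    label = label.strip()
    if label in RENAMES:
        return RENAMES[label]
    return label.replace(":", "").strip()


for raw in ["Undervotes:", "Cast Votes:", "Registered Voters:"]:
    print(raw, "->", clean_label(raw))
```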

moonshiner commented 1 year ago

Oh yes. Wanted to chat you up first. It's not the same as your example. Also, this made another one work. Looking at Genesee, which is landscape. I'm hoping the code will be useful down the road. Someone on my team is going to review it to confirm others can make sense of it. The PDFs which can't be parsed intrigue me.

From Elkins

Tim


moonshiner commented 1 year ago

https://github.com/moonshiner/openelections-data-ottowa-mi

This should have tallies for Ottawa and Muskegon completed in the ParsedData/ folder.

dwillis commented 1 year ago

@moonshiner thanks!

moonshiner commented 1 year ago

Two more to wrap things up:

Keweenaw-MI-totals.csv Iosoco-MI-Totals.csv

Hope that helps!

moonshiner commented 1 year ago

did some useful procrastinating tonight. hope these work

https://github.com/moonshiner/openelections-data-ottowa-mi/blob/main/Iron-Results.csv

https://github.com/moonshiner/openelections-data-ottowa-mi/blob/main/Makinac-Results.csv

dwillis commented 1 year ago

@moonshiner thanks! those look really good, just a few minor things I quickly fixed.

moonshiner commented 1 year ago

I aim for perfect but sadly fall short.

Also, after dumping the Livingston PDF data, I realized I can modify the code I used on Ottawa to pull out the Livingston County data. And as I have a few meetings today....

If/when I do, I'll share the code etc. (it may be even hackier, but I'll document the process).

dwillis commented 1 year ago

@moonshiner we all fall short of perfection. thanks for your contribution to the project!

moonshiner commented 1 year ago

https://github.com/moonshiner/openelections-data-ottowa-mi/tree/main/Livingston-MI

Here are the House/StateHouse/StateSenate races for Livingston

I want to double-check them but also want to wrap the others up.

moonshiner commented 1 year ago

I need to do the Straight Party and SoS. Let me wrap those up and do some data validation and will let you know.

moonshiner commented 1 year ago

Those look much better. When you see "Election" in the CSV files, those are election day votes (vs. Absentee). I put the Python in there too. It's a bit less robust, but that's because the pages all came out nearly identical.