Closed karpet closed 3 years ago
Here's an example for Sheridan county.
ksvotes=# select name from precincts where county_id =90 order by name;
name
----------------------
Adell
Adell Township
Bloomfield
Bloomfield Township
Bow Creek Township
Bowcreek
East Kenneth
East Saline
Hoxie - East
Hoxie - West
Kenneth - East
Kenneth - West
Logan
Logan Township
Parnell
Parnell Township
Prairie Dog
Prairie Dog Township
Saline, East
Saline, West
Sheridan
Sheridan Township
Solomon
Solomon Township
Springbrook
Springbrook Township
Union
Union Township
Valley
Valley Township
West Kenneth
West Saline
(32 rows)
ksvotes=# select name from census_precincts where county_id = 90 order by name;
name
----------------------
Adell Township
Bloomfield Township
Bow Creek Township
East Kenneth
East Saline
Logan Township
Parnell Township
Prairie Dog Township
Sheridan Township
Solomon Township
Springbrook Township
Union Township
Valley Township
West Kenneth
West Saline
(15 rows)
Okay I think I'm tracking. Sheridan county data in General Election Tally 11-08-16 has "Adell" but the Census precincts list with FIPS code as "Adell Township" and your solution is to build a table that makes those equal in the eyes of the database.
This is a file we did last cycle to get Votebuilder precincts matched up with 2012 election results, and votebuilder precincts are derived from the state voter file precinct fields.
Sheridan County is even more cloudy in the voter file: Precinct AD3, Precinct BL4, etc
Kenneth Township, for example, is both the Township and the city of Hoxie within. Votes are only reported at township level, but people are registered in city of Hoxie for, presumabley, local elections which those who live outside of the city limits (but still in Kenneth Twp) cannot participate in. And then it's divided between East and West for further fun.
Yes, exactly.
I called and talked to the Sheridan county clerk this afternoon, just to clarify for myself what I was seeing in the spreadsheet they distribute vs what's in the SOS spreadsheet(s). Hoxie (East|West) is local elections only, but gets reported in OpenElections for state leg races as well. It depends on where/how OpenElections is parsing its data.
I have constructed the data model now to accommodate this by having a CensusPrecinct
map to one or more Precinct
records, so that votes are tabulated (summed) according to census tracts ("official" SOS precincts) even if they are reported at "unofficial" levels like Hoxie. This gives us a more apples-to-apples comparison with census and voter file data.
Thanks for the combined precinct data xls - I will use that to help build a mapping from the voter file as well.
I want to standardize on a single authoritative name for each census tract, so I'm going to keep using the census tract name and mapping everything to that from voter file, election results, etc.
I've started manually mapping names that require it, here: https://github.com/statedemocrats/ks-votes/blob/master/db/douglas-county-precincts-2016.csv
Generally, I'm using the 2010 Census name as authoritative, since the Census names work for 4 out of 10 years, and then in the intervening years counties may re-parcel based on their own logic.
I've spent several days on this now and want to document the working solution.
When the database is bootstrapped, before any results are entered, the 2012 (2010) Census Tracts are used to seed the Precincts
table. So the baseline is one CensusTract
== one Precinct
.
Then we immediately start to add aliases, and precinct-to-censustract mappings, based on the changes counties made after 2012.
There are 2 tables for allowing for the mess that is precinct names:
PrecinctAliases
table is used to map alternate names for the same precinct. So for example, Wyandotte County Kansas City Ward 1 Precinct 01
might be reported in an election result as KC 1-1
. The KC 1-1
is added as a PrecinctAlias
. Another example is where a Kansas precinct was subdivided to allow for the wonky Legislative boundaries. So, we alias KC 9-16
to Kansas City Ward 11 Precinct 13 S5
(via CensusTract 120080).CensusPrecincts
table is used to create many-to-many relationship between precincts and census tracts. For example, if in 2016 a single precinct was carved into multiple precincts, so that each new precinct relates to a single census tract, CensusPrecinct
records are created for each relationship. A more complex example would be if 2 (or more) CensusTract
records were split into more than 2 Precincts
so that a single Precinct
might be split across multiple tracts, then CensusPrecincts
can be created to reflect the multiple relationships.While PrecinctAlias
records help normalize results to a single precinct known by multiple names, the CensusPrecinct
mostly just flags a set of results as "weird" so that we know that comparing the results across multiple years/elections is probably not an apples-for-apples comparison. Particularly for urban areas with rapidly changing development/population, comparison across multiple years will require manual inspection and some finesse.
I'm currently focused on Shawnee, Sedgwick, Douglas, Johnson and Wyandotte counties for mapping tracts to precincts, since those are the most populous counties and have had the most boundary churn.
Sedgwick county has proven the most troublesome in terms of name re-use and correlation. I have made several different approaches to detecting matches and still have about 80 or so from 2012 that I cannot determine w/o more manual work. The master file is here: https://github.com/statedemocrats/ks-votes/blob/master/db/sedgwick-county-precincts-2016.csv and the rows that need attention are those with empty 2nd and 3rd columns.
Having looked primarily at 2012 and 2016 before this week, I started parsing 2014 results this week. These are the most fraught because we do not have quality SOS sources for 2014, but are instead relying on what openelections has (which seems to be mostly from counties directly).
So 2014 is a mess across the board.
I have started using a couple of Ruby gems (libraries) called fuzzy_match and fuzzy_tools to start applying some natural language processing techniques to the precinct names. Between the 2 of them I have managed to cut the number of "orphans" (precinct names with no known corresponding census tract) by 50% for 2014. That still leaves hundreds of names to manually curate though, so I may need to figure out some way to crowdsource that, given how few hours I have to spend on this. As before, I am starting with the 5 most populous counties first.
Here's an example from Sedgwick county:
W 15 P 02 & W15 P04 -> Topeka Ward 15 Precinct 02
I was able to find federal results for 2014 from the SOS site and use those to at least map VTD and precinct names for that year, which helps somewhat.
Shawnee and Sedgwick counties continue to be the most problematic in terms of name re-use since the census. I've managed to zap most of the Shawnee issues. Will turn my attention to Sedgwick et al next week.
Also this week started on some reporting for all this data, so we can start to visualize some of the results on the map. This will take some design thinking since there is so much data.
An example of the results visualization I got done today.
Precinct names are a mess.
There's official census.gov names from 2012 via https://github.com/nvkelso/election-geodata/blob/master/data/20-kansas/statewide/2012/kansas-state-voting-precincts-2012.geojson but those do not correlate for names as reported via county clerk data in openelections.
I think we need a combination of:
precinct_aliases
lookup table.