nvkelso / election-geodata

Precinct shapes (and vote results) for US elections past, present, and future
318 stars 51 forks source link

Standardize use of name and FIPS code in state and county fields #135

Open bahoo opened 6 years ago

bahoo commented 6 years ago

When I do a SELECT DISTINCT state FROM 'precincts-2163' I get a list that is a mix of full state names, and FIPS codes:

01
02
04
Arkansas
06
08
09
10
11
12
Georgia
Hawaii
16
17
18
Iowa
Kansas
Kentucky
22
23
24
25
26
Minnesota
28
(null)
Missouri
30
31
32
33
34
35
36
37
38
39
40
41
Pennsylvania
45
46
47
48
49
50
Virginia
Washington
West Virginia
55
Wyoming
72

The county field is similar. I feel the need to standardize, and certainly see the value in having, potentially, both forms (if we'd want, then — state, state_fips, county, and county_fips).

Any thoughts?

Potential solutions could include:

  1. Hard coding state name or state FIPS code in each command
  2. Doing a SQLite join to a static CSV file with name and FIPs codes; e.g. ogr2ogr -F CSV state_fips.csv render/cb_2013_us_state_20m-2163.shp -dialect SQLite -sql "SELECT name AS name, statefp AS fips FROM 'cb_2013_us_state_20m-2163'"
  3. Similar approach but for county codes.

Happy to run with this and submit a PR, but would love to have more input, first.

Thanks!

nvkelso commented 6 years ago

I’m down with always using FIPS.

On Mar 5, 2018, at 15:05, Jon Culver notifications@github.com wrote:

When I do a SELECT DISTINCT state FROM 'precincts-2163' I get a list that is a mix of full state names, and FIPS codes:

01 02 04 Arkansas 06 08 09 10 11 12 Georgia Hawaii 16 17 18 Iowa Kansas Kentucky 22 23 24 25 26 Minnesota 28 (null) Missouri 30 31 32 33 34 35 36 37 38 39 40 41 Pennsylvania 45 46 47 48 49 50 Virginia Washington West Virginia 55 Wyoming 72 The county field is similar. I feel the need to standardize, and certainly see the value in having, potentially, both forms (if we'd want, then — state, state_fips, county, and county_fips).

Any thoughts?

Potential solutions could include:

Hard coding state name or state FIPS code in each command Doing a SQLite join to a static CSV file with name and FIPs codes; e.g. ogr2ogr -F CSV state_fips.csv render/cb_2013_us_state_20m-2163.shp -dialect SQLite -sql "SELECT name AS name, statefp AS fips FROM 'cb_2013_us_state_20m-2163'" Similar approach but for county codes. Happy to run with this and submit a PR, but would love to have more input, first.

Thanks!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

happyjack27 commented 6 years ago

What I've done is have a hard-coded array of FIPS - abbreviation - state name. This was for personal use. Gor largee distribution I think .csv of the same would be more appropriate

On Mar 5, 2018 5:52 PM, "Nathaniel V. KELSO" notifications@github.com wrote:

I’m down with always using FIPS.

On Mar 5, 2018, at 15:05, Jon Culver notifications@github.com wrote:

When I do a SELECT DISTINCT state FROM 'precincts-2163' I get a list that is a mix of full state names, and FIPS codes:

01 02 04 Arkansas 06 08 09 10 11 12 Georgia Hawaii 16 17 18 Iowa Kansas Kentucky 22 23 24 25 26 Minnesota 28 (null) Missouri 30 31 32 33 34 35 36 37 38 39 40 41 Pennsylvania 45 46 47 48 49 50 Virginia Washington West Virginia 55 Wyoming 72 The county field is similar. I feel the need to standardize, and certainly see the value in having, potentially, both forms (if we'd want, then — state, state_fips, county, and county_fips).

Any thoughts?

Potential solutions could include:

Hard coding state name or state FIPS code in each command Doing a SQLite join to a static CSV file with name and FIPs codes; e.g. ogr2ogr -F CSV state_fips.csv render/cb_2013_us_state_20m-2163.shp -dialect SQLite -sql "SELECT name AS name, statefp AS fips FROM 'cb_2013_us_state_20m-2163'" Similar approach but for county codes. Happy to run with this and submit a PR, but would love to have more input, first.

Thanks!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/nvkelso/election-geodata/issues/135#issuecomment-370609753, or mute the thread https://github.com/notifications/unsubscribe-auth/AB7NQe67xMtLEetxUhumkfSNPNizIu_xks5tbc-rgaJpZM4Sd0oH .

migurski commented 6 years ago

👍 to FIPS always.

nvkelso commented 6 years ago

Let’s switch to always FIPS then.

On Mar 5, 2018, at 19:53, Michal Migurski notifications@github.com wrote:

👍 to FIPS always.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

nvkelso commented 6 years ago

The state portion of this was taken care of in https://github.com/nvkelso/election-geodata/pull/146.

sigpwned commented 6 years ago

I've done a little work on the data set and I believe I've got (a) all precincts assigned to a county; and (b) all counties standardized to FIPS and name. What is the best way to contribute this data back? One pull request overall, one pull request per state, ...? Most of the work was done in PostGIS, so I'm not 100% sure what the best way to give the data back will be.

nvkelso commented 6 years ago

Nice! Can you submit a small PR for just one or 2 states to demonstrate the changes? Then we can talk about the rest...

For Postgres... is this a 1 time operation? Ideally it’s something that can be done each makefile build. Like with OGR or Mapshaper?

On Sep 9, 2018, at 14:33, Andy Boothe notifications@github.com wrote:

I've done a little work on the data set and I believe I've got (a) all precincts assigned to a county; and (b) all counties standardized to FIPS and name. What is the best way to contribute this data back? One pull request overall, one pull request per state, ...? Most of the work was done in PostGIS, so I'm not 100% sure what the best way to give the data back will be.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

sigpwned commented 6 years ago

Thanks for the quick response!

Can you submit a small PR for just one or 2 states to demonstrate the changes? Then we can talk about the rest...

Absolutely. I think that's a great approach.

For Postgres... is this a 1 time operation? Ideally it’s something that can be done each makefile build. Like with OGR or Mapshaper?

A lot of this work was by-hand, so I'm afraid it's going to be a bit of a mixed bag, at least the way I've done it. However, I'm much more of a generalist than a GIS expert, so it's also totally possible that someone with more GIS experience could take the work I've done and figure out a way to stitch it into a more build-style process.

I'll try to open a couple of different PRs demonstrating different kind of work; link to those PRs here in a separate comment; and include in that comment the other work that I've done.

Also, THANK YOU for providing a great foundation here! I think making this data generally available is really important work. I hope I can help.

sigpwned commented 6 years ago

I was only able to open one ticket/PR this evening, but I did take the time to learn how to integrate the change into the Makefile. Hopefully that's at least a reasonable prototype for the other changes I have.

The new ticket is #149. The new PR is #150.

Here are the other changes I have. I did the work about a week ago, so it's possible that you've already fixed some of these. I think all this work can be considered part and parcel of this issue. I have organized them as a checklist to show the work I have yet to port over:

I also did some additional work that I think is probably beyond the scope of this issue that I will handle through another issue:

Hopefully those make sense! Are these changes valuable? If so, what do you think is the best way to get these changes into the data set? Ideally, via the Makefile, right?

sigpwned commented 6 years ago

Changes to OR have been made per issue #149 / PR #150.

sigpwned commented 6 years ago

Changes to HI have been made per issue #151 / PR #152.

sigpwned commented 6 years ago

Changes to VA have been made per issue #153 / PR #154.

sigpwned commented 6 years ago

Changes to PA have been made per issue #155 / PR #156.

Note that this change includes only county FIPS standardization and not precinct standardization. PA precincts do not always have names, and when names are present they are not unique within a county (e.g.: fid=274 and fid=298 both have county 107 and precinct Wd South).

How should precinct IDs be generated in this case? FIDs look like a good candidate, but without knowing more about how they were generated it's hard to recommend them.

sigpwned commented 6 years ago

Changes to KS have been made per issue #157 / PR #158.

Note that this change includes only county FIPS standardization and not precinct standardization. KS precincts have many different labeling schemes, several of which have no obvious mapping back to a unique 4-digit numerical code (e.g., "Overland Park 1-01").

How should precinct IDs be handled in this case?

sigpwned commented 6 years ago

Changes to WA have been made per issue #159 / PR #160.

Note that this change includes only county FIPS standardization and not precinct standardization. WA precincts have a very consistent naming scheme and it would be easy to transform the existing names into any one of several good, standard, cross-compatible alternatives. However, it must first be decided what format to adopt.

How should precinct IDs be handled in this case?

It's important to settle this question for WA and the other states above. But you folks are way ahead of me, per #144. 🙂

nvkelso commented 6 years ago

@sigpwned Thanks for your progress on the county part of this issue! Note I changed your text in https://github.com/nvkelso/election-geodata/issues/135#issuecomment-419802026 to use Markdown for the checkbox state :)

sigpwned commented 6 years ago

Thanks for your progress on the county part of this issue!

Thrilled to help! Very glad I can give the data back.

Note I changed your text in #135 (comment) to use Markdown for the checkbox state :)

Hot damn! That's much better. I didn't know you could do that. TIL! Time to go review the markdown docs...

I'll continue the work laid out above and then re-assess current state. Once I think the data is in a state appropriate to closing this ticket, I'll provide some PostGIS queries of the national file as evidence that state and county FIPS codes have been populated appropriately.

sigpwned commented 6 years ago

Currently, this project's model is that each precinct is associated with exactly one county, which (generally) fully encompasses it.

I like that model! It's simple and easy to understand. However, it turns out that Iowa's voter precincts are typically bigger than their counties. (In fact, precincts regularly fully encompass multiple counties!) How bizarre.

How should this be handled? I can think of a few options that will or won't work:

  1. If SQLite supported arrays, I'd be inclined to suggest we change the type of the county field from CHAR(3) to ARRAY(CHAR(3)), but SQLite does not appear to support arrays.
  2. Another (far hackier) option would be to make the county field a comma-separated list of county FIPS codes, but SQLite does not appear to support a SPLIT function either, so that would be problematic to work with.
  3. Another good option would be to force each state's GPKG to have two tables -- state and (for example) precinctcounty -- which would encode a one-to-many relationship between a precinct and any counties.
  4. Another option would be just to leave Iowa's county field blank because it doesn't fit the model, and incldue a precinctcounty table in Iowa only.

Does anyone have an opinion on how to handle? I'm leaning towards 4 above.

I'll check to see if the other states without counties called out above -- CA, DE, and NM -- have the same problem now.

EDIT -- I'm an idiot. The Iowa county file doesn't actually contain counties; rather, it appears to contain "townships." Counties in Iowa appear to be made up of multiple townships. I am re-evaluating now.

EDIT -- Update for Iowa has been pushed, per issue #161 / PR #162. All later geo-based updates will depend on this PR because it includes an update to the Dockerfile that enables spatial functions to be used when manipulating GPKG files.

nvkelso commented 6 years ago

I'm fine with option 4 to skip it... or to put the "majority" county as a connivence there.

(The same can also be said for postal codes...)

On Fri, Sep 21, 2018 at 10:02 AM Andy Boothe notifications@github.com wrote:

Currently, this project's model is that each precinct is associated with exactly one county, which (generally) fully encompasses it.

I like that model! It's simple and easy to understand. However, it turns out that Iowa's voter precincts are typically bigger than their counties. (In fact, precincts regularly fully encompass multiple counties!) How bizarre.

How should this be handled? I can think of a few options that will or won't work:

  1. If SQLite supported arrays, I'd be inclined to suggest we change the type of the county field from CHAR(3) to ARRAY(CHAR(3)), but SQLite does not appear to support arrays.
  2. Another (far hackier) option would be to make the county field a comma-separated list of county FIPS codes, but SQLite does not appear to support a SPLIT function either, so that would be problematic to work with.
  3. Another good option would be to force each state's GPKG to have two tables -- state and (for example) precinctcounty -- which would encode a one-to-many relationship between a precinct and any counties.
  4. Another option would be just to leave Iowa's county field blank because it doesn't fit the model, and incldue a precinctcounty table in Iowa only.

Does anyone have an opinion on how to handle? I'm leaning towards 4 above.

I'll check to see if the other states without counties called out above -- CA, DE, and NM -- have the same problem now.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/nvkelso/election-geodata/issues/135#issuecomment-423606077, or mute the thread https://github.com/notifications/unsubscribe-auth/AA0EO_EVycBcWkFfIsNbFk_A9b7lTPLiks5udRungaJpZM4Sd0oH .

sigpwned commented 6 years ago

I'm fine with option 4 to skip it... or to put the "majority" county as a connivence there.

Fortunately, I was wrong about the Iowa regions! The file labeled "counties" actually contained "townships" that make up "counties" in Iowa. Once I merged the townships into counties, everything turned out fine. I just commented too early. :)

sigpwned commented 5 years ago

OK, looks like all the PRs are open. Once they're all merged, I'll build the nation file and check for counties that aren't populated, or are populated with something other than a valid county FIPS code. I'll rinse and repeat until the counties are all done, or we've fixed as much as makes sense. I'll keep folks updated here. Once the counties are fully populated, I'll recommend closing this issue!

EDIT: Looks like there are 4 more states with county FIPS issues. Fortunately, these issues can be handled with simple CASE statements, so it should be possible to integrate these changes without waiting on @migurski's thoughts on the Docker updates.

EDIT: I've attempted to proactively check the county labels as if all the above PRs had been accepted and found the 4 issues above. All county labels are now numeric, which is great! However, some county labels are not exactly 3 chars or 5 chars, so there are some zero padding issues. I'll take a peek at those next.

EDIT: There are several states that use un-padded FIPS codes:

nvkelso commented 5 years ago

Woot! Let’s see what Migurski says :)

I merged in Nevada data last night, and have Ohio data to add this week then I’m all caught up.

On Sep 23, 2018, at 08:55, Andy Boothe notifications@github.com wrote:

OK, looks like all the PRs are open. Once they're all merged, I'll build the nation file and check for counties that aren't populated, or are populated with something other than a valid county FIPS code. I'll rinse and repeat until the counties are all done, or we've fixed as much as makes sense. I'll keep folks updated here. Once the counties are fully populated, I'll recommend closing this issue!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

sigpwned commented 5 years ago

The above PRs are now open. Georgia FIPS codes did not need padding because the "unpadded" counties are in fact missing. IMO, this can be handled separately, as it's a different issue. Once the above PRs are pulled, I think this issue is ready to close!

nvkelso commented 5 years ago

Party time! Thanks for all your work on this :)

On Sep 23, 2018, at 15:54, Andy Boothe notifications@github.com wrote:

The above PRs are now open. Georgia FIPS codes did not need padding because the "unpadded" counties are in fact missing. IMO, this can be handled separately, as it's a different issue. Once the above PRs are pulled, I think this issue is ready to close!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

sigpwned commented 5 years ago

Thrilled to help! I think making this kind of data generally available is important. And I'm looking forward to doing some fun analyses with it, too. :smile:

sigpwned commented 5 years ago

Having slept on it, there's one more thing we might want to handle in this issue. While all precincts now use state and county FIPS codes, some precincts use $state_fips$county_fips as their county label, and some use just $county_fips.

If we standardize to one of the above for county IDs, then I struggle to think of any other work that could be logically assigned to this issue.

Regarding which way to lean -- $state_fips$county_fips or $county_fips -- because of the lack of consistency on precinct labels, I've come to believe that precinct labels should be left standalone and unstandardized. (That should be handled in #144, not here.) If precinct labels are going to be unstandardized, then I think county labels should probably be standalone too -- just $county_fips. Just my two cents.

Whichever way it goes, I think it makes sense to handle that in just one PR, rather than one per state.

sigpwned commented 5 years ago

Whoo hoo! PR merge party last night!

The last fix I can see that this issue may require is standardizing county FIPS codes as $state_fips$county_fips vs $county_fips for every state. Currently we have a mix. We can handle that here, or in a separate issue. If we want to handle that in a separate issue, we can close this one!

bahoo commented 5 years ago

I'd suggest let's do it here, since most of the discussion is already here? 🤷‍♂️ Not at all strongly held, though.