opencivicdata / python-opencivicdata

python utilities for Open Civic Data
BSD 3-Clause "New" or "Revised" License

Division model proposals #110

Open hobbes7878 opened 6 years ago

hobbes7878 commented 6 years ago

Hi,

New to this project and admiring the work. We're mirroring parts of this schema in our own elections rig at Politico with the hope we'll eventually be able to adopt the full spec.

Had some proposals for adding to the Division model to start. Happy to submit a PR once the concepts make sense.

In the docs there is a concept of boundaries. It might make sense to add a Boundary model that foreign keys to Division. A Boundary object would have starting and ending effective dates and a flag representing whether that boundary is the current geographic representation of the Division, letting you have multiple historical representations, useful for modeling congressional districts. It would have a JSONField for keeping Geo/topoJSON directly on the model.
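A minimal sketch of the proposed Boundary shape, using a plain dataclass as a stand-in for a Django model so it runs without a database. The field names (`division_id`, `start_date`, `end_date`, `is_current`, `geojson`) and the dates in the sample data are assumptions for illustration, not part of the existing schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Boundary:
    """Hypothetical stand-in for a Boundary model keyed to a Division."""
    division_id: str                  # would be a ForeignKey to Division
    start_date: date                  # first day this shape is in effect
    end_date: Optional[date] = None   # None means no known end date
    is_current: bool = False          # the proposed "current" flag
    geojson: dict = field(default_factory=dict)  # would be a JSONField


def current_boundary(boundaries, division_id):
    """Return the boundary flagged current for a division, or None."""
    for boundary in boundaries:
        if boundary.division_id == division_id and boundary.is_current:
            return boundary
    return None


# Made-up sample data: two historical shapes for one district.
DISTRICT = "ocd-division/country:us/state:tx/cd:1"
boundaries = [
    Boundary(DISTRICT, date(2003, 1, 3), date(2013, 1, 2)),
    Boundary(DISTRICT, date(2013, 1, 3), is_current=True),
]
```

Calling `current_boundary(boundaries, DISTRICT)` picks the second record, while the first stays available as a historical representation.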

Wondering if we can handle the relationships in the DivisionManager through more explicit models. Many Divisions are hierarchical: country > state > county > precinct. Could a Division have a self-referencing foreign key to a parent Division? Some relationships aren't hierarchical, like congressional district <> county/precinct. Could Divisions capture that relationship with a self-referencing ManyToMany field of intersecting divisions? Making it an m2m through another model would let you hang the actual proportion of the intersection on it, which is useful for apportioning things like census counts.
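To make the apportioning idea concrete, here is a rough sketch of what a row in that through model might carry and how it could be used. `Intersection`, `apportion`, the shortened division names, and the proportions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intersection:
    """Hypothetical through-model row linking two intersecting divisions."""
    from_division: str  # e.g. a county
    to_division: str    # e.g. a congressional district
    proportion: float   # share of from_division lying inside to_division


def apportion(count, from_division, intersections):
    """Split a count for one division across the divisions it overlaps."""
    return {
        link.to_division: count * link.proportion
        for link in intersections
        if link.from_division == from_division
    }


# Made-up example: a county split 75/25 between two districts.
links = [
    Intersection("county:a", "cd:1", 0.75),
    Intersection("county:a", "cd:2", 0.25),
]
shares = apportion(1000, "county:a", links)
```

Here a census count of 1000 for the county is divided between the two districts according to the stored proportions.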

Could the many subtype fields be more easily handled with a JSONField?

Could there also be a higher-level abstraction on Division that represents the level of that division? Examples might be a state, a county, a precinct, etc. The major benefit of modeling those levels is that it lets us query down from all Divisions of a given type: give me the boundaries of all states, for example. This might need to be raised on the docs repo, just let me know.
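For illustration only, the kind of query this would enable can be approximated today if divisions carry OCD IDs, by reading the type off the last segment of the ID. `division_type` is a hypothetical helper, not something in this repo, and a real level model would likely look different.

```python
def division_type(ocd_id):
    """Return the type of the last segment of an OCD division ID, or None."""
    last_segment = ocd_id.rsplit("/", 1)[-1]
    type_, sep, _ = last_segment.partition(":")
    # A segment without a colon (e.g. the "ocd-division" prefix) has no type.
    return type_ if sep else None


ids = [
    "ocd-division/country:us",
    "ocd-division/country:us/state:tx",
    "ocd-division/country:us/state:ny",
    "ocd-division/country:us/state:tx/cd:1",
]

# "Give me all states": filter on the derived type.
states = [i for i in ids if division_type(i) == "state"]
```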

Thanks. Looking forward to contributing to this.

jsfenfen commented 6 years ago

In general this repo isn't optimized for elections--someone else can speak to the specifics--but worth pointing out that there's a different spec for elections that folks have been working on here.

Edited to point out that there's been progress on this here: https://github.com/opencivicdata/python-opencivicdata/tree/master/opencivicdata/elections --maybe that's what you were referring to? Maybe ping @gordonje ?

hobbes7878 commented 6 years ago

Ah, cool. Thanks for those links, @jsfenfen.

Just to clarify, too, I'm thinking of these proposals separate from elections. Just mentioned it in the context of how we're approaching this app, but the proposals here are meant to be generic.

gordonje commented 6 years ago

@hobbes7878 👋

Sounds like we're talking about a significant departure from the current OCD divisions specification so, yeah, I think it should be hashed out in the docs repo as an OCD enhancement proposal (an rst file added here).

Once there's enough buy-in on the new spec, then it can be implemented in this repo.

gordonje commented 6 years ago

Also...I don't want to speak out of turn so maybe someone with a longer history of involvement ( cc @jamesturk @jpmckinney @fgregg ) might want to weigh in on how well this proposal would fit into the scope of OCD.

My attempted summary of what's being proposed:

  1. Modeling historic representations of division boundaries
  2. Modeling relationships between divisions (which are not always hierarchical)

jpmckinney commented 6 years ago

Hi @hobbes7878: To my knowledge, Open Civic Data still uses https://github.com/opennorth/represent-boundaries for its boundaries (though I think only in imago), and defers all boundary-related questions to that package.

Within that package, there is a BoundarySet model (e.g. "New York State Assembly districts") and a Boundary model (e.g. "District 1"). A BoundarySet collects a group of Boundary objects. Both models have a start_date ("The date from which the set's boundaries are in effect") and an end_date ("The date until which the set's boundaries are in effect."). The Boundary model holds the geospatial data. There's no need for an 'in-effect' boolean, because that is implied by the date fields. (If you're worried about optimized performance, just put indexes on your date columns.) There is no foreign key between Boundary and Division (because the packages were developed separately), but we can add that.
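As a sketch of how "in effect" falls out of the date fields: the function below assumes both dates are inclusive and that a missing date means the range is open-ended on that side. Those are assumptions for the example, not a statement of how represent-boundaries treats them.

```python
from datetime import date
from typing import Optional


def in_effect(start_date: Optional[date], end_date: Optional[date], on: date) -> bool:
    """True if `on` falls within [start_date, end_date]; None means open-ended."""
    if start_date is not None and on < start_date:
        return False
    if end_date is not None and on > end_date:
        return False
    return True
```

For example, `in_effect(date(2013, 1, 3), None, date.today())` is true for a boundary that started in 2013 and has no end date, so no separate boolean is needed.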

Intersecting divisions are determinable using geospatial queries. I'm not sure of the value of caching this information – it seems like there is a high risk of it going stale, and geospatial queries are fast enough for all use cases that I can imagine for this information. The use cases that need a cache may be too narrow for a general library like this one, but that can be discussed.

I would expect that the hierarchy of divisions is implied by the structure of the OCD ID (assuming you are using those). So, to get the parent of ocd-division/country:ca/province:ab/ed:1 you just look up ocd-division/country:ca/province:ab. If you need to write a join query involving parent divisions, then a foreign key will make it easier than having to perform string operations in SQL, but I don't know a use case for that. Can you describe one?
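In Python, that lookup can be sketched as a string operation on the ID. `parent_id` is a hypothetical helper name, and treating a country-level ID as having no parent is an assumption of this sketch.

```python
def parent_id(ocd_id):
    """Return the parent OCD division ID, or None for a top-level division."""
    prefix, sep, _ = ocd_id.rpartition("/")
    # If nothing is left but the bare "ocd-division" prefix (no type:value
    # segment), the division is top-level and has no parent.
    if not sep or ":" not in prefix:
        return None
    return prefix


parent = parent_id("ocd-division/country:ca/province:ab/ed:1")
```

Here `parent` is `ocd-division/country:ca/province:ab`, matching the example above.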

Popolo (a spec on which parts of OCD are based) has an Area class (which corresponds to 'Boundary' here), and there had been earlier discussion on how to extend it: https://github.com/popolo-project/popolo-spec/issues/59.