opencivicdata / ocd-division-ids

Open Civic Data Division IDs definition & canonical repository

Update/establish best practices for OCD-ID types across countries #170

Open jdmgoogle opened 5 years ago

jdmgoogle commented 5 years ago

This issue spun out of the discussion on PR #168.

Background links: Creating new OCD-IDs; Division identifiers

The current OCD-ID documentation establishes one canonical identifier type -- country -- but otherwise gives latitude to within-country maintainers to define and enforce types appropriate for their jurisdiction. E.g., the first-level administrative type in the U.S. is a state, in Portugal it is a district, and in Germany it is a land.

The current situation gives a lot of flexibility and discretion to these within-country maintainers, but can place a significant burden on consumers of the identifiers, who must figure out which types are equivalent across countries; e.g., is a district a first-level administrative division, or a sub-city legislative division?

Previous ad-hoc attempts to address this problem (e.g., PR #148) created identifiers which used the in-country types as aliases of identifiers that were more American in origin. E.g.,

Canonical: ocd-division/country:de/state:bb
Alias: ocd-division/country:de/land:bb

This was an attempt to balance the needs of publishers (using in-country types and terminology) and consumers (using types they had already seen). The discussion in PR #168 came to the conclusion that this was not desirable, or at least should go through a more thorough review before being implemented at scale. The options discussed were:

  1. Codify the ad-hoc approach.
  2. Reverse the current ad-hoc approach, so that the local term is canonical and the across-country term is the alias.
  3. Follow the practices of projects like GeoNames, which use country-agnostic terms like adm1 and adm2, and do not use the US-centric terms state and cd in a global specification.
  4. Add a new file to the spec, e.g. country-de-types.csv, with the columns local-type and standard-type, with rows like land,adm1 and wahlkreis,constituency.
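
To make option 4 concrete, here is a minimal sketch of how a consumer might read such a file. The file contents are hypothetical: only the file name, the two column names, and the two example rows come from the option above. The helper names `load_type_map` and `standard_type` are made up for illustration and are not part of any existing OCD tooling.

```python
import csv
import io

# Hypothetical contents of country-de-types.csv, using the columns and
# example rows named in option 4. A real file would be read from disk.
COUNTRY_DE_TYPES_CSV = """local-type,standard-type
land,adm1
wahlkreis,constituency
"""


def load_type_map(csv_text):
    """Return a dict mapping each local type to its standard type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["local-type"]: row["standard-type"] for row in reader}


def standard_type(ocd_id, type_map):
    """Look up the standard type for the last segment of an OCD-ID.

    Returns None when the local type has no entry in the map.
    """
    last_segment = ocd_id.rsplit("/", 1)[-1]
    local_type, _, _ = last_segment.partition(":")
    return type_map.get(local_type)


type_map = load_type_map(COUNTRY_DE_TYPES_CSV)
print(standard_type("ocd-division/country:de/land:bb", type_map))
```

Under this approach the canonical ID keeps the local term, and the cross-country type lives in metadata rather than in the identifier itself.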

Discuss. :)

jpmckinney commented 5 years ago

Using the local terms will always make sense for local users.

A challenge only arises when a user (1) wants to use OCD-IDs from multiple jurisdictions and (2) needs to know whether a division type in one jurisdiction corresponds to a division type in another jurisdiction. (@jdmgoogle Can you provide a specific use case?)

Now, the problem of deciding whether two divisions in two jurisdictions are of the same type is not a solved problem – by anyone, anywhere.

GeoNames and similar projects likely have issues of their own: in each case, a maintainer most likely had to decide which hierarchy to follow, and that hierarchy might not match reality.

In other words, trying to establish a worldwide crosswalk of division types is likely a fool's errand.

That said, I think it's fine for users to create aliases to the canonical IDs, where the aliases would attempt to organize the world into a single hierarchy of division types.

This repo, however, for the canonical IDs, should use division types that have a local meaning to better match reality.
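
As a rough illustration of the user-side aliasing described above, a consumer could derive their own aliases from the canonical IDs without changing this repo. Everything here is an assumption for the sake of the sketch: the crosswalk contents, the use of adm1 as the shared type, and the function name `user_alias` are hypothetical and would be maintained by the user, not by this repo.

```python
# Hypothetical, user-maintained crosswalk:
# (country code, local type) -> the user's preferred cross-country type.
USER_CROSSWALK = {
    ("de", "land"): "adm1",
    ("us", "state"): "adm1",
}


def user_alias(canonical_id, crosswalk):
    """Build a user-side alias by swapping local types for crosswalk types.

    Segments whose (country, type) pair is not in the crosswalk are kept
    unchanged, so an unknown jurisdiction simply yields the canonical ID.
    """
    prefix, *segments = canonical_id.split("/")
    country = None
    aliased = []
    for segment in segments:
        type_, _, value = segment.partition(":")
        if type_ == "country":
            country = value
        new_type = crosswalk.get((country, type_), type_)
        aliased.append(f"{new_type}:{value}")
    return "/".join([prefix, *aliased])


print(user_alias("ocd-division/country:de/land:bb", USER_CROSSWALK))
```

The canonical ID stays the source of truth; the alias is only a view that a particular user chooses to impose on it.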

jpmckinney commented 4 years ago

Noting comments in https://github.com/opencivicdata/ocd-division-ids/issues/184#issuecomment-553605766 and below about a desire for a metadata file format for defining OCD types.