opencivicdata / ocd-division-ids

Open Civic Data Division IDs definition & canonical repository

How do you make UIDs in ocd-id's consistent? #300

Open jamesa opened 2 years ago

jamesa commented 2 years ago

Hi, I'm a beginner just getting familiar with your project. I'm mostly interested in your US data, and I have a question about the use of UIDs in OCD IDs.

I understand this repository holds the canonical OCD IDs for many jurisdictions, but it wasn't clear to me whether there's any guidance on whether a bill or an event could have a canonical ID.

Example

For instance, from the datamade API referenced in [your docs](https://ocd-api-documentation.readthedocs.io/en/latest/endpoints.html), I can get this response from https://ocd.datamade.us/bills/?page=3 (currently)

```json
{
  "results": [
    {
      "classification": [
        "ordinance"
      ],
      "id": "ocd-bill/45b448a4-86f0-4fae-8311-6cf958cf1557",
      "title": "Grant(s) of privilege in public way for Dream, Inc.",
      "subject": [
        "Grants of Privilege"
      ],
      "identifier": "O2020-3422",
      "from_organization": {
        "jurisdiction": {
          "id": "ocd-jurisdiction/country:us/state:il/place:chicago/government",
          "name": "Chicago City Government"
        },
        "id": "ocd-organization/ef168607-9135-4177-ad8e-c1f7a4806c3a",
        "name": "Chicago City Council"
      },
      "updated_at": "2020-07-22T23:51:15.477432+00:00"
    },
    [...]
```

Using that identifier `O2020-3422` I can find [that bill](https://chicago.legistar.com/LegislationDetail.aspx?ID=4572237&GUID=136E38D3-2408-44FF-B5E2-AE58CA4D6D80&Options=Advanced&Search=) in the Chicago Legistar. I searched around within Legistar but I couldn't find something that matched the `45b448a4-86f0-4fae-8311-6cf958cf1557` ID to use as a reference.

If I were writing my own scraper, how would I ensure that my representation of the bill in this example, in terms of its generated OCD ID, remains consistent with the one returned from the datamade API? How do I generate that same ID independently of that API?

Along the same lines, if I were to publish some event not tracked by that API, but datamade later scraped the same event, I would want to make sure we ended up with the same generated ID.

The same goes for every other data type that uses UIDs (events, organizations, people, votes). If there's no canonical ID, is this up to each implementation to decide?

I'm probably missing some behavior that determines this in one of the scraper repos, but I'd appreciate any guidance you can provide on this. Thank you!

showerst commented 2 years ago

@jamesa, I just stumbled on this. FYI, those IDs are managed by another project: they're generated by scrapers that use opencivicdata/python-opencivicdata as a base. The UID code is in the models; here's the bill code. As you can see, it's currently using a UUID, so there's no good way to generate consistent entries.
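To make that concrete, here is a minimal sketch using only Python's standard `uuid` module. It checks that the ID from the API response quoted above parses as a version 4 (random) UUID, which is why nothing in Legistar matches it. It then contrasts a random ID with a name-based one. `random_ocd_id`, `deterministic_ocd_id`, and `HYPOTHETICAL_NAMESPACE` are illustrative names, not part of python-opencivicdata, and the name-based scheme is only an assumption about what a shared convention could look like. It would reproduce existing IDs only if every scraper adopted the same namespace and key.

```python
import uuid

# ID copied from the API response quoted earlier in this thread.
api_id = "ocd-bill/45b448a4-86f0-4fae-8311-6cf958cf1557"
prefix, _, suffix = api_id.partition("/")
parsed = uuid.UUID(suffix)
# A version 4 UUID is random, so it cannot be derived from Legistar data.
print(prefix, parsed.version)


def random_ocd_id(ocd_type):
    """Illustrative only: a random-UUID ID, different on every call."""
    return f"ocd-{ocd_type}/{uuid.uuid4()}"


# Hypothetical namespace: any fixed UUID would do, as long as every
# scraper agreed on the same one. This is NOT an existing OCD convention.
HYPOTHETICAL_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/ocd-bill")


def deterministic_ocd_id(ocd_type, jurisdiction_id, identifier):
    """Hypothetical name-based (UUID version 5) ID built from stable fields."""
    name = f"{jurisdiction_id}|{identifier}"
    return f"ocd-{ocd_type}/{uuid.uuid5(HYPOTHETICAL_NAMESPACE, name)}"


jurisdiction = "ocd-jurisdiction/country:us/state:il/place:chicago/government"
first = deterministic_ocd_id("bill", jurisdiction, "O2020-3422")
second = deterministic_ocd_id("bill", jurisdiction, "O2020-3422")
print(first == second)  # same inputs give the same ID
print(first == api_id)  # but not the ID the API already assigned
```

Even with a scheme like this, two independent scrapers would only agree going forward. IDs that were already issued as random UUIDs can only be matched by looking them up, for example by jurisdiction and `identifier`.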