project-open-data / project-open-data.github.io

Open Data Policy — Managing Information as an Asset
https://project-open-data.cio.gov/

Guidance and migration pathway for globally unique IDs #592

Open philipashlock opened 7 years ago

philipashlock commented 7 years ago

Back in #69 it was recommended that values for identifier be globally unique, and the v1.1 update changed the guidance to recommend this (using a URL), but it was never a strict requirement. After several years of this metadata making its way across the internet ecosystem, it's clearer than ever that this needs to become a hard requirement. The case for this was already made in #69. This issue is meant to lead to final guidance that makes it a hard requirement and, more immediately, to develop a migration pathway for converting current non-global values for identifier to proper URLs.

One migration option may be to define a convention where a unique ID for each agency is paired with the dataset's non-global identifier and the pair is appended to the end of a URL, potentially using an approach similar to W3ID to help maintain persistence.

For instance, if GSA has a metadata entry with a non-global identifier like GSA-2016-01-22-01, that could be represented as a more global URI like dcat-us:federal/gsa/GSA-2016-01-22-01, but ultimately it would be appended to a URL for a final value of something like https://id.data.gov/dcat-us/federal/gsa/GSA-2016-01-22-01. We'd also need to consider URL encoding as part of this transformation, since there aren't any restrictions on the current values.
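
To make the transformation described above concrete, here is a rough Python sketch. It is only an illustration under several assumptions: the base URL is the example value from this comment and not a live service, the function name `to_global_identifier` and the `scope` parameter are hypothetical, and leaving existing http(s) values untouched is a guess at the desired behavior, not settled guidance.

```python
from urllib.parse import quote

# Example base from this comment; assumed for illustration, not a live resolver.
BASE_URL = "https://id.data.gov/dcat-us"


def to_global_identifier(agency: str, local_id: str, scope: str = "federal") -> str:
    """Build a URL-style identifier from an agency ID and a non-global identifier.

    Hypothetical helper: the path layout (scope/agency/identifier) follows the
    example in this issue, not any published specification.
    """
    # Assumption: values that are already http(s) URLs are treated as global.
    if local_id.lower().startswith(("http://", "https://")):
        return local_id

    # Percent-encode everything outside the unreserved set, including "/",
    # because current identifier values have no character restrictions.
    agency_part = quote(agency.strip().lower(), safe="")
    id_part = quote(local_id.strip(), safe="")
    return f"{BASE_URL}/{scope}/{agency_part}/{id_part}"


print(to_global_identifier("GSA", "GSA-2016-01-22-01"))
print(to_global_identifier("GSA", "my data/set #1"))
```

Encoding the slash inside the identifier keeps each original value in a single path segment, so the original string can be recovered by decoding the last segment. Whether agency IDs should be lowercased is also an open question here.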