radiantearth / stam-spec

SpatioTemporal Asset Metadata specification - defining core metadata fields for searching imagery & other geo assets
Apache License 2.0

Getting ID right #12

Open cholmes opened 7 years ago

cholmes commented 7 years ago

The goal for these imagery files is for every ID to be truly unique. The initial OIN spec had a uuid field and seemed to populate it with the URL to the file. That should be unique for files stored online, but it might work less well with imagery served by APIs that generate data on the fly.

Another idea is to use UUIDs as defined in RFC 4122: https://www.ietf.org/rfc/rfc4122.txt

Imagery providers do put good effort into assigning unique IDs to their imagery. So ideally there would be a way to leverage unique IDs that may be set by others, but also a mechanism to generate new unique IDs.
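To make the "leverage existing IDs, but also generate new ones" idea concrete, here is a minimal sketch using Python's standard `uuid` module. The function name `make_asset_id` and the domain `provider.example.com` are hypothetical and not part of any spec; the approach (a name-based version 5 UUID scoped to the provider, with a random version 4 UUID as fallback) is only one possible option.

```python
import uuid
from typing import Optional


def make_asset_id(provider_domain: Optional[str] = None,
                  provider_id: Optional[str] = None) -> str:
    """Return an RFC 4122 UUID string for an asset (hypothetical helper).

    If the provider already assigned an ID, derive a deterministic
    version 5 UUID from it, so the same provider ID always maps to the
    same UUID. Otherwise fall back to a random version 4 UUID.
    """
    if provider_domain and provider_id:
        # Namespace per provider, so equal IDs from different providers differ.
        provider_ns = uuid.uuid5(uuid.NAMESPACE_DNS, provider_domain)
        return str(uuid.uuid5(provider_ns, provider_id))
    return str(uuid.uuid4())


# Same provider + same provider ID gives the same UUID every time.
a = make_asset_id("provider.example.com", "scene-001")
b = make_asset_id("provider.example.com", "scene-001")
# No provider ID available: a fresh random UUID.
c = make_asset_id()
```

A name-based UUID keeps a stable link to the provider's own ID without requiring a lookup table, though the provider ID itself would still need to be stored alongside it, since the mapping cannot be reversed.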

We should also think through caching and derived-data workflows. If an image is hosted on two clouds but represents the same image, do the copies share an ID? Or have related IDs? And should there be a relationship between the IDs of images that are derived from a source? (Likely not on the latter, but it is good to think through.)

tombh commented 7 years ago

It's a little confusing that OIN uses the term uuid when, as far as I understand, that term refers specifically to the RFC you link. An OIN UUID is the image's URI, so an image can't change its location without changing its fundamental identity, which is simply not the right thing to do.

This topic has crossed my mind a few times during my work on OAM/OIN and I always just think we need real UUIDs and a separate permalink(s) field. And to answer your question about different copies of the same image having the same ID, yes I'd consider that essential.
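As a rough illustration of what "real UUIDs and a separate permalinks field" could look like, here is a small sketch. The class `ImageRecord`, its field names, and the example URLs are all hypothetical, not taken from OIN or this spec.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImageRecord:
    """Hypothetical metadata record: identity is separate from location."""
    id: str                                               # RFC 4122 UUID, never changes
    permalinks: List[str] = field(default_factory=list)   # where copies live


record = ImageRecord(id=str(uuid.uuid4()))
original_id = record.id

# The same image hosted on two clouds: one ID, two permalinks.
record.permalinks.append("https://cloud-a.example.com/imagery/scene.tif")
record.permalinks.append("https://cloud-b.example.com/imagery/scene.tif")

# Moving or dropping a copy only touches the permalinks list.
record.permalinks.remove("https://cloud-a.example.com/imagery/scene.tif")
```

With this split, relocating a file is an edit to `permalinks`, and anything that references the image by `id` keeps working.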

matthewhanson commented 7 years ago

@cholmes @tombh It's not clear to me here how multiple files fit into this spec yet. Is the idea that metadata applies to a single file? A single file can certainly have a unique ID, but the more typical case is that a scene is made up of multiple data files as well as potential ancillary files and they share an ID. In the NASA world, this is referred to as a "granule", because the data all goes together. It could be made up of multiple bands, qc bands, additional metadata, thumbnails, etc.

If an image is hosted in multiple locations they should have the same granule ID to refer to the entire collection of files. Also note that whether something is a single file or multiple files will be based on distribution, so I think it's even more important that the ID refers to the entire granule.

As far as having more specific requirements on the ID, I don't think that will work. An ID should be assigned by the provider, and they should have control over the naming convention. A UUID may be appropriate for some applications, but for data provenance it's important that the "granule" ID can be used to go back to the provider.
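To sketch the granule idea, here is one way a record could group several files, possibly hosted in several places, under a single provider-assigned ID. The class `Granule`, the role names, the ID string, and the URLs are illustrative assumptions only, not an existing format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Granule:
    """Hypothetical granule: one provider-assigned ID for a set of files."""
    id: str          # assigned by the provider, using their own convention
    provider: str    # who to go back to for provenance
    # Maps a file role (band, QC band, thumbnail, ...) to every URL hosting it.
    files: Dict[str, List[str]] = field(default_factory=dict)

    def add_file(self, role: str, url: str) -> None:
        """Register one hosted copy of the file that plays the given role."""
        self.files.setdefault(role, []).append(url)


granule = Granule(id="EXAMPLE_SCENE_0001", provider="example-provider")
granule.add_file("band-4", "https://cloud-a.example.com/EXAMPLE_SCENE_0001/B4.tif")
granule.add_file("band-4", "https://cloud-b.example.com/EXAMPLE_SCENE_0001/B4.tif")
granule.add_file("thumbnail", "https://cloud-a.example.com/EXAMPLE_SCENE_0001/thumb.jpg")
```

Whether the data ships as one file or many, and from one host or several, the granule ID stays the same and still points back to the provider.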