opengeospatial / ogcapi-records

An open standard for the discovery of geospatial resources on the Web.
https://ogcapi.ogc.org/records
Other

Identify type of id in externalId? #350

Open m-mohr opened 3 months ago

m-mohr commented 3 months ago

In Sentinel-2, for example, a lot of "external" IDs are exposed through the metadata, e.g. the tile, datatake, datastrip and granule IDs.

If I just place all of them into externalIds, it's not clear what type of IDs are listed:

[123, 321, 456, 654] (for example)

Thus I'm wondering whether something like the following, or anything similar, would make sense:

"externalIds": {
  "tile_id": "123",
  "datatake_id": "321",
  "datastrip_id": "456",
  "granule_id": "654"
}
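To make the trade-off concrete, here is a minimal Python sketch, not part of OGC API Records, contrasting the flat-array form with a typed form. It assumes the ID type name (e.g. `tile_id`) may be reused as the `scheme` of a `{"scheme", "value"}` entry, the array-of-objects shape discussed in this thread; real records would more likely use a URI as the scheme. The helper names are hypothetical.

```python
from typing import Dict, List

# Sentinel-2 style IDs keyed by their type, using the placeholder values above.
sentinel2_ids: Dict[str, str] = {
    "tile_id": "123",
    "datatake_id": "321",
    "datastrip_id": "456",
    "granule_id": "654",
}


def values_only(keyed: Dict[str, str]) -> List[str]:
    """Flat-array form: the ID types are lost."""
    return list(keyed.values())


def to_external_ids(keyed: Dict[str, str]) -> List[Dict[str, str]]:
    """Typed form: one {"scheme", "value"} object per ID.

    Assumption: the ID type name is used directly as the scheme.
    """
    return [{"scheme": id_type, "value": value} for id_type, value in keyed.items()]


print(values_only(sentinel2_ids))         # ['123', '321', '456', '654']
print(to_external_ids(sentinel2_ids)[0])  # {'scheme': 'tile_id', 'value': '123'}
```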
m-mohr commented 3 months ago

Any take on this @pvretano or @tomkralidis?

I'd like to create a STAC extension that uses the object approach, because we need to know the ID name. I'm wondering whether I should just create a separate field that is not externalIds, or try to align Records and STAC. :-)

tomkralidis commented 3 months ago

@m-mohr I think properties.externalIds is an array of objects, for example:

  "externalIds": [{
    "scheme": "https://doi.org",
    "value": "10.14287/10000001"
  }, {
    "scheme": "https://handle.net",
    "value": "2381/12775"
  }, {
    "scheme": "https://arks.org",
    "value": "ark:/13030/tf5p30086k"
  }]

A STAC extension can extend the above accordingly (where value is the only required property).
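A minimal validation sketch of that rule, assuming (as stated in the comment) that `value` is the only required member, that `scheme` is optional, and that extra members added by an extension are allowed. Requiring both members to be strings is an additional assumption, and `validate_external_ids` is a hypothetical helper, not a Records or STAC API.

```python
from typing import Any, List


def validate_external_ids(external_ids: Any) -> List[str]:
    """Return a list of problems found in an externalIds array (empty if valid)."""
    if not isinstance(external_ids, list):
        return ["externalIds must be an array"]
    problems: List[str] = []
    for i, entry in enumerate(external_ids):
        if not isinstance(entry, dict):
            problems.append(f"entry {i} is not an object")
            continue
        # "value" is required (assumed to be a string).
        if not isinstance(entry.get("value"), str):
            problems.append(f"entry {i} is missing a string 'value'")
        # "scheme" is optional, but must be a string when present.
        if "scheme" in entry and not isinstance(entry["scheme"], str):
            problems.append(f"entry {i} has a non-string 'scheme'")
        # Any other members (e.g. from a STAC extension) are ignored.
    return problems


example = [
    {"scheme": "https://doi.org", "value": "10.14287/10000001"},
    {"scheme": "https://handle.net", "value": "2381/12775"},
    {"value": "ark:/13030/tf5p30086k"},  # scheme omitted: still valid
]
print(validate_external_ids(example))  # []
```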

m-mohr commented 3 months ago

Aha! I guess that works, too. I was probably misled by the example in the README. Thanks!

tomkralidis commented 3 months ago

Thanks @m-mohr, good catch! Associated fix in #352

pvretano commented 3 months ago

Actually there is a problem here.

My intent was to make externalId an array of strings (like the README example) so that, using the externalId query parameter, one could easily search for a record by its external id.

Somewhere along the way externalId got changed to an array of objects, and I missed that or didn't pay close enough attention when the change was made. Now, however, I see a potential problem. How do you search for a record by its external id using the externalId query parameter? Do you simply pass a list of JSON strings like this:

.../collections/{catalogId}/items?externalId={"value":"id1"},{"scheme":"https://...","value":"id2"},....

That seems wildly inconvenient ... no? It is so much easier and more convenient to say:

.../collections/{catalogId}/items?externalId=id1,id2

Would it be of value to come up with some pattern like [scheme:]value in order to preserve the ease of use of the externalId query parameter ... or is everyone OK simply using JSON strings?
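One possible reading of a `[scheme:]value` pattern, sketched in Python. A naive split on the first colon would break values that themselves contain colons (such as `ark:/13030/tf5p30086k`) and URI schemes such as `https://doi.org`, so this sketch assumes a hypothetical server-side registry of short scheme tokens (`KNOWN_SCHEMES` is invented for illustration) and treats the prefix as a scheme only when it is a registered token. It also assumes values never contain commas.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical registry of short scheme tokens accepted in the query parameter.
KNOWN_SCHEMES: Dict[str, str] = {
    "doi": "https://doi.org",
    "hdl": "https://handle.net",
}


def parse_external_id_param(param: str) -> List[Tuple[Optional[str], str]]:
    """Parse "doi:10.14287/10000001,id2" into (scheme, value) pairs.

    The text before the first ':' is a scheme only if it is a known token;
    otherwise the whole item is the value and the scheme is None.
    """
    pairs: List[Tuple[Optional[str], str]] = []
    for item in param.split(","):
        prefix, sep, rest = item.partition(":")
        if sep and prefix in KNOWN_SCHEMES:
            pairs.append((KNOWN_SCHEMES[prefix], rest))
        else:
            pairs.append((None, item))
    return pairs


print(parse_external_id_param("doi:10.14287/10000001,ark:/13030/tf5p30086k,id2"))
# [('https://doi.org', '10.14287/10000001'), (None, 'ark:/13030/tf5p30086k'), (None, 'id2')]
```

The plain `externalId=id1,id2` form keeps working unchanged, since items without a registered prefix are passed through as bare values.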

m-mohr commented 3 months ago

@pvretano Well, the same problem more or less exists for themes. I'd assume you should be able to search through both.

I always thought queryables could improve the search behavior. For example, you expose externalIds as a string queryable and internally you just search in the value field of the externalIds objects.

I guess something similar could be specified for the additional query parameter? I mean, you also don't search for exact values using bbox and other query parameters. I'm ignoring the scheme here, but I'm not sure how much overlap there would be in practice.
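A sketch of that server-side behavior: the client sends plain strings, and the server matches them against the `value` member of each `externalIds` object, ignoring `scheme`. Exact string comparison is an assumption here (a server could use a looser match), the record layout under `properties.externalIds` follows the example earlier in the thread, and the function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def matches_external_id(record: Dict[str, Any], wanted: Iterable[str]) -> bool:
    """True if any externalIds entry of the record has a value in `wanted`."""
    wanted_set = set(wanted)
    entries = record.get("properties", {}).get("externalIds", [])
    # The scheme is deliberately ignored; only "value" is compared.
    return any(entry.get("value") in wanted_set for entry in entries)


def filter_records(records: List[Dict[str, Any]], param: str) -> List[Dict[str, Any]]:
    """Apply a hypothetical ?externalId=id1,id2 filter to a list of records."""
    wanted = param.split(",")
    return [r for r in records if matches_external_id(r, wanted)]


records = [
    {"id": "a", "properties": {"externalIds": [
        {"scheme": "https://doi.org", "value": "10.14287/10000001"}]}},
    {"id": "b", "properties": {"externalIds": [{"value": "2381/12775"}]}},
    {"id": "c", "properties": {}},
]
print([r["id"] for r in filter_records(records, "2381/12775,10.14287/10000001")])
# ['a', 'b']
```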

rob-metalinkage commented 1 month ago

perhaps we could enforce the use of CURIEs, of the form {scheme}:{id}

we could add a JSON-LD @context for namespaces - without having to re-invent the wheel

or, if a bespoke JSON-native solution is a must-have, provide a section (in Common, since it's a super common pattern we'll need everywhere)

"schemes": [
  {
    "token": "my-scheme",
    "namespace": "http://something/",  # or "id" if the ids are not in the form that can be uplifted to URIs
    "resolver": { <extensible description of a resolver> },
    ...
  },
  ...
]

possibly we'd need to distinguish between a scheme identifier and a namespace

this would:

  1. resolve @pvretano's concern,
  2. be natively JSON-LD compatible (allowing automated expansion to URIs where a context is provided; other cases wouldn't be CURIEs but could follow the same basic template, and clients would need to do exactly the same amount of work to treat them as strings),
  3. allow an arbitrary level of detail on how to interpret identifiers of any lexical form, with resolving-mechanism metadata to be defined by implementation profiles.

to keep it simple, just enforce proper CURIEs and use of a JSON-LD context.
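The CURIE idea boils down to prefix expansion: a `{scheme}:{id}` string becomes a URI by concatenating the namespace registered for `scheme` with `id`, which is how JSON-LD expands compact IRIs against prefixes defined in a `@context`. The Python sketch below uses a hypothetical `SCHEMES` mapping modeled loosely on the "schemes" section above (resolver metadata omitted); the `doi` entry is an illustrative assumption.

```python
from typing import Dict, Optional

# Hypothetical "schemes" section: token -> namespace (resolver omitted).
SCHEMES: Dict[str, Dict[str, str]] = {
    "my-scheme": {"namespace": "http://something/"},
    "doi": {"namespace": "https://doi.org/"},
}


def expand_curie(curie: str, schemes: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Expand "{scheme}:{id}" to namespace + id, or return None for unknown prefixes."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in schemes:
        return None
    return schemes[prefix]["namespace"] + local


print(expand_curie("doi:10.14287/10000001", SCHEMES))  # https://doi.org/10.14287/10000001
print(expand_curie("unknown:123", SCHEMES))            # None
```

Because unknown prefixes return `None`, a client can still fall back to treating the whole identifier as an opaque string, which matches point 2 of the list above.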

m-mohr commented 1 week ago

Please take this issue as a comment for https://www.ogc.org/requests/ogc-requests-public-comment-on-ogc-api-records-part-1-core/

pvretano commented 1 week ago

@m-mohr roger that!