opengeospatial / ogcapi-common

OGC API - Common provides those elements shared by most or all of the OGC API standards to ensure consistency across the family.
https://ogcapi.ogc.org/common

Hierarchical Collections #298

Open jerstlouis opened 2 years ago

jerstlouis commented 2 years ago

In our OGC API server and client, we have implemented support for hierarchical collections to help organize a large number of collections and to facilitate discovery by drilling down to the collection of interest.

We would welcome TIEs with other clients or servers to validate this as a potential conformance class for an extension to Common / Geospatial Data aka Collections.

The requirements are two-fold:

A permission is also granted for the HTML representation of /collections to list only the top-level collections.

A great use case for hierarchical collections is to offer access mechanisms (e.g. Features or TileSets) both for individual FeatureTypes, as well as for collections made up of multiple FeatureTypes (or multi-layer TileSets). e.g. we have multi-layer tilesets at https://maps.ecere.com/ogcapi/collections/Daraa/tiles and single-layer tilesets at https://maps.ecere.com/ogcapi/collections/Daraa:AgricultureSrf/tiles . This would also apply for Features (but multi-feature types collections are not yet supported on our server), especially with JSON-FG which allows declaring feature types.

Another example for maps:

https://maps.ecere.com/ogcapi/collections/NaturalEarth:physical:bathymetry/map
https://maps.ecere.com/ogcapi/collections/NaturalEarth:physical:bathymetry:ne_10m_bathymetry_J_1000/map

Original discussion of this topic is in https://github.com/opengeospatial/ogcapi-common/issues/11 .

ghobona commented 2 years ago

@aaime Part of the discussion was about the potential impact on namespace prefixes of using a colon as a separator. Since GeoServer supports the use of namespace-qualified names, perhaps you could comment on the proposal?

ghobona commented 2 years ago

@arnevogt I wonder if T17-API-D165 could be easily configured by editing backend_configuration.json to demonstrate the Hierarchical Collections concept?

Cc: @LieberJosh

tomkralidis commented 2 years ago

@jerstlouis I like the colon-delimited hierarchy for collections, and +1 for having a server declared delimiter. I wonder whether this would be a conformance class and then an added property to a given /collections response?

jerstlouis commented 2 years ago

@tomkralidis Yes something like "collectionIDHierarchySeparator" : ":" would make sense. It would probably be useful to have that property in both the /collections and the /collections/{collectionId} responses.

Hierarchical Collections would be a conformance class, yes. Meaning two things: using a hierarchy separator, and adding listing of children collections to parent /collections/{collectionId} responses.
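A minimal sketch of how a client might use such a server-declared separator. It assumes the "collectionIDHierarchySeparator" property suggested above is present in the /collections response; that property is only a proposal in this thread, and the function name and sample response are hypothetical.

```python
def ancestor_ids(collection_id: str, separator: str) -> list[str]:
    """Return the IDs of all ancestors implied by the separator, outermost first."""
    parts = collection_id.split(separator)
    # Every proper prefix of the ID names an ancestor collection.
    return [separator.join(parts[:i]) for i in range(1, len(parts))]


# Hypothetical /collections response fragment; the property name is only a proposal.
response = {
    "collectionIDHierarchySeparator": ":",
    "collections": [{"id": "NaturalEarth:physical:bathymetry"}],
}

sep = response["collectionIDHierarchySeparator"]
for collection in response["collections"]:
    print(collection["id"], "->", ancestor_ids(collection["id"], sep))
```

Note that this makes the client parse collection IDs, which only works if the separator never appears in an ID for any other reason.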

Any chance we could eventually see support in PygeoAPI? :)

rggibb commented 2 years ago

DGGS could of course also use this hierarchy notation for its hierarchy of ZoneClasses, ie the levels.

aaime commented 2 years ago

@ghobona in GeoServer we indeed use ":" for namespacing, its usage is opaque to clients, it's just part of the identifier. Administrators can have an understanding of it, in the form of "workspace:localName".

Workspaces are non-hierarchical, unordered containers, originally designed to allow setting a common namespace URI for all feature types in the workspace (for ease of WFS setup). Over time, workspaces have also become a lightweight filtering mechanism and a way to get rid of the prefix. Compare https://gs-main.geosolutionsgroup.com/geoserver/ogc/features/collections?f=text%2Fhtml with https://gs-main.geosolutionsgroup.com/geoserver/tiger/ogc/features/collections?f=text%2Fhtml : the second only has collections in the "tiger" workspace, and prefixes have been stripped. For context, the landing page of the features service is at "https://gs-main.geosolutionsgroup.com/geoserver/ogc/features"; the workspace prefix has to go between "geoserver" and "ogc". These are known as "workspace specific services", and from them it's not possible to access collections belonging to other workspaces.

However, to support WMS hierarchical capability document, we also have another concept: a layer group. It's a WMS specific concept, mind, does not exist anywhere else in GeoServer. A layer group is a hierarchical, ordered container, that can contain layers and other groups. If requested directly by the users, it will return all layers defined in it. A layer group can be part of a workspace (but can also be "global", not contained in any workspace).

Say that in GeoServer we have a layer group contained in a workspace (sf), that contains another layer group, and we are using global services (so, prefixes are still there). Of course we cannot use : as the separator, let's imagine we use ) as the separator, then we could be looking at a URL as follows:

https://gs-main.geosolutionsgroup.com/geoserver/ogc/features/collections/sf:spearfish)sf:subgroup)sf:arcsites/items

while if we access a workspace specific service, we'd use:

https://gs-main.geosolutionsgroup.com/geoserver/ogc/features/collections/spearfish)subgroup)arcsites/items

Seems it would work... however, it really kills me to see special characters being used to represent a hierarchy, when the URL structure itself is hierarchical. I realize that conflicts are possible, because we have sub-resource popping under the collection one every other day (items, tiles, coverage, map are already reserved, to name a few, more are incoming).

An approach that I have not seen in use would be to just have a "collections" resource under the collection, representing the nested collections. The path would become:

https://gs-main.geosolutionsgroup.com/geoserver/ogc/features/collections/spearfish/collections/subgroup/collections/arcsites/items

Does not look as weird as the above path, but it's longer. Even just reserving "c" as path element, would make it use 3 chars instead of one, e.g:

https://gs-main.geosolutionsgroup.com/geoserver/ogc/features/collections/spearfish/c/subgroup/c/arcsites/items

Another consideration is indeed... length. Whatever proposal we are looking at, the structure ends up represented in the path, whose length is limited, and already has other hungry competitors for it (e.g., a filter CQL expression, a polygon geometry used for spatial filtering in some services). WMS did not have this issue: the capabilities document had a hierarchical structure, but each layer could be invoked directly by its name, without calling every parent along with it. Another advantage is that it allowed for layers to be shared in multiple sub-trees. This approach requires unique names in the service, but leaves plenty of space for other parameters. Something like that could be implemented by having a tree-like structure in the "collections" resource, to show what relationships are there, leaving basic clients to just follow the links in the "links" array without understanding the eventual relationships.
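To make the length trade-off concrete, here is a small sketch that builds the URL for the same leaf collection under the addressing schemes discussed in this comment. The helper names are hypothetical, and the "flat" variant assumes leaf names are unique within the service, as with WMS layer names.

```python
BASE = "https://gs-main.geosolutionsgroup.com/geoserver/ogc/features/collections"


def separator_path(nodes: list[str], separator: str = ")") -> str:
    """Single path segment, with the hierarchy encoded by a separator character."""
    return f"{BASE}/{separator.join(nodes)}/items"


def nested_path(nodes: list[str], marker: str = "collections") -> str:
    """One path segment per node, with a reserved marker segment between nodes."""
    joiner = f"/{marker}/"
    return f"{BASE}/{joiner.join(nodes)}/items"


def flat_path(nodes: list[str]) -> str:
    """WMS-like addressing: only the leaf name appears in the path."""
    return f"{BASE}/{nodes[-1]}/items"


nodes = ["spearfish", "subgroup", "arcsites"]
for url in (
    separator_path(nodes),
    nested_path(nodes),
    nested_path(nodes, marker="c"),
    flat_path(nodes),
):
    print(len(url), url)
```

Only the flat variant keeps the path length independent of the depth of the hierarchy.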

cportele commented 2 years ago

"Dataset" is a key concept in the W3C Data on the Web Best Practices when sharing data on the web. The examples that I have seen seem to mainly share multiple datasets via a single API. Any extension for hierarchical collections that allows this should clarify which resources are considered a dataset by the data publisher and which are, for example, subsets. This could be through another member in the Collection resource that clarifies the type of collection.

Personally, I think it is clearer and cleaner to share multiple datasets via separate APIs (which can then also evolve and be versioned separately) and have a kind of super landing page on top of them.

That said, I also see cases where it can be intuitive to users to present the data of a single dataset in a collection hierarchy with a depth > 1.

I do not see any need for a special character requirement, even if we flatten all the collections in the API (ie. only have /collections/{collectionId}) and the parent/child relationships are only expressed through expressing the relationship in the Collection resource. Concatenating node ids along the path separated by a reserved character, if used, would only be a convention of a tool, but I do not see a need to standardize this. And clients should not be required to parse collectionIds.

jerstlouis commented 2 years ago

@cportele Agreed, it would be nice to have something like "isDataSet" : "true" to indicate a dataset.

I do not see any need for a special character requirement, even if we flatten all the collections in the API (ie. only have /collections/{collectionId}) and the parent/child relationships are only expressed through expressing the relationship in the Collection resource. Concatenating node ids along the path separated by a reserved character, if used, would only be a convention of a tool, but I do not see a need to standardize this. And clients should not be required to parse collectionIds

Well, it could be a convention + explicit relationships like a "parent" property. But the separator approach had the benefit of being considerably lighter, e.g. "parent" : "NaturalEarth:physical:bathymetry" for every child of bathymetry, which would always repeat the same information already contained within the convention (and a use case for this is thousands of collections, so that is a considerable advantage). I also think it would be confusing for the user (in web browser especially) if not all servers use a delimiter in collection IDs, and the hierarchy isn't made obvious in the ID.

However I would prefer to standardize something rather than nothing, so something like a "parent" property + collections listing in parent collections as well would be a great step forward.

ghobona commented 2 years ago

If using a property to identify the relationship, the following options are relevant:

jerstlouis commented 2 years ago

To re-iterate my latest proposal, revised to address @cportele 's and others' concerns of using a particular delimiter like : and having to figure out relationships implied from identifiers:

cportele commented 2 years ago

Before we invent new collection properties we should check, if we can leverage existing conventions, in particular link relation types.

As Gobe has pointed out, we could use up to reference the parent collection using a link. For example:

"links": [ ..., { "href": "../the_parent", "rel": "up", "title": "..." } ]

And we could use type to identify resources that are datasets (pointing to http://purl.org/dc/dcmitype/Dataset or https://schema.org/Dataset). For example:

"links": [ ..., { "href": "http://purl.org/dc/dcmitype/Dataset", "rel": "type", "title": "This collection is a dataset." } ]

Since the collections are hierarchical, I assume the following statements are all true, if C is a hierarchical collection with sub-collections C1 and C2:

Correct?

jerstlouis commented 2 years ago

@cportele Many thanks for engaging on this, I still hold this topic dear :)

Correct?

Conceptually, yes, I think this is correct. A use case for this may be e.g., feature types, as we discussed in T17 / JSON-FG, with top-level collection including multiple feature types, and sub-collections only including one feature type.

However, I think implementation should be allowed to support different access mechanisms (i.e., different OGC API specs) at different levels of the hierarchies. e.g., whether to provide /items or /tiles at the upper and/or lower levels.

This would allow collections that are only organizing the leafs, or only providing multi-layer vector tiles in the top-collections, etc. That would simply be done by including or not certain links in the collection object.

As Gobe has pointed out, we could use up to reference the parent collection using a link.

This approach might be fine for /collections/{collectionId}, but my main concern is for organizing in a hierarchical manner a list of collections at /collections (e.g., presenting it in a tree view control), without having to individually retrieve every collection.

Repeating the title of the parent in this case (which would already be in the parent in the same array, for the list of collections) seems overkill too.

When retrieving the list of collections, the client already has those multiple objects in memory (within the array) from the collections list resource, so I think whether links should be used or not to establish hierarchical relationships within those objects of the array is debatable.

Particularly from a client's perspective (and perhaps a less "webby" client perspective), it's much more complicated to look through links and look for a particular relation type, and parse a URL, than to simply include a property that directly uses the collection ID (rather than URL, which might be relative).

"links": [ ..., { "href": "../the_parent", "rel": "up", "title": "..." } ]

vs.

"parent": "the_parent"

That being said, I would much prefer agreeing to a best web practice that enables hierarchical collections than not agreeing on how to define hierarchical collections.

And we could use type to identify resources that are datasets

That particular approach also seems a bit complicated to me from a client implementer perspective (instead of simply having an "isDataSet": true property), but again I prefer a best web practice I dislike to not reaching an agreement.
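A rough sketch of what the two styles mean for a client, comparing the link-based approach with the proposed "parent" and "isDataSet" properties. The helper names are hypothetical. The link-based variant assumes the parent's collection ID is the last path segment of the href, which a real deployment would have to guarantee.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

DATASET_TYPES = {
    "http://purl.org/dc/dcmitype/Dataset",
    "https://schema.org/Dataset",
}


def parent_from_links(collection: dict) -> Optional[str]:
    """Find the parent ID through a rel=up link.

    Assumes the parent's collection ID is the last path segment of the href.
    """
    for link in collection.get("links", []):
        if link.get("rel") == "up":
            path = urlsplit(link["href"]).path.rstrip("/")
            return unquote(path.rsplit("/", 1)[-1])
    return None


def parent_from_property(collection: dict) -> Optional[str]:
    """Read the parent ID directly from the proposed "parent" property."""
    return collection.get("parent")


def is_dataset(collection: dict) -> bool:
    """Accept either the proposed "isDataSet" property or a rel=type link."""
    if collection.get("isDataSet") is True:
        return True
    return any(
        link.get("rel") == "type" and link.get("href") in DATASET_TYPES
        for link in collection.get("links", [])
    )


linked = {"links": [{"href": "../the_parent", "rel": "up", "title": "..."}]}
direct = {"parent": "the_parent"}
print(parent_from_links(linked), parent_from_property(direct))
```

Both helpers return the same ID here, but the link-based one depends on an assumption about the URL layout that the property-based one does not need.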

cportele commented 2 years ago

@jerstlouis

Conceptually, yes, I think this is correct.

OK, so that would need to be made clear in the spec for this.

I do not have an issue with using different API building blocks for different collections in a hierarchy. But if an API supports, e.g., features or vector tiles for all (sub-)collections, then the collections would have to meet the constraints.

Particularly from a client's perspective (and perhaps a less "webby" client perspective), it's much more complicated to look through links and look for a particular relation type, and parse a URL, than to simply include a property that directly uses the collection ID (rather than URL, which might be relative).

Yes, I see that point. Maybe it would be good to collect implementation feedback and test it in a few code sprints. (If we end up with OGC-specific conventions we can still support an option in our implementation to represent the links in API deployments that prefer to leverage Web linking.)

jerstlouis commented 2 years ago

Thanks @cportele .

But if an API supports, e.g., features or vector tiles for all (sub-)collections, then the collections would have to meet the constraints.

If what you mean is that both the parent collection and its sub-collections e.g., all support Features, then yes I agree.

test it in a few code sprints.

We did some initial testing in past code sprints with the pygeoapi server implementation, but perhaps we could now test this updated approach?

@tomkralidis will you be participating in the Tiles / Coverages / DGGS / EDR "Space Partitioning" Code Sprint next week?

tomkralidis commented 2 years ago

@jerstlouis yes I will be participating with a lens on OACov and EDR.

jerstlouis commented 2 years ago

Great to hear @tomkralidis . If you're interested we could discuss Hierarchical Collections and do some TIEs with our client in the context of OGC API - Coverages to evaluate the approach(es) described above and provide feedback.

pvretano commented 2 years ago

Are we still proposing to use ":" or some other non-slash character as the collection separator?

To me, this is not a hierarchy: https://maps.ecere.com/ogcapi/collections/NaturalEarth:physical:bathymetry/map , whereas this is a hierarchy: https://maps.ecere.com/ogcapi/collections/NaturalEarth/physical/bathymetry/map .

The trick is to figure out what the path elements between collections and map mean and what you get if you do a GET on an intermediate path like https://maps.ecere.com/ogcapi/collections/NaturalEarth/physical .

My thinking goes something like this:

1) GET /collections always gets you the flat list of collections as it always has, so clients that don't know about hierarchies can continue to work as always.

2) GET /collections?hierarchy=true (or something like that) gets you the list of collections but organized in a hierarchy. This would mean extending the current collections schema but I think this can be done in a backwards compatible way.

3) Getting a sub-collection like GET /collections/NaturalEarth gets you a JSON (or other format) document describing what the NaturalEarth sub-collection is all about (including what sub-sub-collections are part of the NaturalEarth sub-collection) and also includes navigation links to the children collections of the current sub-collection. I assume this JSON (or other format) document would be the same one you get with GET /collections?hierarchy=true anchored at the current sub-collection rather than /collections.

4) Among the things that the sub-collection document can include are links to well known OGC endpoints like maps and items with the appropriate rels (e.g. items, etc.). If such links exist it means that you can get a map or features or coverage or whatever of the sub-collection and all its children collections. So GET /collections/NaturalEarth/map gets you a map with all the children collections rendered. This could be inefficient so a service may choose to simply provide navigation to the children collections without the ability to render the sub-collection as a map (... or feature, or coverage, etc.). Of course, eventually you will reach a node like .../collections/NaturalEarth/physical/bathymetry that would include links to OGC endpoints like items or map or coverage or whatever and then you could access the resource as the endpoint dictates (i.e. as a map, as features, as a coverage, etc.)

5) As @cportele pointed out, at each level in the hierarchy links (rel=up) are included to connect the nodes.

I really dislike the colon notation that is being proposed because it means that clients would need to parse the collection id which always makes my "Spidey sense" tingle! Of course, I have not described all the details here but perhaps we can put this approach on the agenda for next week's code sprint to see if it has legs.

jerstlouis commented 2 years ago

@pvretano

Are we still proposing to use ":" or some other non-slash character as the collection separator?

I really dislike the colon notation that is being proposed because it means that clients would need to parse the collection id which always makes my "Spidey sense" tingle!

I agreed with you and @cportele that this tingles the Spidey sense and moved away from relying on a particular separator. Instead, what I proposed is a simple "parent" : "{parentCollectionId}" to be included in each collection object. e.g., collection NaturalEarth:physical:bathymetry:ne_10m_bathymetry_I_2000 would have its parent set to NaturalEarth:physical:bathymetry, but the collections could be named Foo and Bar instead.

This allows to easily and unambiguously establish hierarchical relations between collections when requesting all collections at /collections, and present them all e.g., in a tree view control, with a single server round-trip.

Including a parent property there makes the flat list a hierarchy for clients that understand it, without requiring a separate ?hierarchy=true mode, while being fully compatible with clients that simply ignore it.
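As a sketch of the client side, the flat array from /collections can be turned into a tree in a single pass using only the proposed "parent" property. The function names and sample collections are hypothetical. Note that the IDs do not need to follow any separator convention.

```python
from collections import defaultdict


def build_tree(collections: list[dict]) -> dict:
    """Group a flat /collections array into {parent_id: [child_ids]}.

    Top-level collections (no "parent" property) are stored under the key None.
    """
    children = defaultdict(list)
    for collection in collections:
        children[collection.get("parent")].append(collection["id"])
    return dict(children)


def print_tree(children: dict, parent=None, indent: int = 0) -> None:
    """Print the hierarchy depth-first, as a tree view control might show it."""
    for child_id in children.get(parent, []):
        print("  " * indent + child_id)
        print_tree(children, child_id, indent + 1)


# Hypothetical flat list, as a client that ignores "parent" would also see it.
flat = [
    {"id": "NaturalEarth"},
    {"id": "NaturalEarth:physical", "parent": "NaturalEarth"},
    {"id": "NaturalEarth:physical:bathymetry", "parent": "NaturalEarth:physical"},
    {"id": "Foo"},
    {"id": "Bar", "parent": "Foo"},
]
print_tree(build_tree(flat))
```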

A server could use whichever convention for hierarchy separators, or no particular separator. In the past, when we used / instead of :, that didn't seem to break any clients, so possibly / could be used, but I think it is less proper for the OpenAPI descriptions for {collectionId} to include slashes (and we don't want to break compatibility with clients that do not understand the hierarchical collections extension).

If it is decided that this is done with a rel=up link instead of a parent property, that works as well, but is heavier in the array of collection objects.

The ?parent= query parameter in turn would make it possible to only retrieve immediate children. A mechanism to specify the top level parent would be needed, which could be ?parent= with nothing following the =, or something else, so that a client can request only the top-level collections without including the full hierarchy.

The inclusion of collections list property for children collections in the parent collection resource (e.g., /collections/NaturalEarth) is what we currently do (e.g. see https://maps.ecere.com/ogcapi/collections/NaturalEarth?f=json).

The equivalent for listing the sub-collections in this new proposal would be /collections?parent=NaturalEarth instead, but we could also specify that any non-leaf collection should include the list of immediate children in a collections property.

pvretano commented 2 years ago

@jerstlouis thanks ... let's discuss at the code sprint next week. Looks like we have lots of source material to consider which is a good thing.

jerstlouis commented 2 years ago

@pvretano off-topic, but I also hope we can discuss Common building blocks related to the Features Search extension that we proposed for Coverages and DGGS ( https://github.com/opengeospatial/ogcapi-coverages/issues/164 ). Glad to hear you will be participating in this Code Sprint! :) This is what we will be focusing on.

jerstlouis commented 9 months ago

At the OGC API - Common session of the 127th Members Meeting in Singapore we briefly discussed this topic and there was no outspoken objection to drafting and reviewing an optional "Hierarchical collections" requirements class for Part 2, which would:

This would also replace capabilities that were specifically included in the 3D GeoVolumes spec ( https://github.com/opengeospatial/ogcapi-3d-geovolumes/issues/5 and https://github.com/opengeospatial/ogcapi-3d-geovolumes/issues/12 ).

m-mohr commented 1 month ago

Has this any implementations yet or other standards using it? If yes, which and where? If not, it feels wrong to define something in "Common" that is not common yet. :-)

jerstlouis commented 1 month ago

@m-mohr this is the on-going Common discussion.

There is a plan to use it at least together with OGC API - 3D GeoVolumes (see https://github.com/opengeospatial/ogcapi-3d-geovolumes/issues/5), OGC API - Coverages, OGC API - Maps.

But the fact is that it is something that deals with resource paths (/collections and /collections/{collectionId}) that are defined in Common - Part 2, whereas those other standards simply reference Common - Part 2 and really are not affected at all whether this is implemented or not (except that their shared, common use case of hierarchical collections has a well-defined Common solution that can be used).

At least in this case, it feels wrong to me to define it anywhere else than in Common. Whether the discussions take place in the 3D GeoVolume SWG, Coverages, Maps (technically we swapped our Thursdays 11:00 AM EST Maps meeting time slot to make room for Common discussions), it seems much more inclusive to me to have these discussions in the Common SWG, and it saves the time in those other SWGs to discuss things that are more specifically related to that SWG topics and not of interest to members of other SWGs. It's much easier to plan schedules for attending one Common meeting / week, than trying to attend every OGC API SWG meeting every week where a Common topic of interest might be discussed at different times.

The results of the discussion today were quite positive, and we have a simple way forward addressing the use cases:

This can automatically be used (or not) together with any OGC API standards using Common - Part 2 collections.

It also plays nicely with OGC API - Records and related Common requirement classes (Searchable collections and Filtering collections with CQL2).

We plan on updating our implementation to what we agree to today on the call, which should be reflected in the draft hopefully by tomorrow for everyone to review.

jerstlouis commented 1 month ago

As per https://github.com/opengeospatial/ogcapi-common/issues/11#issuecomment-474915175 , we could also consider defining an optional isDataset boolean property of the collection description response to indicate that a particular level of the hierarchy (corresponding to a particular collection) is considered a dataset. An implementation / deployment can decide on the meaning of what a dataset actually means for them, as I don't think this is universally agreed :)

I would suggest to allow a dataset being inside of another dataset.

m-mohr commented 1 month ago

How could I get all top-level parents so that I can show a hierarchy in the client? Is that the default? But if a client doesn't support this parameter, how would it then get all collections?

Would the parent parameter include only the collection with that specific parent id or recursively everything underneath?

PS: Your email that you sent on 16:58 CEST for the meeting on 17:00 CEST was delivered to me by the OGC mail servers at around 19:xx CEST. Otherwise, I'd have joined, but sometimes the OGC mail servers seem to have quite a delay.

jerstlouis commented 1 month ago

PS: Your email that you sent on 16:58 CEST for the meeting on 17:00 CEST was delivered to me by the OGC mail servers at around 19:xx CEST. Otherwise, I'd have joined, but sometimes the OGC mail servers seem to have quite a delay.

Yes, I notice that. Sorry for the late notice. Common meets every week at 11:00 AM Eastern on Thursdays until we finalize Part 2. We will try harder to send a reminder the day before with the topic of the week. Next week we will probably review Hierarchical Collections again, and populate the other new req. classes (Schemas taken from Features Part 5, Filtering collections by CQL2 and Sorting based on Records).

If you read the newly generated draft at https://docs.ogc.org/DRAFTS/20-024.html#rc-hierarchical-collections , and if I did an okay job, the answers to these 4 questions you're asking should be crystal clear:

How could I get all top-level parents so that I can show a hierarchy in the client?

You request /collections?parent=none. (Requirement 27 C)

Is that the default?

It is not the default, but there is a permission for it to be the default specifically in an HTML representation, which should not break existing programmatic clients. (Permission 6)

But if a client doesn't support this parameter, how would it then get all collections?

A client that doesn't understand / care about Hierarchical Collections would work just as usual, because except for the HTML permission, all collections would be returned by default.

Would the parent parameter include only the collection with that specific parent id or recursively everything underneath?

When specifying parent=, only the immediate children of the collection with that exact id gets returned. (Requirement 27 B)

The purpose of the ancestor= parameter is exactly to retrieve the hierarchy of all collections recursively underneath. (Requirement 28B)
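The semantics described in these answers can be sketched against an in-memory list. This is not the draft's normative text: the function name and sample data are hypothetical, the data is assumed to contain no cycles, and passing both parameters is simply rejected here because this comment does not define that case.

```python
from typing import Optional


def filter_collections(
    collections: list[dict],
    parent: Optional[str] = None,
    ancestor: Optional[str] = None,
) -> list[dict]:
    """Apply the parent= / ancestor= semantics described above to a flat list."""
    if parent is not None and ancestor is not None:
        raise ValueError("combining parent and ancestor is not defined in this sketch")
    if parent is None and ancestor is None:
        return list(collections)  # default: every collection is returned
    if parent is not None:
        # parent=none selects top-level collections; otherwise immediate children.
        wanted = None if parent == "none" else parent
        return [c for c in collections if c.get("parent") == wanted]

    # ancestor=: walk up each collection's parent chain looking for the ancestor.
    parent_of = {c["id"]: c.get("parent") for c in collections}

    def descends_from(collection_id: str) -> bool:
        current = parent_of.get(collection_id)
        while current is not None:
            if current == ancestor:
                return True
            current = parent_of.get(current)
        return False

    return [c for c in collections if descends_from(c["id"])]


sample = [
    {"id": "NaturalEarth"},
    {"id": "NaturalEarth:physical", "parent": "NaturalEarth"},
    {"id": "NaturalEarth:physical:bathymetry", "parent": "NaturalEarth:physical"},
    {"id": "Foo"},
]
print([c["id"] for c in filter_collections(sample, parent="none")])
print([c["id"] for c in filter_collections(sample, ancestor="NaturalEarth")])
```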

m-mohr commented 1 month ago

That sounds reasonable, I just think the parent=none is not ideal. parent and ancestor generally do similar things. What happens if both are provided?

I'd like to propose a slightly different alternative (names: tbd):

This way you are more flexible, avoid conflicts and for me it's more consistent with the behaviour that happens without this conformance class:

Thoughts?

jerstlouis commented 1 month ago

Thanks for the feedback @m-mohr .

What happens if both are provided?

Probably should clarify that they are mutually exclusive and the server SHALL return a 400 error. It makes no sense to provide both.
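One way a server might enforce this, sketched with a hypothetical validate_query helper. The message text is an assumption, and the draft may word the requirement differently.

```python
def validate_query(params: dict) -> tuple[int, str]:
    """Return an (HTTP status, message) pair for a /collections query."""
    # Reject requests that supply both hierarchy parameters at once.
    if "parent" in params and "ancestor" in params:
        return 400, "parent and ancestor are mutually exclusive"
    return 200, "OK"


print(validate_query({"parent": "NaturalEarth", "ancestor": "NaturalEarth"}))
print(validate_query({"parent": "NaturalEarth"}))
```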

parent (string): parent id. Empty or not provided (default) => collections with no parent

The way I initially read that I thought you meant that ? returns only collections with no parent (which could not work for backward compatibility), but based on Default is all collection, i.e. ? or ?parent=&levels=all that doesn't seem to be what you meant.

Using something like levels (parent-depth maybe?) to distinguish between the parent / ancestor behavior (and things in between) is a sound idea, however I'm a bit concerned about:

Curious what @pvretano and @kalxas think of this alternate parent / parent-depth approach. I'm open to the idea.

m-mohr commented 1 month ago

Yeah, default would return all collections.

The name change to parent-depth makes sense to me.

Not sure how much complexity it adds to count the levels? I feel it's not much more difficult compared to getting all collections recursively (which is already quite a task).

For me personally empty string feels more intuitive than none - there could also be a collection "NONE", people come up with all kinds of acronyms.

jerstlouis commented 1 month ago

Not sure how much complexity it adds to count the levels? I feel it's not much more difficult compared to getting all collections recursively (which is already quite a task).

Specifically, it means keeping track of the current depth and comparing that. It's an extra parameter if using recursion. As I said, it's a small amount, but it is additional complexity ;)

there could also be a collection "NONE", people come up with all kinds of acronyms.

Specifically prohibiting this in Requirement 26C which would apply if you conform to Hierarchical Collections.

For me personally empty string feels more intuitive than none -

With the parent / parent-depth approach, I would avoid the parent= or parent=none altogether, and use simply /collections?parent-depth=0. I would probably suggest something like this if going with this approach:

This would imply a default value of 0 for parent-depth when parent is used, but parent-depth not being applicable when neither itself nor parent is used (so that the default /collections query still returns the entire hierarchy).
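The extra bookkeeping mentioned above (tracking the current depth as one more parameter of the recursion) might look like the sketch below. The function name and sample data are hypothetical, and the depth semantics are an assumption: 0 means immediate children only, and None means no limit.

```python
def descendants(children: dict, parent=None, max_depth=None, depth: int = 0) -> list[str]:
    """Collect descendant IDs of `parent`, stopping after `max_depth` levels.

    `children` maps a parent ID (None for the root) to its child IDs.
    Assumption: depth 0 means immediate children only; None means no limit.
    """
    found = []
    for child_id in children.get(parent, []):
        found.append(child_id)
        # The extra bookkeeping: compare the current depth before recursing.
        if max_depth is None or depth < max_depth:
            found.extend(descendants(children, child_id, max_depth, depth + 1))
    return found


# Hypothetical hierarchy keyed by parent ID.
children = {
    None: ["NaturalEarth", "Foo"],
    "NaturalEarth": ["NaturalEarth:physical"],
    "NaturalEarth:physical": ["NaturalEarth:physical:bathymetry"],
}
print(descendants(children, max_depth=0))
print(descendants(children, "NaturalEarth", max_depth=1))
```

With no parent and max_depth=0 this returns only the top-level collections, matching the /collections?parent-depth=0 idea under the stated assumption.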

m-mohr commented 1 month ago

Specifically prohibiting this in Requirement 26C which would apply if you conform to Hierarchical Collections.

Yeah, but many people have existing IDs and don't start from scratch. Renaming a collection and breaking users workflows because of such a requirement seems like a bad idea to me.

With the parent / parent-depth approach, I would avoid the parent= or parent=none altogether,

Yeah, that's what I meant above but probably explained in a confusing way.

m-mohr commented 1 month ago

We concluded in the session today:

Parameters:

Examples:

jerstlouis commented 2 weeks ago

The agreed upon changes have been applied in https://github.com/opengeospatial/ogcapi-common/commit/faca4aaf349b9fdbd87b9463d62d138f86eb85ff .

At @joanma747 's suggestion, we used descendants=immediate rather than descendants=children because of the clearer meaning, since we use parent also with the meaning of ancestor it could be argued similarly that children also refer to descendants.