radiantearth / stac-spec

SpatioTemporal Asset Catalog specification - making geospatial assets openly searchable and crawlable
https://stacspec.org
Apache License 2.0

Can items belong to multiple collections? #741

Closed beck3905 closed 4 years ago

beck3905 commented 4 years ago

Currently the item spec has the collection field which is a string. Could this be a list instead? It seems possible that an item could and should belong to multiple collections.

m-mohr commented 4 years ago

No, that's not possible at the moment. Indeed, it is theoretically possible to add two collection links, but the collection field doesn't allow a list of values. Simply turning it into a list is not that straightforward, as it could lead to conflicts with the Commons extension. What do you think, @matthewhanson?

cholmes commented 4 years ago

I'm inclined to say that there are other ways to model this with the current structure, and that it keeps clients a lot simpler if they don't have to check multiple collection links and try to interpret what that means. It seems to imply multiple inheritance, potentially, which may be useful but is also a lot more complicated.

I'd say if the desire is to represent the same 'thing' in multiple collections then you could just have two items that have a relation to each other, that says 'this item in another collection is the same as me'. Or if the important thing is for an item to link to two collection json files then you can have one be the collection field, and then add a 'link' to the other collection that should be used.

I could be convinced if there's really clear use cases that really require this, but if it's just a desire to have a single item included in multiple containers then I'd say just use links and propose an extension.
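
A minimal sketch of the second option above: one collection in the `collection` field, plus an extra link to the other collection. The IDs, hrefs, and the choice of `related` as the relation for the second link are assumptions made for illustration; the thread does not name a relation type, and the dict is trimmed down rather than a schema-valid Item.

```python
# Trimmed-down Item; all IDs and hrefs are hypothetical.
item = {
    "type": "Feature",
    "id": "example-item",
    # The single collection the item belongs to.
    "collection": "primary-collection",
    "properties": {"datetime": "2020-01-01T00:00:00Z"},
    "links": [
        {"rel": "collection", "href": "https://example.com/primary-collection.json"},
        # Assumption: "related" stands in for whatever relation an
        # extension would define for the second collection.
        {"rel": "related", "href": "https://example.com/secondary-collection.json"},
    ],
}


def linked_collections(item, rels=("collection", "related")):
    """Return the hrefs of all links whose relation is in `rels`."""
    return [link["href"] for link in item["links"] if link["rel"] in rels]


hrefs = linked_collections(item)
```

A client that only understands the `collection` field keeps working unchanged; only clients that know about the extra relation need to look at the second link.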

matthewhanson commented 4 years ago

The use of the Commons extension really makes this a major headache, as it means you would have to consider inheritance from multiple items.

However, we are discussing removing the Commons extension, which makes this much easier. But I agree with @cholmes: I'm not sure there's a real need for this.

m-mohr commented 4 years ago

If Commons goes away, I guess the only issue to make it more flexible is to make the collection property in the Item an array (or remove it altogether and use the link references somehow).

matthewhanson commented 4 years ago

A common query on Items is to find all Items belonging to a certain collection. This is more difficult if the server must parse the entire links array to find collections. It's also not guaranteed that the collection link includes the collection ID; you'd have to follow the link to get the collection ID.
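
To illustrate the difference, here is a rough sketch over plain item dicts (the sample items, IDs, and hrefs are made up). Filtering on the `collection` field is a single comparison per item, whereas the links-only approach yields hrefs that may not contain the collection ID at all.

```python
def items_in_collection(items, collection_id):
    """Select items by the top-level `collection` field."""
    return [it for it in items if it.get("collection") == collection_id]


def collection_hrefs(item):
    """Links-only alternative: scan every link and return hrefs, not IDs."""
    return [l["href"] for l in item.get("links", []) if l.get("rel") == "collection"]


# Hypothetical items; note that the href does not reveal the collection ID.
items = [
    {"id": "a", "collection": "landsat",
     "links": [{"rel": "self", "href": "https://example.com/items/a"},
               {"rel": "collection", "href": "https://example.com/c/42.json"}]},
    {"id": "b", "collection": "sentinel",
     "links": [{"rel": "collection", "href": "https://example.com/c/43.json"}]},
]

matching_ids = [it["id"] for it in items_in_collection(items, "landsat")]
hrefs_for_a = collection_hrefs(items[0])
```

With only `hrefs_for_a` in hand, a server would still have to fetch `https://example.com/c/42.json` to learn that it describes the `landsat` collection.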

m-mohr commented 4 years ago

So Commons has been removed, but now there's the Item Asset extension. How do we proceed? Should we allow multiple collections in the collection field?

pomadchin commented 4 years ago

We also needed items that belong to multiple collections. It ended up being a STAC Layer / Group Extension, which adds a separate concept of 'layers': groups of items that belong to different collections. Each layer can also have a summary of all items that belong to it.

How much sense does it make to push this idea towards being an extension rather than a change in the collections relationship?

To make it work, a STAC API would only have to implement a query extension. However, to request layer 'summaries', it would be necessary to implement a STAC API Layer Extension.

pomadchin commented 4 years ago

During the STAC sprint we talked a bit about the Layer Extension and about items belonging to multiple collections. The conclusion was that we probably don't need either (the layer extension or items with multiple collections), but one option could be the Aggregation extension, where the result of the aggregation can be represented as a collection.

/cc @matthewhanson @m-mohr

m-mohr commented 4 years ago

Conclusion: We discussed that in the Python room today and came to the conclusion that this is likely not needed. You can have multiple collections referring to a single item, but you always have just a single collection you inherit from. If you want to do aggregations, we need to come up with API aggregation functionality and will work on that tomorrow. If you create new/dynamic collections on top of existing items, then you can't change the item anyway. So closing this.
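
As a rough sketch of that conclusion (hypothetical IDs and hrefs, trimmed-down dicts): two collections carry an `item` link to the same item, while the item itself names exactly one collection.

```python
ITEM_HREF = "https://example.com/source/scene-1.json"  # hypothetical

# The item points back to a single collection only.
item = {
    "id": "scene-1",
    "collection": "source",
    "links": [{"rel": "collection", "href": "https://example.com/source/collection.json"}],
}

# Both collections reference the same item through an "item" link.
source = {"id": "source", "links": [{"rel": "item", "href": ITEM_HREF}]}
derived = {"id": "derived", "links": [{"rel": "item", "href": ITEM_HREF}]}


def collections_referencing(item_href, collections):
    """Return the IDs of collections that have an `item` link to `item_href`."""
    return [
        c["id"]
        for c in collections
        if any(l["rel"] == "item" and l["href"] == item_href for l in c["links"])
    ]


referencing = collections_referencing(ITEM_HREF, [source, derived])
```

The `derived` collection can be created later without touching the item, which still reports `source` as its only collection.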

JohnBTasker commented 4 years ago

@m-mohr What about in the example of scanned aerial photography, where each frame can be considered an Item? In most circumstances, this Item would need to inherit/link with two collections: the film it is from (inheriting details about film size, material, camera), and the project it belongs to (inheriting details about flying height, date, purpose). A film can have frames from multiple projects, while a project can have frames from multiple films.

I know historic scanned aerial frames aren't the primary dataset STAC has been developed to support, but being able to expose them for search/discovery purposes is still valuable.

m-mohr commented 4 years ago

It's totally the focus of STAC to also cater for those types of data and we are happy that you want to make them available. Nevertheless, inheritance complicates STAC and the simplicity is its beauty. What we have told providers earlier is to duplicate data in collections, depending on how their collections are meant to inherit. I think in your case, you'd just replicate the film collection metadata into the project collection. You can surely have multiple levels of collections, so film collections a level above the project collections, and the project collections can have multiple parent film collections, but they don't inherit. That way you get a structure and people can navigate better through it, but you don't have the multi-inheritance complexity. The cost is some duplicated data, but the size increase should be minor. Doing that, you can have a single collection per item. You can also have multiple collections pointing to an item, but an item can only point back to a single collection, which the item is "inheriting" from. Hope that's not too confusing? Happy to talk through it during the data sprint (Tuesday?)
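
A minimal sketch of the duplication idea, assuming for simplicity a project collection whose frames all come from one film. The `film:*` and `project:*` field names and all values are hypothetical and not defined by any STAC extension.

```python
import copy

# Hypothetical field names, for illustration only.
FILM_FIELDS = ("film:size", "film:material", "film:camera")


def with_film_metadata(project, film, fields=FILM_FIELDS):
    """Return a copy of `project` with selected film fields duplicated into it."""
    merged = copy.deepcopy(project)
    for key in fields:
        if key in film:
            merged[key] = copy.deepcopy(film[key])
    return merged


film = {
    "id": "film-a",
    "film:size": "230x230mm",
    "film:material": "acetate",
    "film:camera": "example-camera",
}
project = {"id": "project-x", "project:flying_height_m": 3000}

project_with_film = with_film_metadata(project, film)

# Each frame then points back to exactly one collection.
frame = {"id": "frame-0001", "collection": project_with_film["id"]}
```

The film collection can still exist a level above for navigation; the frame only ever reads its shared metadata from the one project collection it names.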