radiantearth / stac-spec

SpatioTemporal Asset Catalog specification - making geospatial assets openly searchable and crawlable
https://stacspec.org
Apache License 2.0

stac_extensions and summaries #1077

Closed m-mohr closed 3 years ago

m-mohr commented 3 years ago

This originates from https://github.com/stac-extensions/projection/issues/3

The STAC Collection spec says:

If a structure, such as the summaries extension, provides fields in their JSON structure, these extensions must not be listed here as they don't extend the Collection itself. For example, if a Collection includes the field sat:platform in the summaries, the Collection should not list the sat extension in the stac_extensions field.

This was added intentionally in 0.x, but might be outdated now. We added Collection scope to most extensions in rc.1/2 because the fields can now be used (and validated) in collection assets (and in item asset definitions in collections). A weak point is that we can't validate the summaries, and previously we couldn't use the schemas for validating collections. With the newest changes to the schemas, we should also be able to add extensions to the stac_extensions array that are implemented in summaries (although no validation takes place). So I guess we should remove the wording above to make it more straightforward to implement?

Interestingly, this was not ported over to Catalogs when summaries were introduced there, so the wording is missing there, which makes it even more inconsistent.
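To make the situation above concrete, here is a minimal sketch of a Collection whose summaries use extension fields while stac_extensions stays empty, as the current wording requires. The collection content and the helper name `summary_prefixes` are made up for illustration and are not part of the spec or any STAC library.

```python
# Hypothetical Collection fragment: extension fields appear only in summaries.
collection = {
    "type": "Collection",
    "id": "example-collection",
    "stac_extensions": [],  # current wording: do not list "sat" or "eo" here
    "summaries": {
        "sat:platform": ["sentinel-1a", "sentinel-1b"],
        "eo:cloud_cover": {"minimum": 0, "maximum": 80},
    },
}


def summary_prefixes(coll: dict) -> set:
    """Return the extension prefixes (the part before ':') used in summaries."""
    return {
        key.split(":", 1)[0]
        for key in coll.get("summaries", {})
        if ":" in key
    }


# Extensions that are used in summaries but not declared anywhere.
print(sorted(summary_prefixes(collection)))
```

If the wording were removed, an implementation could use such a set to decide which extensions to add to stac_extensions, even though the summaries themselves would still not be validated.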

emmanuelmathot commented 3 years ago

Maybe a summary scope could be defined at the field level, because you don't always want to summarize every field (e.g. file:checksum). It would also give hints to developers who want to implement functions that generate large collections or catalogs, so that they can automatically select which fields should be summarized.

m-mohr commented 3 years ago

@emmanuelmathot I don't understand that. Could you give an example, please?

emmanuelmathot commented 3 years ago

All fields from extensions for items are implicitly candidates for summaries in a collection, right? So if you want to automate the summaries based on the items referenced (e.g. in a STAC API), how do you know which extension fields are valuable to summarize? I would propose to have a summary scope per field, together with the recommended summary type. For instance, a collection summary scope per field would set:

eo:cloud_cover -> yes, stats
file:checksum -> no
sar:product_type -> yes, value set
sat:relative_orbit -> yes, stats

With that scope, there is a proper reason to have stac_extensions declared in the collection.
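A rough sketch of how such a per-field scope could drive automatic summary generation. The `SUMMARY_SCOPE` table and the `build_summaries` function are hypothetical names invented for this example, and for simplicity the sketch reads every field from the item properties.

```python
# Hypothetical per-field summary scope, mirroring the proposal above.
# "stats" -> min/max range, "value_set" -> distinct values, None -> not summarized.
SUMMARY_SCOPE = {
    "eo:cloud_cover": "stats",
    "file:checksum": None,
    "sar:product_type": "value_set",
    "sat:relative_orbit": "stats",
}


def build_summaries(items: list, scope: dict = SUMMARY_SCOPE) -> dict:
    """Build a summaries object from items, using the scope table to pick fields."""
    summaries = {}
    for field, kind in scope.items():
        if kind is None:
            continue  # field is explicitly excluded from summaries
        values = [
            item["properties"][field]
            for item in items
            if field in item.get("properties", {})
        ]
        if not values:
            continue  # no item carries this field
        if kind == "stats":
            summaries[field] = {"minimum": min(values), "maximum": max(values)}
        else:
            summaries[field] = sorted(set(values))
    return summaries


items = [
    {"properties": {"eo:cloud_cover": 12.5, "sar:product_type": "GRD",
                    "sat:relative_orbit": 8, "file:checksum": "abc"}},
    {"properties": {"eo:cloud_cover": 40.0, "sar:product_type": "SLC",
                    "sat:relative_orbit": 117, "file:checksum": "def"}},
]
print(build_summaries(items))
```

Keeping the scope in a plain lookup table means an extension could publish its recommendations as data, without implementers having to hard-code per-field decisions.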

m-mohr commented 3 years ago

Thanks, now I understand. This is basically what issue #1004 is about. This list only specifies whether something should be summarized or not, but we could also make a recommendation on stats or value sets. Although, in most cases that should be relatively clear from the data type:

number -> stats
string -> value set (if you want you can check for ISO timestamps and make them stats)
array -> merged value set
object, boolean -> value set
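The data type rule of thumb above could be sketched roughly as follows. The function names are made up for illustration, the optional ISO timestamp check is left out, and the sketch assumes all values of a field share the same type.

```python
def _unique(values: list) -> list:
    """Distinct values in first-seen order (works for unhashable objects too)."""
    out = []
    for value in values:
        if value not in out:
            out.append(value)
    return out


def summarize_field(values: list):
    """Summarize one field's values across items, choosing the form by data type."""
    first = values[0]
    # bool must be checked before numbers, because bool is a subclass of int.
    if isinstance(first, bool):
        return _unique(values)
    if isinstance(first, (int, float)):
        return {"minimum": min(values), "maximum": max(values)}
    if isinstance(first, list):
        # array -> merged value set over all elements
        return _unique([element for array in values for element in array])
    # string, object -> value set
    return _unique(values)


print(summarize_field([10, 3.5, 42]))
print(summarize_field(["GRD", "SLC", "GRD"]))
print(summarize_field([["a", "b"], ["b", "c"]]))
```

A per-field recommendation from an extension would then only be needed where this default is not what you want, for example a numeric identifier that reads better as a value set.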