stac-extensions / stac-extensions.github.io

Overview of STAC Extensions, with advice on creating new extensions
https://stac-extensions.github.io
Apache License 2.0

Remove constraint to have at least one field specified in extension #20

Closed: jjrom closed this issue 3 years ago

jjrom commented 3 years ago

Note: This issue was raised in the sat extension but can be applied to all extensions (see https://github.com/stac-extensions/sat/issues/2)

All extensions require that "at least one of the extension fields must be specified" when the extension ID is listed in the "stac_extensions" property of an item/collection.
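For illustration, here is a minimal plain-Python sketch of what the current rule checks. It assumes the sat schema URL shown below and approximates "an extension field" as any item property whose key starts with the extension prefix (`sat:`). The real extension schemas express the rule in JSON Schema, so this is only an approximation, and `declared_but_unused` is a hypothetical helper.

```python
# Assumed schema URL for the sat extension; the prefix check below approximates
# the real JSON Schema rule and is not a copy of it.
SAT_URL = "https://stac-extensions.github.io/sat/v1.0.0/schema.json"
EXTENSION_PREFIXES = {SAT_URL: "sat:"}


def declared_but_unused(item: dict) -> list[str]:
    """Hypothetical helper: declared extensions with no matching item property."""
    properties = item.get("properties", {})
    unused = []
    for url in item.get("stac_extensions", []):
        prefix = EXTENSION_PREFIXES.get(url)
        if prefix is None:
            continue  # extension unknown to this sketch, so it is not checked
        if not any(key.startswith(prefix) for key in properties):
            unused.append(url)
    return unused


# The archive case described below: only a datetime (footprint omitted for brevity).
archived_item = {
    "type": "Feature",
    "stac_extensions": [SAT_URL],
    "properties": {"datetime": "1990-05-01T00:00:00Z"},
}
print(declared_but_unused(archived_item))
# → ['https://stac-extensions.github.io/sat/v1.0.0/schema.json']
```

Under the current requirement, a non-empty result means the item is invalid; the change proposed in this issue would accept it.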

The reason for the requirement is unclear. Every field of an extension could be optional, and operationally this would not break anything. A server could indicate that it supports the extension "abcd" through the "stac_extensions" property, which means that the resulting items may contain properties of that extension. An item without any of the properties defined in extension "abcd" would also be valid.

Conversely, the requirement to have at least "one field of a defined extension to be present" makes it impossible to have a collection with partially complete metadata.

For instance, a service could allow users to post their own metadata to a STAC server. To ease the import, the service provides collection templates for storing metadata, including a template for satellite imagery metadata. This template conforms to the "sat" extension.

The users' metadata are valid spatiotemporal metadata but lack any of the satellite extension metadata (this is not a made-up use case: I have worked on archived data that had barely anything other than an acquisition time and a footprint). Since the data they described were acquired by a satellite, it's perfectly logical that they belong to a "satellite collection". However, with the current requirement, we must prevent these metadata from belonging to a "satellite collection". Belonging to a "satellite collection" is in itself useful information about the provenance of the metadata, whether or not specific satellite metadata are present.

I would suggest removing this constraint.

emmanuelmathot commented 3 years ago

1/ The conformsTo declaration is different from stac_extensions in items or collections; the former does not imply the latter. I understand that the problem concerns the requirements at the STAC spec level, not the API.

2/ The reason for this requirement is the following: if the item or the collection does not have any of the extension's properties (fields or constructs), then the extension must not be declared in its stac_extensions. An extension does not define the nature of the collection or the item, only the potential fields available in the document. If the nature of a collection's or item's data needs to be expressed, a dedicated extension is probably the best option.

3/ Allowing extensions to be declared without properties would lead to documents that systematically declare every extension to cover all cases. As an implementer, I think software implementations must be able to rely on the stac_extensions declarations to load the appropriate modules.
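A minimal sketch of the pattern in point 3, where a client dispatches on stac_extensions to decide which modules to run. The schema URLs are assumptions, and the handler registry and `load_extensions` function are hypothetical stand-ins, not PySTAC's or any other library's API.

```python
from typing import Callable

# Assumed schema URLs; the handlers are hypothetical stand-ins for real modules.
SAT_URL = "https://stac-extensions.github.io/sat/v1.0.0/schema.json"
PROJ_URL = "https://stac-extensions.github.io/projection/v1.0.0/schema.json"


def prefixed(prefix: str) -> Callable[[dict], dict]:
    """Build a handler that extracts the properties belonging to one extension."""
    def handler(properties: dict) -> dict:
        return {k: v for k, v in properties.items() if k.startswith(prefix)}
    return handler


HANDLERS: dict[str, Callable[[dict], dict]] = {
    SAT_URL: prefixed("sat:"),
    PROJ_URL: prefixed("proj:"),
}


def load_extensions(item: dict) -> dict[str, dict]:
    """Run only the handlers for the extensions the item declares."""
    properties = item.get("properties", {})
    return {
        url: HANDLERS[url](properties)
        for url in item.get("stac_extensions", [])
        if url in HANDLERS
    }


item = {
    "stac_extensions": [SAT_URL],
    "properties": {"datetime": "2021-05-03T00:00:00Z", "sat:orbit_state": "ascending"},
}
print(load_extensions(item))
```

If declarations were allowed to be empty, a client following this pattern could run handlers that find nothing (an item declaring both extensions with no extension fields yields `{SAT_URL: {}, PROJ_URL: {}}`), which is the situation point 3 warns about.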

jjrom commented 3 years ago

1/ Sorry, I meant "stac_extensions", not "conformsTo". I should not write an issue on a Monday morning :)

2/ I completely agree on the item's nature. But from my point of view, stac_extensions carries intrinsic information about the item's nature. Adding an additional field would duplicate information.

3/ Of course software implementations rely on the "stac_extensions" declarations to load the appropriate modules. This is perfectly compatible with an extension that declares all its properties optional, so I don't see the problem here.

m-mohr commented 3 years ago

I agree with @emmanuelmathot and already posted three comments (https://github.com/stac-extensions/sat/issues/2#issuecomment-830101111, https://github.com/stac-extensions/sat/issues/2#issuecomment-830181270, https://github.com/stac-extensions/sat/issues/2#issuecomment-830186297) in the original issue. I think they are still valid and for the reasons posted there, I'm voting against making the schemas and stac_extensions more liberal (and IMHO less useful for validation) as proposed here.

jbants commented 3 years ago

Hey @jjrom, the PSC feels that this should not be included in the spec, for the reasons that @m-mohr and @emmanuelmathot brought up. Thanks for raising the issue.

jjrom commented 3 years ago

Hey @jbants... hum, I would have liked a larger audience to bring arguments for or against this issue before it was closed so quickly. Personally, I'm not convinced by the arguments brought against this proposal so far.

cholmes commented 3 years ago

And for a bit more on the reasoning: the core reason for including the schemas is to actually validate the fields. I think your use cases are quite interesting, but they are not the intent of the stac_extensions field. Its purpose is to provide the information needed to validate fields, and therefore we shouldn't water that down.

The 'posting data' use case has come up before: @jisantuc has a similar flow in Franklin, where users can upload STAC data. In their implementation, users can upload data that conforms to the STAC core but has invalid extension information. A user can work with that data on the server, even if it is 'invalid'. PySTAC similarly lets users work with STAC data that doesn't fully validate against all its extensions. So it's fine to have an upload template that doesn't validate. But if the data is published as following the spec, then it should only declare an extension if it actually has some data for that extension.

I think the idea of marking data as acquired by a satellite is interesting. But that's not how the extension is defined: it is defined to make particular fields available, not to say 'this was acquired by a satellite'. So including the extension doesn't actually mean 'this was acquired by a satellite', and other data types don't say that either. Indeed, I think the logic breaks down with other extensions. What would the presence of the projection extension mean? That the data is stored in an alternate projection that isn't communicated? Or the view extension: what does it mean to include it but not have any fields?

I agree that satellite, pointcloud, and sar do have an implied meaning. But I think in those cases we should make it explicit. We could have a required field in satellite that says 'capture = satellite' or 'platform_type = satellite', something like that (and platform_type could be aerial, drone, etc.), and an instrument_type for sar vs eo.
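A rough plain-Python sketch of the suggestion above, assuming an unprefixed `platform_type` field whose allowed values are the examples given in the comment (satellite, aerial, drone). Both the field and `check_platform_type` are hypothetical; a real extension would declare the field as required in its JSON Schema.

```python
# Hypothetical field and values, taken from the examples in the comment;
# no published extension is assumed to define them this way.
ALLOWED_PLATFORM_TYPES = {"satellite", "aerial", "drone"}


def check_platform_type(properties: dict) -> None:
    """Raise ValueError unless the required platform_type field is present and valid."""
    value = properties.get("platform_type")
    if value is None:
        raise ValueError("platform_type is required when the extension is declared")
    if value not in ALLOWED_PLATFORM_TYPES:
        raise ValueError(f"unexpected platform_type: {value!r}")


# An archived item with nothing but a datetime still satisfies the extension,
# because the single required field carries the provenance information.
check_platform_type({"datetime": "1990-05-01T00:00:00Z", "platform_type": "satellite"})
```

This keeps the "at least one field" rule intact while covering the archive use case: the one required field is the explicit statement of provenance.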

jjrom commented 3 years ago

@cholmes Thanks for this clear explanation and for bringing more context to the decision to close the issue. That makes sense, and I understand that stac_extensions should only be used for field validation, without any implicit meaning associated with it. I give up :)

cholmes commented 3 years ago

@jjrom - cool, yeah, apologies for us deciding without communicating the context - I meant to follow up faster.

And I do think we should do something for what you brought up - to say 'this is satellite imagery'. I'll raise an issue for it so it doesn't get lost. So don't give up entirely! :)

jbants commented 3 years ago

@jjrom That's on me. Sorry for closing it without context.

jjrom commented 3 years ago

@jbants No worries at all!