stac-extensions / sar

Covers synthetic-aperture radar data that represents a snapshot of the earth for a single date and time.
Apache License 2.0

Collection support for sar extension #5

Closed schwehr closed 2 years ago

schwehr commented 3 years ago

Hi all,

I'm working on the Earth Engine Sentinel-1 STAC Collection. Since the spec is only designed for Items, it's not validating the fields that I'm putting in the Collection summaries. What do folks think about allowing the sar extension to support Collections, so that SAR data can be described at the top level (in the case of Earth Engine, the only STAC thing)?

Note that I'm in the process of fixing the S1 STAC Collection, so if you look now, you will find things that shouldn't be there, like sar:bands.

See:

m-mohr commented 3 years ago

That's not how it is meant to work. Just use the summaries and wrap the SAR fields either as arrays or ranges. That's the intended approach, and you do it for other fields already?! What do you mean by "not validating"?
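As a minimal sketch of what "wrap the SAR fields either as arrays or ranges" could look like, the snippet below builds a summaries object as a Python dict. The field names are taken from the SAR extension, but the values and the `summary_kind` helper are hypothetical and only for illustration; they are not taken from the actual Earth Engine Sentinel-1 Collection.

```python
import json

# Hypothetical Collection summaries for a Sentinel-1-like dataset.
# Field names come from the SAR extension; the values are illustrative only.
summaries = {
    "sar:instrument_mode": ["IW", "EW"],   # array form: list of occurring values
    "sar:frequency_band": ["C"],
    "sar:product_type": ["GRD"],
    "sar:resolution_range": {"minimum": 20.0, "maximum": 50.0},  # range form
}


def summary_kind(value):
    """Classify a summary as 'array' or 'range' (hypothetical helper)."""
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict) and {"minimum", "maximum"} <= value.keys():
        return "range"
    return "unknown"


kinds = {name: summary_kind(value) for name, value in summaries.items()}
print(json.dumps(kinds, indent=2))
```

Each SAR field keeps its name from the Item level; only the value changes shape, from a single value per Item to a list or a minimum/maximum pair per Collection.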

schwehr commented 3 years ago

I don't understand. Since https://github.com/stac-extensions/sar/blob/main/json-schema/schema.json is only for an Item, pystac happily ignores fields that are messed up when it sees a Collection, yes?

How am I supposed to do validation?

e.g.

#!/usr/bin/env python3
"""
virtualenv ve3
source ve3/bin/activate
pip install pystac
pip install jsonschema

validate LANDSAT_LT04_C01_T2.json LANDSAT_LT04_C01_T2_SR.json
"""

import sys
import pystac

def ValidateStacCollection(filename) -> None:
  pystac.set_stac_version('1.0.0-rc.2')
  col = pystac.Collection.from_file(filename)
  # Strip links that might break validation
  col.links = [link for link in col.links if link.rel not in (
      'parent', 'root', 'successor-version')]
  # Validate against the core Collection schema and any declared extensions
  col.validate()

def main(filenames):
  for filename in filenames:
    print(f'validating "{filename}"')
    ValidateStacCollection(filename)

if __name__ == '__main__':
  main(sys.argv[1:])

m-mohr commented 3 years ago

The schemas are NOT only for Items, they are also meant for Collections, but we are working with limitations in JSON Schema (no "contains" for objects) and also with limited time to improve the schemas. I'm the only one doing all the schema work in STAC (for the whole spec and 20+ extensions), in literally around 3 hours per week at most, because we haven't found anyone else yet who could do it. It would be great if Google could step in and help improve the situation. Until then, you have to live with limited validation in Collection summaries, but that doesn't mean the extensions don't apply to Collections. See also the written READMEs. But we can validate manually for S1 if required.

schwehr commented 3 years ago

Are you saying that it's possible or not possible to change the schema to support validating sar: in Collections? The schema as it stands is just for an Item. I was going to see about adding Collection support, but it sounds like you are saying that isn't possible, but I'm not totally sure that's what you mean.

m-mohr commented 3 years ago

There is Collection support, see line 57 and following. What is missing is detailed support for Collection summaries, which right now are only covered by the Collection schema itself, so validation just checks the summary structure, independent of the extension fields.

Anyway, it is possible to fix it, but it's an absurd amount of JSON Schema you'd need to write, as there are a lot of possible combinations (3 different summary schemas per field) - ultimately, this should be somewhat autogenerated, because writing them by hand would quickly become unmaintainable. And you'd need to use some weird JSON Schema dialect for "object contains", which makes it even more unreadable: https://github.com/json-schema-org/json-schema-spec/issues/1077#issuecomment-782926462
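Until such schemas exist, a manual check along these lines might be enough for a single Collection. This is only a sketch: it assumes the three summary forms are a list of values, a minimum/maximum range object, and a JSON Schema object, and the per-field checks in `CHECKS` (including the list of frequency bands) are illustrative assumptions rather than a copy of the official schema. All function and variable names are hypothetical.

```python
# Assumed allowed values, for illustration only (not copied from the schema).
ALLOWED_FREQUENCY_BANDS = {"P", "L", "S", "C", "X", "Ku", "K", "Ka"}


def is_band(value):
    return isinstance(value, str) and value in ALLOWED_FREQUENCY_BANDS


def is_positive_number(value):
    return (isinstance(value, (int, float))
            and not isinstance(value, bool) and value > 0)


# Hypothetical mapping from summary field name to a per-value check.
CHECKS = {
    "sar:frequency_band": is_band,
    "sar:resolution_range": is_positive_number,
    "sar:resolution_azimuth": is_positive_number,
}


def check_summary(name, summary, value_ok):
    """Return a list of error messages for one summary entry."""
    errors = []
    if isinstance(summary, list):
        # Form 1: list of all values that occur in the Items.
        errors += [f"{name}: invalid value {v!r}"
                   for v in summary if not value_ok(v)]
    elif isinstance(summary, dict) and {"minimum", "maximum"} <= summary.keys():
        # Form 2: range object with minimum and maximum.
        bad = [k for k in ("minimum", "maximum") if not value_ok(summary[k])]
        errors += [f"{name}: invalid {k} {summary[k]!r}" for k in bad]
        if not bad and summary["minimum"] > summary["maximum"]:
            errors.append(f"{name}: minimum is greater than maximum")
    elif isinstance(summary, dict):
        # Form 3: assumed to be a JSON Schema object; not inspected here.
        pass
    else:
        errors.append(f"{name}: not an array, range, or schema object")
    return errors


def validate_sar_summaries(summaries):
    """Check every summary that has an entry in CHECKS; skip the rest."""
    errors = []
    for name, value_ok in CHECKS.items():
        if name in summaries:
            errors += check_summary(name, summaries[name], value_ok)
    return errors


if __name__ == "__main__":
    example = {
        "sar:frequency_band": ["C", "Z"],
        "sar:resolution_range": {"minimum": 50, "maximum": 20},
    }
    for message in validate_sar_summaries(example):
        print(message)
```

Writing the per-value check once and reusing it for every summary form is the same idea as autogenerating the schemas: the field definition stays in one place and the combinations are derived from it.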