radiantearth / stac-spec

SpatioTemporal Asset Catalog specification - making geospatial assets openly searchable and crawlable
https://stacspec.org
Apache License 2.0

JSON Schema in summaries #1045 #1093

Closed m-mohr closed 3 years ago

m-mohr commented 3 years ago

Related Issue(s): json-schema-org/json-schema-spec#1045

Proposed Changes:

  1. Support JSON Schema in summaries. I haven't had much time to flesh out the language; improvements are appreciated.
  2. Rename Stats Object to Range Object
  3. Fix collection examples (add missing stac_extensions entries), although CI currently fails because of https://github.com/stac-extensions/scientific/pull/2


m-mohr commented 3 years ago

@jjrom Could you please have a look at this PR? I had to deviate a bit from our discussion (see the issue for some context). I'm happy to discuss this again in a short call, too.

m-mohr commented 3 years ago

@jjrom Do you have an example at hand that fits into the examples folder? It could be an old one that we convert to the new schema. To limit the length of the documents, we don't want too many or too long examples in the Markdown, so I'd put it in the examples folder and link to it. I'd rather not touch the existing examples, though, because I have no idea what values to use for the counts etc.

jjrom commented 3 years ago

Perhaps we could duplicate the "collection.json" example (https://github.com/radiantearth/stac-spec/blob/master/examples/collection.json) to a "collection-extended-summaries.json" example in which we replace

"platform": [
      "cool_sat2",
      "cool_sat1"
]

with

"platform": {
      "type": "string",
      "oneOf": [
          {
              "const": "cool_sat1",
              "count": 103489
          },
          {
              "const": "cool_sat2",
              "count": 50700
          }
      ]
}

?
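The conversion proposed above can be sketched in a few lines of Python. This is only an illustration, assuming the per-value item counts are already known; the helper name `counts_to_schema` is hypothetical, and `count` is the custom annotation from the proposal, not a standard JSON Schema keyword.

```python
import json


def counts_to_schema(counts, value_type="string"):
    """Build a JSON Schema style summary from a mapping of value -> item count.

    "count" is a custom annotation taken from the proposal in this thread;
    it is not a standard JSON Schema keyword.
    """
    return {
        "type": value_type,
        "oneOf": [
            {"const": value, "count": count}
            for value, count in sorted(counts.items())
        ],
    }


# Values and counts copied from the example in the comment above.
platform_counts = {"cool_sat2": 50700, "cool_sat1": 103489}
platform_summary = counts_to_schema(platform_counts)

print(json.dumps({"platform": platform_summary}, indent=2))
```

Sorting the values keeps the output stable between runs, which makes diffs of the generated example easier to review.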

m-mohr commented 3 years ago

@jjrom I think we can only add it to the collection-only folder right now. The other examples form a full catalog, so the count would likely be 1 or 2 because there are so few items. We also can't add a second collection there, because then the items would need to link to both collections.

Maybe we could duplicate, shorten and then convert https://github.com/radiantearth/stac-spec/blob/master/examples/collection-only/collection.json a bit and also add titles to epsg codes as you mentioned in the call?

jjrom commented 3 years ago

@m-mohr OK, that makes sense. In this case we can duplicate https://github.com/radiantearth/stac-spec/blob/master/examples/collection-only/collection.json and update the platform property. Concerning the EPSG codes, I'm fine with adding titles, but only if we massively shorten the property content, for instance by limiting it to two EPSG codes (e.g. EPSG:4326 and EPSG:3857). Otherwise it would be hard to read.

m-mohr commented 3 years ago

Yes, agreed! Or can you export your "best" example from your implementation, which we then try to convert to the new format? Would be a good verification that the new schema works well with your data.

jjrom commented 3 years ago

I plan to update my implementation to fit the new schema next week. If it's not too late I can provide a real example by the end of next week. Otherwise we stick to the existing collection.json duplication/shortening as we discussed (?)

m-mohr commented 3 years ago

@jjrom I thought about an example from your current implementation. That should be easy enough to migrate "manually".

cholmes commented 3 years ago

+1 on 'manual migration'. I have some time today and can try to help.

m-mohr commented 3 years ago

@schwehr This looks very much like it could be useful for you, as you have this gee:schema thing in your catalogs, right? Any thoughts on the PR?

schwehr commented 3 years ago

Some initial thoughts while reading to catch up... I'm not familiar enough with jsonschema. And I haven't had a chance to talk it over with simonf.

  1. Is there a way to specify the units?
  2. Looking at the example, it would be nice to have simple (even if contrived) examples of all of them (as much as jsonschema can support).
  3. At least one example without a constrained set of values for string and int.
  4. An example that used more of the capabilities of json-schema would be good
  5. We expected to have types of INT, DOUBLE, STRING, INT_LIST, DOUBLE_LIST, or STRING_LIST. It looks like we are not using DOUBLE_LIST. The proto enum:
    enum PropertyType {
      PROPERTY_TYPE_UNSPECIFIED = 0;  // Something is wrong.
      STRING = 1;
      INT = 2;
      DOUBLE = 3;
      STRING_LIST = 4;
      INT_LIST = 5;
      DOUBLE_LIST = 6;
    }

m-mohr commented 3 years ago

The CI fails due to https://github.com/stac-extensions/scientific/pull/2. Once the PR is merged, we need to release scientific as 1.0.1 and change the schema URLs here / in stac-spec.

m-mohr commented 3 years ago

I'm wondering whether we should mark this new feature 'experimental' (with the chance of breaking in a minor release)?

schwehr commented 3 years ago

Put in a suggestion for units: https://github.com/json-schema-org/json-schema-vocabularies/issues/46

cholmes commented 3 years ago

> I'm wondering whether we should mark this new feature 'experimental' (with the chance of breaking in a minor release)?

Is there a way to make the feature an extension? Happy for it to not be namespaced and even mentioned in the main spec. But that could let us evolve it without having to cut stac-spec releases.

m-mohr commented 3 years ago

No, it's not really extensible, I'd say. At least the extension would have the potential to break implementations and thus it's not a good extension.

m-mohr commented 3 years ago

Thanks, @schwehr.

> 1. Is there a way to specify the units?

Not standardized, but custom keywords are a thing.

> 2. Looking at the example, it would be nice to have simple (even if contrived) examples of all of them (as much as jsonschema can support).

Not sure what exactly you are asking for?

> 3. At least one example without a constrained set of values for string and int.

We encourage specifying which values to expect, as that makes the summary more useful for users. Just stating that the field exists (with a specific data type?) would not be a best practice IMHO. Ideally, there would be more information.

> 4. An example that used more of the capabilities of json-schema would be good

I'm not sure STAC is the right place to go into JSON Schema details. Wouldn't it be better to just link to https://json-schema.org/learn/ ?

> 5. We expected to have types of INT, DOUBLE, STRING, INT_LIST, DOUBLE_LIST, or STRING_LIST. It looks like we are not using DOUBLE_LIST.

There's no direct type value for typed arrays in JSON Schema. Instead you do something like { type: 'array', items: { type: 'integer' } } (INT_LIST) or { type: 'array', items: { type: 'string' } } (STRING_LIST). The other types would be integer (INT), number (DOUBLE) and string (STRING).
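As a rough sketch of that mapping, the Python snippet below translates the PropertyType names from the proto enum quoted earlier into JSON Schema fragments. The function name `property_type_to_schema` is hypothetical. Note that JSON Schema's type for floating-point values is `number`, so DOUBLE maps to `number` here.

```python
# Hypothetical mapping from the proto PropertyType names to JSON Schema types.
SCALAR_TYPES = {"STRING": "string", "INT": "integer", "DOUBLE": "number"}
LIST_SUFFIX = "_LIST"


def property_type_to_schema(name):
    """Return a JSON Schema fragment for a PropertyType name such as 'INT_LIST'."""
    is_list = name.endswith(LIST_SUFFIX)
    scalar_name = name[: -len(LIST_SUFFIX)] if is_list else name
    if scalar_name not in SCALAR_TYPES:
        # e.g. PROPERTY_TYPE_UNSPECIFIED has no JSON Schema counterpart
        raise ValueError(f"no JSON Schema equivalent for {name!r}")
    scalar_schema = {"type": SCALAR_TYPES[scalar_name]}
    # Typed lists are expressed as an array whose items share one type.
    return {"type": "array", "items": scalar_schema} if is_list else scalar_schema


for name in ("INT", "DOUBLE", "STRING", "INT_LIST", "DOUBLE_LIST", "STRING_LIST"):
    print(name, "->", property_type_to_schema(name))
```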

> Put in a suggestion for units: json-schema-org/json-schema-vocabularies#46

JSON Schema allows for custom keywords, so I'm not sure there's a need for another annotation in a validation language as you can simply define it on your own. But let's see what they think.
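To make the custom-keyword idea concrete, here is a minimal sketch of a non-categorical summary that carries its unit in the schema itself. Everything in it is an assumption for illustration: `unit` is a hypothetical keyword (neither JSON Schema nor STAC defines it), `gsd` is just a convenient property name, and the numbers are made up. Validators generally ignore keywords they do not recognise, so a client that cares about units would have to read the annotation itself.

```python
import json

# Illustrative summary for a numeric property; "unit" is a hypothetical
# custom keyword and the bounds are invented values.
gsd_summary = {
    "type": "number",
    "minimum": 10,
    "maximum": 60,
    "unit": "m",
}

# Keywords this sketch treats as standard; a real client would use a fuller list.
STANDARD_KEYWORDS = {"type", "minimum", "maximum"}


def custom_annotations(schema):
    """Return the keywords of a schema that are not in STANDARD_KEYWORDS."""
    return {key: value for key, value in schema.items() if key not in STANDARD_KEYWORDS}


print(json.dumps({"gsd": gsd_summary}, indent=2))
print(custom_annotations(gsd_summary))
```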

schwehr commented 3 years ago

Is a custom type available within the actual schema being defined inside the STAC summary? In the case I'm talking about, the data isn't what's going to have the units. The schema says what the units are going to be.

> 2. Looking at the example, it would be nice to have simple (even if contrived) examples of all of them (as much as jsonschema can support).

> Not sure what exactly you are asking for?

The examples give little sense of the flexibility available here. I agree that the reader should go look at the JSON Schema spec, but the examples here are extremely narrow.

> 3. At least one example without a constrained set of values for string and int.

> We encourage specifying which values to expect, as that makes the summary more useful for users. Just stating that the field exists (with a specific data type?) would not be a best practice IMHO. Ideally, there would be more information.

Examples of non-categorical data:

m-mohr commented 3 years ago

This seems good for now. We can fine-tune docs and examples later.