jisantuc opened this issue 3 years ago
Almost all (published) extensions should already be self-contained. The only one I can think of right now that has an external reference for a good reason is proj, which refers to the non-STAC schema for PROJJSON, I think. The issue with that schema is that it is circular, and json-schema-ref-parser already complains about that in the Node Validator, so I'm not sure it can bundle it. Does this need to be added to all repos, or just to proj for now?
Also, you can control what we do in these repos, but vendor extensions may have external references and you still need to be able to resolve them in tooling, so what's the point? ;-)
Some additional comments:
Gotta disagree with you about remote refs in card4l: https://github.com/stac-extensions/card4l/blob/main/sar/json-schema/product.json#L176
In general, all schemas should be freely available if the corresponding items/catalogs/collections are also freely available.
This doesn't help if someone is using an open source server to serve non-freely available data with non-freely available extensions.
you can control what we do in these repos, but vendor extensions may have external references and you still need to be able to resolve them in tooling, so what's the point?
The point is to model a "correct" way of doing things in repos maintained by "the STAC community." Those are both pretty vague concepts, but pointing people in good directions by default with the official template is better than not pointing them in good directions.
Okay, I understood "remote" as "not part of the same spec/extension". By your definition, we also have remote references in the item, catalog, and collection-spec schemas.
This doesn't help if someone is using an open source server to serve non-freely available data with non-freely available extensions.
This was meant to say: Schemas should have the same "scope" as the data, e.g. free schema <=> free data, schema only available on an intranet <=> data only available on an intranet, etc.
I understood "remote" as "not part of the same spec/extension". By your definition, we also have remote references in the item, catalog, and collection-spec schemas.
Yes, the remote refs in item and catalog were a part of the tiled-assets example in the latency problem I talked about in the issue text.
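To put rough numbers on that latency problem, here is a minimal Python sketch that walks the tiled-assets $ref chain as described in the issue text and counts what a validator with an empty cache has to download. The graph is a hand transcription with made-up schema names, not something parsed from the real files, and treating each level as one round of parallel requests is an assumption about the client.

```python
# Hypothetical $ref graph, hand-transcribed from the tiled-assets description.
# Keys are schemata; values are the schemata each one references.
REFS = {
    "tiled-assets": ["item", "catalog"],
    "item": ["geojson-feature", "basics", "datetime", "instrument",
             "licensing", "provider"],
    "catalog": ["catalog-core"],
}


def cold_fetch_levels(root: str, refs: dict) -> list:
    """Group the schemata a validator must download (empty cache) into rounds.

    A round can only start once the previous round's documents have been
    parsed and their $refs discovered, so rounds are sequential.
    """
    seen = {root}
    levels = [[root]]
    while True:
        next_level = []
        for name in levels[-1]:
            for dep in refs.get(name, []):
                if dep not in seen:  # each distinct schema is fetched once
                    seen.add(dep)
                    next_level.append(dep)
        if not next_level:
            return levels
        levels.append(next_level)


levels = cold_fetch_levels("tiled-assets", REFS)
total = sum(len(level) for level in levels)
print(f"{total} requests in {len(levels)} sequential rounds")
```

Under these assumptions this reports 10 requests spread over 3 sequential rounds, so even a client that fetches each round in parallel pays at least three round trips before it can start validating.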
Schemas should have the same "scope" as the data, e.g. free schema <=> free data, schema only available on an intranet <=> data only available on an intranet, etc.
This doesn't help the server implementation problem. In particular, if we want to provide an off-the-shelf/no code STAC server (which is Franklin's goal, and which I think is a pretty reasonable goal for a data specification targeting people who largely aren't web developers), the data and schemata being private doesn't help a user tell their Franklin deployment how to access them. If published schemata had to be self-contained, they could be read from a special location in the container image without needing to rewrite refs.
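A minimal sketch of what reading self-contained schemata "from a special location in the container image" could look like, in Python. The directory (`/opt/stac-schemas`) and the `<root>/<host>/<path>` layout are hypothetical, not how Franklin actually does it. The point is that a plain file read is enough only because the schema has no remote $refs left to resolve.

```python
import json
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Hypothetical directory baked into the server's container image.
SCHEMA_ROOT = Path("/opt/stac-schemas")


def local_path_for(schema_url: str, root: Path = SCHEMA_ROOT) -> Path:
    """Map an absolute schema URL to <root>/<host>/<path> (hypothetical layout)."""
    parsed = urlparse(schema_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {schema_url!r}")
    rel = PurePosixPath(parsed.path.lstrip("/"))
    if ".." in rel.parts:
        # Refuse paths that could escape the schema directory.
        raise ValueError(f"refusing path traversal in {schema_url!r}")
    return root / parsed.netloc / rel


def load_schema(schema_url: str, root: Path = SCHEMA_ROOT) -> dict:
    """Read a schema from disk instead of fetching it over HTTP.

    This is only sufficient if the schema is self-contained: any remote
    $ref inside it would still need its own resolution step.
    """
    with open(local_path_for(schema_url, root), encoding="utf-8") as f:
        return json.load(f)
```

Keying the layout on the schema URL means items keep their normal `stac_extensions` URLs and nothing in the schema files has to be rewritten.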
I'm fine with bundling, but I think we should start by getting it into the core spec and then port it over to the extensions. json-schema-ref-parser seems to be the right tool for it, and we can easily integrate it into the CI workflows.
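json-schema-ref-parser is a JavaScript library; the Python sketch below only illustrates the bundling idea and is not its algorithm or API. It assumes every referenced document has already been fetched into a dict keyed by absolute URI, embeds each one once under a `$defs` key with made-up names (`ext1`, `ext2`, ...), and rewrites every $ref to a local pointer. Because documents are embedded rather than copied into each use site, a circular schema like the PROJJSON one stays a set of pointers instead of expanding forever. Using `$defs` in draft-07 schemata assumes the validator resolves JSON Pointers into arbitrary locations, which most do.

```python
from urllib.parse import urldefrag, urljoin


def bundle(root_uri: str, store: dict) -> dict:
    """Return a self-contained copy of store[root_uri].

    store maps absolute document URIs to already-parsed JSON schemata.
    Every document reachable through $ref is embedded once under "$defs"
    and every $ref is rewritten to a local "#..." pointer.
    """
    keys = {root_uri: None}  # document URI -> key under $defs (None = the root)
    pending = []             # embedded documents that still need rewriting

    def rewrite(node, base):
        if isinstance(node, dict):
            out = {}
            for k, v in node.items():
                if k == "$ref" and isinstance(v, str):
                    # Resolve relative refs against the document they came from.
                    target, fragment = urldefrag(urljoin(base, v))
                    if target not in keys:
                        keys[target] = f"ext{len(keys)}"  # hypothetical naming
                        pending.append(target)
                    prefix = "#" if keys[target] is None else "#/$defs/" + keys[target]
                    out[k] = prefix + fragment
                else:
                    out[k] = rewrite(v, base)
            return out
        if isinstance(node, list):
            return [rewrite(item, base) for item in node]
        return node  # strings, numbers, booleans, None

    bundled = rewrite(store[root_uri], root_uri)
    while pending:
        uri = pending.pop()
        # Drop $id/$schema so the embedded copy doesn't change the base URI.
        doc = {k: v for k, v in store[uri].items() if k not in ("$id", "$schema")}
        defs = bundled.setdefault("$defs", {})
        if keys[uri] in defs:
            raise ValueError(f"$defs already contains {keys[uri]!r}")
        defs[keys[uri]] = rewrite(doc, uri)
    return bundled
```

Nested `$id` keywords below the top level of an embedded document are not handled, so this is a starting point rather than a drop-in CI step.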
Currently, the extension schemata are a mix of self-contained files (like file and label) and schemata that require arbitrary URI resolution (like tiled-assets and card4l). If we use remote references in the published schemata, we expose ourselves to two kinds of risk: latency and uneven tooling support.
tiled-assets references the item schema (which references remote schemata for GeoJSON features by URL, and for basics, datetime, instrument, licensing, and provider by relative path) and the catalog schema (which references the catalog-core schema). So to take one JSON item and validate it against the tiled-assets extension (the first time; obviously these things can be cached), I have to make ten HTTP requests.
Additionally, there are varying degrees of JSON Schema remote $ref support in common languages used for STAC:
circe-json-schema (Scala) expects refs to be file paths.
The cost of doing away with remote refs everywhere is duplication and the loss of inheritance. That's a pretty hefty cost, which is why I'm only proposing that published schemata be self-contained. In particular:
The benefits of inlining will be that any language with a tool that can load a JSON schema from JSON will be equally supported for STAC tooling work, and servers won't have to do as much work the first time they see a schema URL.
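If published schemata are required to be self-contained, a CI job could enforce it with a check along these lines. This is a hypothetical Python sketch, not an existing STAC tool: it reports every `$ref` that does not start with `#`, i.e. anything a validator would have to go and fetch.

```python
def external_refs(schema) -> list:
    """Return the sorted $ref values that point outside this document.

    A ref starting with "#" is a local JSON Pointer; anything else (an
    absolute URL or a relative path) has to be fetched by the tool.
    """
    found = set()
    stack = [schema]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and not ref.startswith("#"):
                found.add(ref)
            stack.extend(node.values())  # keep walking nested subschemas
        elif isinstance(node, list):
            stack.extend(node)
    return sorted(found)


self_contained = {
    "type": "object",
    "properties": {"a": {"$ref": "#/definitions/a"}},
    "definitions": {"a": {"type": "string"}},
}
needs_fetch = {
    "allOf": [
        {"$ref": "https://example.com/item.json"},
        {"$ref": "../catalog.json#/definitions/x"},
    ]
}
print(external_refs(self_contained))
print(external_refs(needs_fetch))
```

It errs on the conservative side: an absolute $ref that points back to the schema's own `$id` is reported, and so is a `{"$ref": ...}` object that merely appears as data inside `enum`, `const` or `examples`.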