json-schema-org / referencing

Proposals for a possible specification encompassing the varying uses of "$ref"

JRef Vision, Commentary, and Explanations #7

Open jdesrosiers opened 2 years ago

jdesrosiers commented 2 years ago

The goal of the JRef specification is to come up with a minimal unified solution that works the same in any context and is independent of any specific domain. It should be a complete component rather than something that needs to be extended or configured to be used. It should be compatible with all the needs of the community, but does not need to support every solution currently in use. For example, a generic solution to bundling is a need that must be supported, but specific solutions to bundling, such as JSON Schema's bundling process and libraries that inline references, don't need to be supported in this specification. However, the specification should make it possible for specific solutions, such as JSON Schema style bundling, to extend it or use it as a component.

JRef is defined as a media type, uses URIs, and expects implementations to work like a generic hypermedia client (browser) abstracting away linking mechanisms from users. It's common in OpenAPI and AsyncAPI circles for implementers to use tools that remove references from a document and just work with plain JSON. That can get complex and is not always possible. Using a JRef library to traverse the data is much simpler and always works.
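As a rough illustration of the "browser" idea, here is a minimal Python sketch. It is not the API of any existing JRef library. The `Browser` class, the `DOCUMENTS` table standing in for the network or file system, and the example URIs are all hypothetical. It assumes a reference is an object with a `$href` property (the keyword name discussed under "Why not $ref") and that fragments are JSON Pointers.

```python
from urllib.parse import unquote, urldefrag, urljoin

# Hypothetical stand-in for retrieval: retrieval URI -> parsed document.
DOCUMENTS = {
    "https://example.com/api.jref": {
        "title": "Example",
        "owner": {"$href": "people.jref#/alice"},
    },
    "https://example.com/people.jref": {
        "alice": {"name": "Alice", "manager": {"$href": "#/bob"}},
        "bob": {"name": "Bob"},
    },
}


def pointer_get(value, pointer):
    """Evaluate an RFC 6901 JSON Pointer against a parsed value."""
    if pointer == "":
        return value
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        value = value[int(token)] if isinstance(value, list) else value[token]
    return value


class Browser:
    """A cursor over a value that remembers which document it came from."""

    def __init__(self, uri, value):
        self.uri = uri  # URI of the document the value lives in
        self.value = value

    @classmethod
    def get(cls, uri):
        """Retrieve the value a URI identifies, following references."""
        doc_uri, fragment = urldefrag(uri)
        value = pointer_get(DOCUMENTS[doc_uri], unquote(fragment))
        return cls(doc_uri, value)._follow()

    def _follow(self):
        # Resolve the reference against the URI of the document holding it.
        # Assumption: no chain of references loops back on itself.
        if isinstance(self.value, dict) and "$href" in self.value:
            return Browser.get(urljoin(self.uri, self.value["$href"]))
        return self

    def step(self, key):
        """Move to a child value; the caller never sees a reference."""
        return Browser(self.uri, self.value[key])._follow()
```

With this sketch, `Browser.get("https://example.com/api.jref").step("owner").step("manager").value` yields Bob's object, and the caller never handles a `$href` directly. Documents are only looked up when a step actually crosses a reference.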

Adopting JRef would mean defining that our specifications must be written in a JRef compatible format rather than with a JSON compatible format. Then we can remove all referencing related stuff from our specs and treat everything like plain JSON. References would be allowed anywhere and can reference anything. This would make implementation much easier because the JRef library abstracts away anything reference related.

Specifications can extend JRef to add additional features, but that kind of thing should be rare. Ideally, developers should be able to use an off-the-shelf JRef library without customization, configuration, or plugins just the way they would use an off-the-shelf JSON Pointer library. Implementers shouldn't have to customize or re-implement JRef for each specification that uses it. They should be able to use a generic JRef library.

The following is some rationale for why I made some of the decisions I made for this specification. Let me know if there are any other decisions you'd like me to justify as well.

Why not $ref

JSON Schema references only reference JSON Schemas. JSON Reference draft-03 references only reference JSON. JRef references are more like links in an HTML document. They can reference anything and their nature is defined by the resource that is referenced, not the resource doing the referencing. The name $href (hypermedia reference) makes it more clear not only that this is a slightly different concept than $ref, but also that it works similarly to a web-style link.

Although I stand by that previous paragraph, the real reason for the name change is to disambiguate from JSON Schema's version of $ref. JSON Schema's version of $ref has evolved into something that is tightly coupled to JSON Schema, and adopting this universal version is not likely to happen. Choosing a different keyword name allows JSON Schema to continue to use its version of $ref. It also leaves the door open for adopting this spec at some time in the future if opinions move in that direction.

Why no $id

In JSON Schema, $id at the root identifies the document. To align with typical media types, this specification drops that concept and delegates identification to the retrieval URI. Most commonly, the retrieval URI will be a file:// URI representing a path on the local file system. I considered adding a keyword $base that works like the <base/> HTML tag. It doesn't identify the document, it just sets the base URI. I decided to leave it out until it's clear there's demand for such a feature.
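To make the consequence concrete, here is a small sketch of resolving references when the retrieval URI is the only base URI. It uses Python's standard `urllib.parse.urljoin`. The `resolve_reference` helper name and the file paths are made up for illustration.

```python
from urllib.parse import urljoin


def resolve_reference(retrieval_uri, href):
    """Resolve a reference against the URI its document was retrieved from."""
    return urljoin(retrieval_uri, href)


# Hypothetical location of a document on the local file system.
retrieval_uri = "file:///home/user/project/api.jref"

# A relative reference resolves next to the referencing document.
sibling = resolve_reference(retrieval_uri, "schemas/pet.jref#/name")

# A fragment-only reference stays inside the same document.
local = resolve_reference(retrieval_uri, "#/definitions/id")

# An absolute reference ignores the base entirely.
remote = resolve_reference(retrieval_uri, "https://example.com/common.jref")
```

Moving the document to another directory changes what its relative references resolve to, just as it would for links in an HTML file without a `<base/>` tag.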

Why no $anchor

Anchors are a rarely used feature in JSON Schema that doesn't provide any functionality you can't get with JSON Pointers. Because they also have some performance implications, I thought it was best not to impose this feature on implementations. Implementations such as JSON Schema can always extend this one to add that feature. References to anchors require scanning the document for the location of a matching $anchor. The performance implications of that scan can be mitigated by collecting anchor locations while parsing a JRef document, but that only helps when you're parsing from a string and have access to a parser with the right feature set.
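The difference in cost can be sketched as follows. This is an illustration, not part of the proposal: the function names are hypothetical, and the last function uses the `object_hook` parameter of Python's `json.loads` as one example of a parser feature that lets anchors be collected during parsing.

```python
import json


def resolve_pointer(document, pointer):
    """Follow an RFC 6901 JSON Pointer; the work is proportional to its depth."""
    value = document
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        value = value[int(token)] if isinstance(value, list) else value[token]
    return value


def find_anchor(value, name):
    """Depth-first scan for an object whose "$anchor" equals name.

    In the worst case this visits every node in the document.
    """
    if isinstance(value, dict):
        if value.get("$anchor") == name:
            return value
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return None
    for child in children:
        found = find_anchor(child, name)
        if found is not None:
            return found
    return None


def parse_with_anchor_index(text):
    """Parse JSON text and record anchor locations in the same pass."""
    anchors = {}

    def hook(obj):
        # json.loads calls this once for every object it decodes.
        name = obj.get("$anchor")
        if isinstance(name, str):
            anchors[name] = obj
        return obj

    return json.loads(text, object_hook=hook), anchors
```

The index only helps when the document arrives as text. If it has already been parsed elsewhere, the scan in `find_anchor` is what remains.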

Why no embedding/bundling

In JSON Schema, $id in a sub-schema indicates an embedded schema. It's supposed to be roughly equivalent to inlining a reference. This works in JSON Schema because you can only reference a schema, which is an object. In JRef, you can reference a value of any JSON type. Therefore, a different approach is necessary for a generic media type.

I'm not a big fan of the idea of bundling all references into a single document. I don't think there's any real benefit to it. In cases where bundling is desirable, a tar archive of all the necessary documents is enough, because the archive stores the path (retrieval URI) of each file along with its contents.
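A minimal sketch of the tar approach, using Python's standard `tarfile` module. The `bundle` and `unbundle` helpers and the file names are hypothetical. The point is that documents go into the archive unmodified, so their relative references keep working once the archive is unpacked.

```python
import io
import json
import tarfile


def bundle(documents):
    """Pack {relative path: parsed document} into an in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path, document in documents.items():
            data = json.dumps(document).encode("utf-8")
            info = tarfile.TarInfo(name=path)  # the path travels with the file
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unbundle(archive_bytes):
    """Read the archive back into {relative path: parsed document}."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as archive:
        return {
            member.name: json.load(archive.extractfile(member))
            for member in archive.getmembers()
            if member.isfile()
        }


# Hypothetical documents; the reference in api.jref is left untouched.
documents = {
    "api.jref": {"pet": {"$href": "schemas/pet.jref"}},
    "schemas/pet.jref": {"name": "Rex"},
}
restored = unbundle(bundle(documents))
```

Nothing in the documents has to be rewritten or re-identified, which is the main difference from bundling everything into a single document.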

Personally, I'd rather drop the whole concept of embedded documents in favor of just using a tar archive, but I have thoughts about an alternative approach I can share if people really think it's important to include.

In a previous version of JRef, I introduced an $embedded keyword that has similarities with the way JSON Schema embeds schemas with $id. I dropped that keyword from this proposal for all the reasons I gave in this section. Embedding isn't necessary, and even if it were, this solution is insufficient because it only works with objects.

Implications for AsyncAPI

AsyncAPI is in the best position because their $ref is fully compatible with $href.

A unique need AsyncAPI has is that they need to reference documents that are not JSON. This specification makes that possible, but it gets a little awkward when you need to reference a document that does not have a registered media type. In these cases, I suggest defining an unofficial media type to fill the gaps. You can assign a media type identifier, file extensions, and URI fragment semantics.
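One way to picture such an unofficial definition is as a small registry entry carrying those three pieces. Everything in this sketch is invented for illustration: the `MediaType` class, the `application/x.example-notes` identifier, the `.notes` extension, and the line-based fragment semantics are not taken from AsyncAPI or from any registered media type.

```python
import posixpath
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MediaType:
    identifier: str                                # media type identifier
    extensions: tuple                              # file extensions that map to it
    resolve_fragment: Callable[[str, str], str]    # URI fragment semantics


def line_fragment(body, fragment):
    """Hypothetical semantics: "" is the whole body, "L<n>" is line n (1-based)."""
    if fragment == "":
        return body
    return body.splitlines()[int(fragment[1:]) - 1]


# Hypothetical unofficial media type for a format with no registration.
REGISTRY = [
    MediaType("application/x.example-notes", (".notes",), line_fragment),
]


def media_type_for(path):
    """Pick a media type from the file extension of a retrieval path."""
    extension = posixpath.splitext(path)[1]
    for media_type in REGISTRY:
        if extension in media_type.extensions:
            return media_type
    raise LookupError(f"no media type known for {path!r}")
```

A client that retrieves `docs/readme.notes#L2` would look up the entry by extension and hand the fragment to `resolve_fragment`, the same way it would use JSON Pointer for a JSON-based document.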

Implications for OpenAPI

OpenAPI allows certain keywords to be alongside a $ref and be merged with the result of the reference. This would not be possible with a JRef reference. There are several ways this can be addressed, but I think the best way is probably to change the structure somehow such that the merge isn't necessary.

Implications for JSON Schema

It may not make sense for JSON Schema to adopt JRef at all. JRef only really defines $href and I think JSON Schema is likely to want to stick with their JSON Schema specific version of $ref. Therefore, there's not much overlap between the JSON Schema spec and the JRef spec. It's also possible for JSON Schema to adopt and extend JRef adding $id, $anchor, and $ref. In this case, both $href and $ref would be supported.

jdesrosiers commented 2 years ago

[JRef] expects implementations to work like a generic hypermedia client (browser) abstracting away linking mechanisms from users. It's common in OpenAPI and AsyncAPI circles for implementers to use tools that remove references from a document and just work with plain JSON. That can get complex and is not always possible.

Although it's not always possible to inline all references into a single JSON document, it is possible to parse to an in-memory data structure. There may be cases where this would be preferable to the browser approach. You would lose lazy loading and information about where the value you are working with originated (which can be a pain for debugging), but if you don't need those things, parsing to an in-memory data structure is possible with this approach.
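A sketch of that eager approach, under the same assumptions as the rest of this thread's proposal (a reference is an object with `$href`, and fragments are JSON Pointers). The `resolve_all` function and the example document are hypothetical. Each reference is replaced by the Python object it points to, so circular references become shared objects in memory even though they could never be written out as one plain JSON document.

```python
import copy
from urllib.parse import unquote, urldefrag, urljoin


def pointer_get(value, pointer):
    """Evaluate an RFC 6901 JSON Pointer against a parsed value."""
    if pointer == "":
        return value
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        value = value[int(token)] if isinstance(value, list) else value[token]
    return value


def is_reference(value):
    return isinstance(value, dict) and "$href" in value


def resolve_all(uri, documents):
    """Return the value at uri with every reachable reference replaced.

    Sketch only: pointers that pass through another reference, and chains
    of references that loop back on themselves, are not handled.
    """
    linked = {}  # document URI -> copy of the document with references replaced

    def document(doc_uri):
        if doc_uri not in linked:
            # Register the copy before linking so cycles find it.
            linked[doc_uri] = copy.deepcopy(documents[doc_uri])
            link(linked[doc_uri], doc_uri)
        return linked[doc_uri]

    def lookup(target_uri):
        doc_uri, fragment = urldefrag(target_uri)
        value = pointer_get(document(doc_uri), unquote(fragment))
        # The target may itself be a reference that is not replaced yet.
        if is_reference(value):
            return lookup(urljoin(doc_uri, value["$href"]))
        return value

    def link(value, base_uri):
        if isinstance(value, dict):
            items = list(value.items())
        elif isinstance(value, list):
            items = list(enumerate(value))
        else:
            return
        for key, child in items:
            if is_reference(child):
                value[key] = lookup(urljoin(base_uri, child["$href"]))
            else:
                link(child, base_uri)

    return lookup(uri)


# Hypothetical document whose two members reference each other.
DOCUMENTS = {
    "https://example.com/tree.jref": {
        "root": {"label": "root", "children": [{"$href": "#/leaf"}]},
        "leaf": {"label": "leaf", "parent": {"$href": "#/root"}},
    },
}
tree = resolve_all("https://example.com/tree.jref", DOCUMENTS)
```

Here `tree["root"]["children"][0]` and `tree["leaf"]` are the same object, and `tree["leaf"]["parent"]` is `tree["root"]`. Nothing records which document a value came from, which is the debugging cost mentioned above.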