nickel-lang / json-schema-to-nickel

Convert JSON schemas into Nickel contracts
Apache License 2.0

Extended local ref support #72

Closed yannham closed 3 months ago

yannham commented 3 months ago

This pull request implements support for local references not only to top-level definitions but also to properties, that is, JSON pointers of the form #/properties/foo/properties/bar/.../properties/qux.

Motivation

Currently, json-schema-to-nickel is only able to handle local references to top-level definitions, that is, references of the form `#/definitions/xxx`. Besides simply supporting more schemas, the motivation here is that [json-schema-ref-parser](https://github.com/APIDevTools/json-schema-ref-parser) can help users circumvent the fact that json-schema-to-nickel currently doesn't support external refs, by bundling the external refs directly into the schemas. However, this tool most often produces references of the second form, that is, `#/properties/foo/properties/bar`, so it can't yet be used as a pre-processing step to make json-schema-to-nickel able to handle external references.

Design

Definitions aren't that special. The idea behind their handling is to introduce special variables (the "environment") that hold the conversion of each definition as a contract and as a predicate, so that they can be used from anywhere within the schema. We extend this idea to properties.

However, there are a few differences.

Code bloat

Definitions are, by nature, usually meant to be used in the schema. Many schemas don't have definitions at all, and those that do actually use them most of the time. Also, definitions should make up only a small part of the final schema.

This is different for properties, which are the substance of most JSON schemas. If we blindly make them all available as contracts and predicates in the environment, we will thus potentially duplicate each property 3 times, whether it's actually useful or not: once as a contract in the final schema, and twice in the environment, as a predicate and a contract. It also means we have to hold those two versions (contracts & predicates) somewhere in memory throughout the conversion.

We assume that the happy path is that schemas don't have any local references (or few), and want to optimize for that case. Thus, it would be better to only introduce in the environment the properties that are actually referenced somewhere.

Contracts are already there

Definitions aren't part of the final schema per se, but are only in the environment. However, properties are always part of the final schema, at least their contract version. Thus, we don't need to duplicate the contract version in the environment: we can just use a simple and direct recursive access.

As a consequence, we only include the predicate version of referenced properties in the environment, which further reduces the code bloat of the generated contract.

Solution

The main idea is to pass additional state around during the conversion of the root schema. This state (RefsUsage) maintains 3 sets:

  1. The top-level definitions used as a predicate
  2. The top-level definitions used as a contract
  3. The properties used as a predicate

Because this state has to be threaded through the conversion, we can no longer rely on the From/TryFrom traits. Instead, we introduce similar bespoke traits for the conversion from schemas to Predicate or Contract.
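To make the shape of this state more concrete, here is a minimal sketch, not the code of this PR. Only the name RefsUsage and its three sets come from the description above; the field names, `ToySchema`, `ToyPredicate` and the `AsPredicate` trait are hypothetical stand-ins, and the reference parsing is deliberately naive.

```rust
use std::collections::BTreeSet;

/// Sketch of the usage state threaded through the conversion.
/// The field names are assumptions; only the three sets come from the text.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct RefsUsage {
    /// Top-level definitions used as a predicate.
    pub defs_predicates: BTreeSet<String>,
    /// Top-level definitions used as a contract.
    pub defs_contracts: BTreeSet<String>,
    /// Properties used as a predicate, stored as paths of property names.
    pub props_predicates: BTreeSet<Vec<String>>,
}

/// Hypothetical, heavily simplified schema type, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ToySchema {
    Any,
    Ref(String),
    AllOf(Vec<ToySchema>),
}

/// Hypothetical predicate representation: just the rendered expression.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToyPredicate(pub String);

/// Bespoke counterpart of `TryFrom`: same role, but it also receives the
/// mutable usage state, which `TryFrom::try_from` has no room for.
pub trait AsPredicate {
    fn as_predicate(&self, usage: &mut RefsUsage) -> Result<ToyPredicate, String>;
}

impl AsPredicate for ToySchema {
    fn as_predicate(&self, usage: &mut RefsUsage) -> Result<ToyPredicate, String> {
        match self {
            ToySchema::Any => Ok(ToyPredicate("always".to_owned())),
            ToySchema::Ref(pointer) => {
                if let Some(name) = pointer.strip_prefix("#/definitions/") {
                    // Record that this definition is needed as a predicate.
                    usage.defs_predicates.insert(name.to_owned());
                    Ok(ToyPredicate(format!("definitions.{name}")))
                } else if let Some(rest) = pointer.strip_prefix("#/properties/") {
                    // Naive split: assumes no property is itself named "properties".
                    let path: Vec<String> =
                        rest.split("/properties/").map(str::to_owned).collect();
                    usage.props_predicates.insert(path.clone());
                    Ok(ToyPredicate(format!("properties.\"{}\"", path.join("/"))))
                } else {
                    Err(format!("unsupported reference: {pointer}"))
                }
            }
            ToySchema::AllOf(schemas) => {
                // The same state is passed down to every sub-schema.
                let parts = schemas
                    .iter()
                    .map(|schema| schema.as_predicate(usage))
                    .collect::<Result<Vec<_>, _>>()?;
                let rendered: Vec<String> = parts.into_iter().map(|p| p.0).collect();
                Ok(ToyPredicate(format!("all_of [{}]", rendered.join(", "))))
            }
        }
    }
}
```

The point of the sketch is the extra `&mut RefsUsage` parameter: every reference encountered during the conversion is recorded as a side effect, so the environment can be built afterwards from exactly what was used.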

Then, at the end of the conversion, we build the environment from the usage registered during the conversion. Correctly building the definitions part is iterative: when constructing the definitions, we convert schemas that were unseen until now (the definitions themselves), and these can in turn reference new definitions. We thus iterate until no new definitions appear.
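The iteration described above is a fixpoint computation. The following is a rough sketch of one way to write it, as a worklist, under the simplifying assumption that converting a definition boils down to learning which other definitions it references. The `Definitions` alias and `reachable_definitions` are hypothetical names, not part of the PR.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy stand-in for the definitions of a schema: each definition name maps
/// to the names of the definitions that its own schema references.
pub type Definitions = BTreeMap<String, Vec<String>>;

/// Starting from the definitions referenced by the root schema, keep
/// "converting" definitions until no unseen definition shows up.
pub fn reachable_definitions(
    definitions: &Definitions,
    initially_used: &BTreeSet<String>,
) -> BTreeSet<String> {
    let mut converted = BTreeSet::new();
    let mut pending: Vec<String> = initially_used.iter().cloned().collect();

    while let Some(name) = pending.pop() {
        // `insert` returns false if the definition was already handled.
        if !converted.insert(name.clone()) {
            continue;
        }
        // Converting a definition may reveal references to other definitions.
        if let Some(references) = definitions.get(&name) {
            for reference in references {
                if !converted.contains(reference) {
                    pending.push(reference.clone());
                }
            }
        }
    }

    converted
}
```

Termination follows from the fact that each definition is processed at most once, even when definitions reference each other cyclically.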

Property predicates are stored in the environment as a flat record: a path #/properties/foo/properties/bar will be included as an entry "foo/bar" = value in <env>.<prop_preds>.
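As an illustration of this flattening, the sketch below maps a property reference to its record key. The function name `property_env_key` is hypothetical, and the sketch ignores JSON pointer escaping (`~0`, `~1`), which a real implementation would have to consider.

```rust
/// Map a reference such as `#/properties/foo/properties/bar` to the flat
/// key `foo/bar`. Returns `None` for anything that is not a pure chain of
/// `properties/<name>` segments.
pub fn property_env_key(pointer: &str) -> Option<String> {
    let rest = pointer.strip_prefix("#/")?;
    let segments: Vec<&str> = rest.split('/').collect();

    // Segments must come in (`properties`, <name>) pairs.
    if segments.len() % 2 != 0 {
        return None;
    }

    let mut names = Vec::new();
    for pair in segments.chunks(2) {
        if pair[0] != "properties" || pair[1].is_empty() {
            return None;
        }
        names.push(pair[1]);
    }

    Some(names.join("/"))
}
```

With this sketch, `#/properties/foo/properties/bar` yields the key `foo/bar`, while a definition reference such as `#/definitions/xxx` is rejected and would be handled by the definitions part of the environment instead.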

The final environment is put in a special variable with a mangled name (to avoid unwanted clashes with properties of the schema).

Follow-up