json-schema-org / json-schema-vocabularies

Experimental vocabularies under consideration for standardization

Proposal: "jsonpointer" type #23

Open erosb opened 7 years ago

erosb commented 7 years ago

Introduction

While the JSON Schema specification itself uses JSON Pointers, it has only very basic support for letting schema authors specify a schema which expects JSON documents to use such JSON Pointers. Currently, when a schema author wants to describe a JSON Pointer, the only thing they can do is define the schema with "type":"string" and probably add a regex restriction which requires the string to be a syntactically valid JSON Pointer.

In practical use cases it is useful to be able to describe the schema of the referred JSON value. This proposal targets these use cases.

Purpose

The purpose of this new type is to let schema authors expect that the document being validated contains a JSON Pointer, and this pointer denotes a value which conforms to the restrictions of a schema defined by the schema author.

Syntax

Example schema:

{
    "properties": {
        "ptrToNum": {
            "type": "jsonpointer",
            "referredSchema": {
                "type": "number"
            }
        }
    }
}

Example valid document against the above schema:

{
    "ptrToNum" : "#/settings/width",
    "settings" : {
        "width" : 4
    }
}

Example invalid document against the above schema:

{
    "ptrToNum" : "#/settings/width",
    "settings" : {
        "width" : [ 4 ]
    }
}
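
A minimal Python sketch of how an implementation might check the two documents above. It is not part of the proposal: resolve_pointer and check_typed_pointer are hypothetical helper names, only the "type" keyword of "referredSchema" is looked at, and a leading "#" is simply stripped before the pointer is resolved against the document being validated.

```python
def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against an already-parsed document."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in the order RFC 6901 requires: "~1" first, then "~0".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not (token.isascii() and token.isdigit()):
                raise KeyError(token)
            current = current[int(token)]
        elif isinstance(current, dict):
            current = current[token]
        else:
            raise KeyError(token)
    return current


# Assumption: only "type" is handled here; a real validator would apply
# the whole referredSchema to the resolved value.
JSON_TYPES = {
    "number": (int, float),
    "string": (str,),
    "array": (list,),
    "object": (dict,),
    "boolean": (bool,),
}


def check_typed_pointer(document, value, referred_schema):
    """Return True if value points at something matching referred_schema."""
    if not isinstance(value, str):
        return False
    pointer = value[1:] if value.startswith("#") else value
    try:
        target = resolve_pointer(document, pointer)
    except (KeyError, IndexError, ValueError):
        return False  # the pointer does not resolve
    expected = referred_schema.get("type")
    if expected is None:
        return True
    # bool is a subclass of int in Python, so keep it out of "number".
    if expected == "number" and isinstance(target, bool):
        return False
    return isinstance(target, JSON_TYPES[expected])


referred = {"type": "number"}
valid_doc = {"ptrToNum": "#/settings/width", "settings": {"width": 4}}
invalid_doc = {"ptrToNum": "#/settings/width", "settings": {"width": [4]}}

print(check_typed_pointer(valid_doc, valid_doc["ptrToNum"], referred))
print(check_typed_pointer(invalid_doc, invalid_doc["ptrToNum"], referred))
```

The first call accepts the document because the pointer resolves to the number 4; the second rejects it because the pointer resolves to an array.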

Validation

If a schema instance has a "type": "jsonpointer" property, then it must also have a "referredSchema" key, and an implementation should perform the following validation steps:

Notes

erosb commented 7 years ago

@handrews @awwright @Relequestual any thoughts on this?

handrews commented 7 years ago

@erosb can you give a more specific use case for this? I understand how the proposal works, but I cannot think of a situation where I would need this sort of indirect specification.

Also, the value you have given for ptrToNum is not actually a JSON Pointer, but a URI fragment using JSON Pointer syntax (If you removed the leading "#" then it would be a JSON Pointer). I am assuming that's just a typo- even with that changed I am unclear on the utility here.

Relequestual commented 7 years ago

What @handrews said.

handrews commented 7 years ago

@erosb I'm going to go ahead and submit a PR for json-schema-org/json-schema-spec#109 that will include jsonpointer as a format, because even if this proposal (#141) is accepted, not all jsonpointers are used to refer back into the same instance or to apply any further validation. Just wanted to warn you- when you see a PR for json-schema-org/json-schema-spec#109 that's not me dismissing this issue.

I still would need to see some real use cases for this one (#141) before I could support it, though.

erosb commented 7 years ago

Okay. These 2 proposals can even live together, json-schema-org/json-schema-spec#109 for untyped references and this one for typed ones. I will get back here with some examples when I have some more time.

erosb commented 7 years ago

Example for using typed JSON Pointers

Let's consider a JSON format which represents 3D objects in a scene graph and collisions between elements. So it has 2 root properties:

Each individual collision should be represented with a list of JSON pointers, pointing to the colliding objects in the scene. These pointers are allowed to denote only objects and no other type of data in the document - and this is the restriction which can be expressed by the proposed JSON pointer type.

So generally speaking, JSON is basically designed to represent hierarchical (tree) data structures. Once the data model becomes a graph and not a tree, the most robust solution to express (possibly circular) references is using JSON Pointers.

This proposal (#141) helps in making these pointers typed. In draftv4 there is not even a way to place syntactical restrictions on a string to verify that it is a JSON Pointer, and this is the problem being solved by "format":"jsonpointer" proposed in json-schema-org/json-schema-spec#109 . This is much better than having nothing, but the pointers expressed this way are untyped, so even after schema validation, a consuming application needs to manually validate the referred values in case it has some necessary preconditions regarding their structure (and chances are high that it has).

So as an analogy, having only untyped pointers is like working with java.lang.Object-type references in Java and casting and instanceof-ing the referred values all the way. That is not good practice, and JSON Schema must also provide a mechanism to counter it, even if it is less important in this domain. This analogy is not even that far-fetched if you consider JSON Schema as a type system.

Final thoughts:

handrews commented 7 years ago

"the value you have given for ptrToNum is not actually a JSON Pointer" - well I'd argue with that, "#/settings/width" is a JSON Pointer in its URI fragment identifier representation, according to RFC 6901. But it is a detail being unimportant at the moment.

It's actually quite important. "/settings/width" would be {"format": "jsonpointer"} while "#/settings/width" would be {"format": "urireference"}, with the media type of the referenced resource determining whether the fragment can be interpreted as a JSON Pointer or not.

Either type could have constraints on the thing to which it is pointing, although with a URI Reference the constraint can already be described through the "profile" media type parameter and/or link, and the "describedBy" link. In that case you are relying on the referenced resource to provide the correct schema.

handrews commented 7 years ago

To repeat a relevant response to your comment in json-schema-org/json-schema-spec#150

RFC 6901 does not define a representation of a JSON Pointer as a URI fragment. It defines how to encode a JSON Pointer in a URI fragment. (specifically UTF8 + percent encoding).

That is just a URI fragment, and the interpretation of the fragment as a JSON Pointer is controlled by the media type of the document (which is always how fragments are interpreted).

If you want to use a URI with a JSON Pointer fragment, that is just a "uri" or "uriref" format. It is not, by itself, a JSON Pointer.
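
A small Python sketch of the distinction, assuming the encoding rules of RFC 6901 section 6 (UTF-8 plus percent-encoding). The function names pointer_to_fragment and fragment_to_pointer are hypothetical, chosen only for this illustration.

```python
from urllib.parse import quote, unquote


def pointer_to_fragment(pointer: str) -> str:
    """Encode a JSON Pointer so it can be carried in a URI fragment."""
    # quote() encodes as UTF-8 and percent-encodes everything outside the
    # unreserved set; "/" and "~" are kept so the pointer stays readable.
    return "#" + quote(pointer, safe="/~")


def fragment_to_pointer(fragment: str) -> str:
    """Recover the JSON Pointer carried in a URI fragment."""
    if not fragment.startswith("#"):
        raise ValueError(f"not a URI fragment: {fragment!r}")
    return unquote(fragment[1:])


print(pointer_to_fragment("/settings/width"))  # the pointer itself has no "#"
print(pointer_to_fragment("/c%d"))             # "%" must be percent-encoded
print(fragment_to_pointer("#/a%20b"))          # decodes back to "/a b"
```

The string with the leading "#" is what would match a URI reference format, while the string without it is the JSON Pointer proper.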


It would be just as reasonable to propose a reference type (with referredSchema) that works with a URI Reference, or that can take either a URI Reference or a JSON Pointer.

For URI References, you need to define the base URI (is it the same base used for $ref?). For JSON Pointers you need to define to which document it applies (the schema? the instance? something else?)

Also keep in mind that most uses of JSON Pointer from within a schema actually need Relative JSON Pointers for the same reason that most uses of URIs are actually URI References: when re-using a bit of schema you may not know where the schema is being used, but you more often know or have defined as sub-schemas what other schemas or instances exist relative to the point at which the schema is used.

erosb commented 7 years ago

@handrews the following came into my mind when reading your comment above: what about defining these references as

If "format" is "urireference" then it is resolved at the instance level in the same way as "$ref"s are resolved in the schema. If "format" is "jsonpointer" then it is evaluated as defined in RFC 6901.

What do you think?

handrews commented 7 years ago

@erosb that seems like a reasonable approach. I'm not entirely sold on whether the use case is in-scope for JSON Schema, but I need to look at it a bit more. Right now I'm focusing on trying to wrap up Draft 06 so I'll probably come back to this once that's sorted.

handrews commented 7 years ago

I just realized that an obvious use case for this proposal is in the meta-schema, placing restrictions on what $ref can reference:

{
    "definitions": {
        "jsonReference": {
            "properties": {
                "$ref": {
                    "type": "reference",
                    "format": "uriref",
                    "referredSchema": {"$ref": "#"}
                }
            }
        }
    }
}

Note "format": "uriref" instead of "format": "jsonpointer" as $ref uses a URI reference, which sometimes happens to be a fragment containing a JSON Pointer, but "format": "jsonpointer" would be a valid option in other situations.

In issue https://github.com/json-schema-org/json-schema-spec/issues/98#issuecomment-268618658 I just noted that this type would be useful for defining the with clause more clearly. The difference between this meta-schema and the one in that comment of issue json-schema-org/json-schema-spec#98 is that "nonValidationSchema" here further restricts $ref to only refer to another non-validation schema:

{
    "definitions": {
        "validationProperties": {
            "enum": ["type", "multipleOf", "maximum", ...]
        },
        "nonValidationSchema": {
            "allOf": [
                {"$ref": "#"},
                {"propertyNames": {"not": {"$ref": "#/definitions/validationProperties"}}},
                {
                    "properties": {
                       "$ref": {
                            "referredSchema": {"$ref": "#/definitions/nonValidationSchema"}
                        }
                    }
                }
            ]
        }
    },
    "properties": {
        "$use": {
            "type": "object",
            "properties": {
                "source": {"$ref": "#"},
                "with": {"$ref": "#/definitions/nonValidationSchema"}
            }
        }
    }
}

erosb commented 7 years ago

I'm glad to see you start liking it :)

Hopefully I will have some time to work on this after Christmas.

epoberezkin commented 7 years ago

"type" is used as the basic JSON type, with "integer" being the only exception... Maybe instead of proliferating the exceptions we should indeed consider dropping "integer" and use format for it? @erosb I don't understand how "format": "jsonpointer" is not sufficient and why you want to abuse type to achieve it...

epoberezkin commented 7 years ago

If it's an alternative to $data proposal, it seems an unnecessary complex and confusing alternative.

But it seems like it is something else...

I don't think I understand your use-case.

epoberezkin commented 7 years ago

I just realized that an obvious use case for this proposal is in the meta-schema

@handrews you can't really bend the standard for the purpose of writing meta-schemas... Users write schemas, meta-schemas serve the standard, enforce it, not the other way around...

erosb commented 7 years ago

Back to your question @epoberezkin :

I don't understand how "format": "jsonpointer" is not sufficient

"type":"jsonpointer" (or "type":"reference") is typed, i.e. the schema of the referred value is defined by the schema (and can therefore be validated by an implementation), while "format":"jsonpointer" is untyped, so the referred value can be anything, which means that the consumer of the JSON document has to perform post-validation steps to verify its structure.

Regarding unnecessary complexity: if you don't want to specify the schema of the referred value, then you can still create untyped references like

{
"type":"reference",
"format":"jsonpointer",
"referredSchema":{}
}

(or even no "referredSchema" key at all)

For a more detailed explanation please read this comment above. It describes the generic use case I had in mind through another example.

Thanks.

epoberezkin commented 7 years ago

@erosb I don't understand how the example in the issue (pointers within data instances, whose use case I don't understand) correlates with the comment you linked to, which seems to explain a real use case, but one that can be covered with the existing vocabulary. Leaving aside the fact that you are using the type keyword, which has a different meaning, I still don't understand. So far it seems very artificial. Can you please post a simplified real-life example of data and a schema with and without this suggestion, and explain what you are achieving with this proposal that you cannot already achieve? Is it essentially defining meta-schema for the referred schema at the point of $ref (that's what @handrews seems to be thinking)? But that's not what you have at the beginning of the issue. Lost...

erosb commented 7 years ago

Is it essentially defining meta-schema for the referred schema at the point of $ref

No, one lower layer of abstraction. It is defining the schema of the referred value , pointed by the reference.

Of course in case this construct is used in the meta-schema, it becomes "defining meta-schema for the referred schema at the point of $ref", as you wrote (everything is on a higher abstraction layer when we talk about meta-schema).

So the significant difference between the "format":"jsonpointer" proposal and this one is knowing the schema of the referred value.

handrews commented 7 years ago

@handrews you can't really bend the standard for the purpose of writing meta-schemas... Users write schemas, meta-schemas serve the standard, enforce it, not the other way around...

@epoberezkin ???? I'm not sure what you think I'm doing but I'm pretty sure I'm not doing that.

I had asked for a use case and was skeptical about such existing, and then realized that we could make use of this feature in the meta-schema to help describe the existing specification. I'm not saying that we have to add this because it would be useful in the meta-schema, so I'm not sure where "bending" comes in here. I'm just saying that if we added this, then there would be a use case for it already in the meta-schema.

Does that make sense? The fact that I used the meta-schema for an example doesn't give it any more weight than if I used some other random schema from out in the wild.

epoberezkin commented 7 years ago

Ok sorry :)

epoberezkin commented 7 years ago

No, one lower layer of abstraction. It is defining the schema of the referred value , pointed by the reference.

@erosb But why do you need to have a referred value in data? We have doubts even about $data which is less abstract - referring to data from the schema and using it during validation (that seems to have quite a few use cases). You are not only suggesting that we should refer to the data from the data, but also to be able to validate such references.

Even if there is a use case, I think it is very confusing to use the "type" keyword for it - we'll need to come up with a different syntax if we agree that there is a sufficiently wide use case... It took me a lot of effort to understand what the intention is. It could be a keyword $datapointer that has a schema as its value - what's the point of having two keywords where one is sufficient, particularly when one of them has a completely different meaning?

In any case, I still don't understand the use case...

handrews commented 7 years ago

@epoberezkin no worries.

handrews commented 7 years ago

In any case, I still don't understand the use case...

@epoberezkin Does the use case of using it to describe $ref make sense? Never mind whether it's compelling enough to accept the proposal, but do you understand how and why I used it there as an example? If not, what does not make sense?

Once we agree on whether that is a valid example we can then talk about what sort of other use cases might exist, but I want to make sure we're on the same page with that example first.

epoberezkin commented 7 years ago

I think I understand now, having re-read the examples by @handrews and @erosb (about collisions).

I don't think these validation scenarios should be part of the spec - they seem very specialised. They also may cause extra validation of the same data - first as the data in its own location and secondly in the referred location. Validators already handle validation of referred schemas, likewise a particular implementation of scene/collision may do regex/format based validation of json pointer based on the conventions in the tree data structure and just check that they are not empty when they are resolved in application - otherwise they will be resolved and validated twice if the proposed feature is used.

It's been pointed out many times that JSON-schema domain is structural data validation and it should not be concerned with semantic data validation that belongs in applications. This idea seems very much like a semantic validation.

I think it can be an interesting standard extension (I think I can make a custom validation keyword for Ajv quite easily), but I don't think it should be in the validation standard core. I need to sleep on it :)

awwright commented 7 years ago

"type" right now only refers to the six core primitive types defined in JSON.

While type:"integer" is additionally specified by JSON Schema, it's only provided as an author convenience, it's the same as {type:"number", multipleOf:1}

This can be a new keyword, however.

Also note JSON Schema doesn't provide any mechanisms for checking data consistency, only structure. There's no way to make one piece of data be valid or invalid depending on the existence/value of another.

handrews commented 7 years ago

There's no way to make one piece of data be valid or invalid depending on the existence/value of another.

That is completely untrue:

{
   "oneOf": [
        {"properties": {"foo": {"const": 10}, "bar": {"type": "integer"}}},
        {"properties": {"foo": {"not": {"const": 10}}, "bar": {"type": "string"}}}
    ]
}

requires "bar" to be an integer if "foo" is 10 and a string otherwise.

awwright commented 7 years ago

@handrews I mean, in the sense that you can use "oneOf", yeah, you can.

But you can't generalize that example to say "for any number 10"

erosb commented 7 years ago

There's no way to make one piece of data be valid or invalid depending on the existence/value of another.

"dependencies" serves exactly this purpose, doesn't it?

epoberezkin commented 7 years ago

"dependencies" serves exactly this purpose, doesn't it?

Not exactly. It makes a piece of data valid or invalid based on the existence of its own property, not some other data.

epoberezkin commented 7 years ago

I think we need some general approach to the adoption of new keywords... Similar to how HTML/CSS/DOM is extended. The feature is implemented as experimental in some browsers, then based on its usage it may be adopted by other browsers, and only after that may it become a standard.

I would like us to have a similar approach. Implement the keyword in some validators, try using it, then add it to the spec if it is indeed useful for many people. Otherwise it can forever stay an experimental feature.

It may be a separate issue, but it applies to this keyword as well. It is undoubtedly needed for some users (at least one). The question is whether there is a sufficient number of users who need it and whether it should be a part of the spec. So let's add it to some validators as an optional extension, try using it, see how many people use it and then decide. Debating it is not particularly productive...

epoberezkin commented 7 years ago

With regards to the syntax I suggest $dataref: { <schema> } (or any other single keyword), abusing type keyword is not a good idea.

epoberezkin commented 7 years ago

Also it seems like it should expect a non-encoded absolute or relative JSON Pointer, without any # character - it is not a URI, it is a pointer into the current data instance, so there is no reason to use the hash-fragment of a URI (as there is nowhere else to point to but the current data instance).

tedepstein commented 6 years ago

Chiming in on this issue to say that the feature being proposed here is extremely useful and important.

The discussion seems to have died down, and most of that discussion was focused on syntax and confusion about the use case. Somehow the simple message got buried in this, and I'd like to try to bring it back into focus, even if this won't be considered until a later release.

The general use case is straightforward: In defining a schema for some type of JSON content, that content includes a property whose value is essentially a reference to a value that occurs elsewhere. That referent value may be in the same JSON resource that contains the reference, or in some other addressable JSON resource, and the syntax should allow for both of those variants.

The key requirement is this: I need to be able to specify a schema for the referenced value. It's not sufficient to say that the reference itself is a string that looks, syntactically speaking, like a valid JSON Pointer, or JSON Reference, or whatever syntax we deem appropriate. Semantically, the reference must resolve to a value, and the value must conform to the schema that I specify.

The OpenAPI specification has many examples of this.

It defines a Reference Object as an object with a single $ref property, whose value is a URI with an optional JSON-Pointer fragment. This is taken directly from the JSON Reference RFC.

Reference Objects are used in many places. For example, an Operation contains a parameters array, where each element may be an inline Parameter Object or a reference to a Parameter Object. In the latter case, the reference takes the form of a Reference Object.

Here's an example of a valid OpenAPI 3.0 document:

{
  "openapi": "3.0.0",
  "info": {
    "version": "1.0.0",
    "title": "Swagger Petstore"
  },
  "paths": {
    "/pets/{petId}": {
      "get": {
        "summary": "Info for a specific pet",
        "tags": [
          "pets"
        ],
        "parameters": [
          {
            "name": "petId",
            "in": "path",
            "required": true,
            "description": "The id of the pet to retrieve",
            "schema": {
              "type": "string"
            }
          },
          {
            "$ref": "#/components/parameters/languageParam"
          }
        ],
        "responses": {
          "200": {
            "description": "Expected response to a valid request",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "id": {
                      "type": "integer"
                    },
                    "name": {
                      "type": "string"
                    },
                    "tag": {
                      "type": "string"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "components": {
    "parameters": {
      "languageParam": {
        "name": "language",
        "in": "query",
        "description": "2-letter code for the language in which to return the result.\n",
        "required": false,
        "schema": {
          "type": "string",
          "minLength": 2,
          "maxLength": 2
        }
      }
    }
  }
}

The /pets/{petId} path item contains a get operation with two parameters:

How would we specify the JSON schema for OpenAPI 3.0, to show that it expects each element of the parameters array to be one of these two things: either an inline Parameter Object or a reference to a Parameter Object?

We can use oneOf, refer to a ParameterObject schema for the inline case, and ReferenceObject for the reference case. But ReferenceObject only enforces the syntax of the reference itself, not the referent value. What we want to say is that, in the case where the value is a Reference Object, the URI provided as the $ref property value must resolve to a JSON object that conforms to the ParameterObject schema.
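
To illustrate the extra check that tools currently have to hand-write, here is a rough Python sketch. The helper names are hypothetical, only same-document references are followed, and looks_like_parameter is a stand-in for validating against a full ParameterObject schema (it only checks the "name" and "in" fields that OpenAPI 3.0 requires).

```python
from urllib.parse import unquote


def resolve_local_ref(document, ref):
    """Follow a same-document reference such as "#/components/parameters/x"."""
    if ref != "#" and not ref.startswith("#/"):
        raise ValueError("only same-document references are handled in this sketch")
    target = document
    for token in unquote(ref[1:]).split("/")[1:]:
        # JSON Pointer unescaping: "~1" first, then "~0".
        token = token.replace("~1", "/").replace("~0", "~")
        target = target[int(token)] if isinstance(target, list) else target[token]
    return target


def looks_like_parameter(value):
    """Stand-in for the ParameterObject schema: checks the required fields only."""
    return (
        isinstance(value, dict)
        and isinstance(value.get("name"), str)
        and value.get("in") in {"query", "header", "path", "cookie"}
    )


def check_parameters(document, parameters):
    """Return one boolean per entry: inline parameter or reference to one."""
    results = []
    for item in parameters:
        if isinstance(item, dict) and "$ref" in item:
            try:
                item = resolve_local_ref(document, item["$ref"])
            except (KeyError, IndexError, ValueError, TypeError):
                results.append(False)  # the reference does not resolve
                continue
        results.append(looks_like_parameter(item))
    return results


# A trimmed-down version of the document above.
doc = {
    "components": {
        "parameters": {
            "languageParam": {"name": "language", "in": "query"}
        }
    }
}
params = [
    {"name": "petId", "in": "path", "required": True},
    {"$ref": "#/components/parameters/languageParam"},
    {"$ref": "#/components"},  # resolves, but not to a Parameter Object
]
print(check_parameters(doc, params))
```

The third entry shows the case a syntax-only ReferenceObject schema cannot catch: the reference is well-formed and resolves, but the referent is not a parameter.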

JSON Schema itself allows $ref properties to refer to schemas in definitions. The benefit is modular reuse, or componentization, and the basic mechanism that enables this is a typed reference. So we already acknowledge the value of this, because we use it ourselves every time we write a JSON Schema.

What's being proposed here is to provide a similar facility, one meta-level down.

Just as it's useful for schema designers to reuse components via typed references, it's also useful for other JSON formats to have that same kind of modular reuse by reference. And it's useful for us to fully describe and validate those other JSON formats, including their typed references, using JSON Schema.

The JSON schema for OpenAPI is really incomplete without this. We have to document the expected types of $ref property values in our human-readable descriptions. And since tool implementations cannot rely on JSON Schema validators to enforce these constraints, we have to implement our own validations for this common pattern.

So... we can debate syntax, terminology, priorities, and how best to position this proposal and reconcile it with others. But it should not be hard to understand what's being proposed here. And there really shouldn't be any doubt as to whether it's important and valuable for JSON Schema users.

handrews commented 6 years ago

@tedepstein thanks- it's good to get another real-world use case.

I'm going to have to go back through all of this, and may just try to distill it down and re-file it given the numerous digressions and confusions. And also since we've revived Relative JSON Pointer there are now at least three different types of "pointers" (URIs, JSON Pointers, Relative JSON Pointers).

vearutop commented 5 years ago

I agree with @tedepstein, such a feature would be very helpful for defining a comprehensive schema for complex or schematic values (like OpenAPI or AsyncAPI).

I'm in favor of format-based syntax with string type:

{
 "type": "string",
 "format":"json-pointer", // "relative-json-pointer", "uri-reference"
 "target": {
   "required": ["myProperty"]
 }
}

format is historically an extensible keyword that can be used without a breaking change. It can be consistent with how JSON Schema itself defines its own $ref:

{
 "type": "string",
 "format": "uri-reference"
}

Ayplow commented 4 years ago

Is there any use case for explicitly following pointers that isn't resolved by supporting $ref in the JSON as it is being validated? So for example, this would pass:

{
  "raw": { "n": 5 },
  "count": { "$ref": "#/raw/n" }
}

{
  "$followReferences": true, // For backwards compatibility? Not sure if this is necessary
  "properties": {
    "count": { "type": "integer" }
  }
}

Making an extra property would be more work for developers, users and implementors, and the only benefit I see is that the reference could be 9 characters shorter.
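
A rough Python sketch of what such preprocessing could look like, under the assumption that any object whose only key is "$ref" is replaced by its same-document target before ordinary validation runs. follow_references is a hypothetical name, and "$followReferences" is not an existing keyword.

```python
from urllib.parse import unquote


def follow_references(document):
    """Return a copy of document with {"$ref": "#/..."} objects replaced by their targets."""

    def lookup(ref):
        if ref != "#" and not ref.startswith("#/"):
            raise ValueError(f"unsupported reference in this sketch: {ref!r}")
        target = document
        for token in unquote(ref[1:]).split("/")[1:]:
            # JSON Pointer unescaping: "~1" first, then "~0".
            token = token.replace("~1", "/").replace("~0", "~")
            target = target[int(token)] if isinstance(target, list) else target[token]
        return target

    def expand(node, seen):
        if isinstance(node, dict):
            if set(node) == {"$ref"} and isinstance(node["$ref"], str):
                ref = node["$ref"]
                if ref in seen:
                    raise ValueError(f"circular reference: {ref!r}")
                return expand(lookup(ref), seen | {ref})
            return {key: expand(value, seen) for key, value in node.items()}
        if isinstance(node, list):
            return [expand(value, seen) for value in node]
        return node

    return expand(document, frozenset())


instance = {"raw": {"n": 5}, "count": {"$ref": "#/raw/n"}}
resolved = follow_references(instance)
print(resolved)
# After preprocessing, "count" can be checked like any other integer property.
print(isinstance(resolved["count"], int))
```

Note that this approach validates the referent only through whatever schema applies at the location of the reference; it does not let the schema distinguish a literal value from a referenced one.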

NOTE: Looking into the history of this, it seems like json-schema-org/json-schema-spec#279 might be relevant, and the reasons it was shot down might apply to any processing of pointers in the input (something about it being okay in the schema because preprocessing?). Until I've updated this, take it with a grain of salt

handrews commented 4 years ago

This would be better handled with an extension keyword than a new value for type. Moving to the vocabularies repo.