json-schema-org / json-schema-spec

The JSON Schema specification
http://json-schema.org/

`min*`, `max*`, `pattern` constrained from validated JSON data #1519

Closed 0x522D43 closed 3 months ago

0x522D43 commented 3 months ago

Hello,

Yesterday, I was looking for a way to write a JSON schema that constrains the maxItems of an array based on the value of another property of my object.

Use Case

I would like to set a constraint value from the value of a property of the validated data.

Example

Schema

type: object
properties:
  max_size:
    type: integer
    minimum: 1
  data:
    type: array
    items:
      type: string
    maxItems: 
      $refValue: '#/properties/max_size' # equals to `max_size` value

JSON to validate

✅ Valid

`data.maxItems` is inferred to be **3** (from `max_size` value)

```json
{ "max_size": 3, "data": ["a", "b"] }
```

The length of data is **less or equal** to `max_size` value

⛔ Invalid

`data.maxItems` is inferred to be **1** (from `max_size` value)

```json
{ "max_size": 1, "data": ["a", "b"] }
```

The length of data is **greater** than `max_size` value

My suggestion

I thought about an additional keyword $refValue (better names can be suggested 😊) that will refer to the value to use in place.

I also thought about using the existing $ref, but it does not have quite the same purpose or behavior: $ref references a part of a schema, whereas here I need a reference to a value in the data being validated.

Scope

I think this feature can be applied to almost all Validation Keywords.
required and dependentRequired from Objects validation might be out of scope.

Schema consistency validation

To validate the consistency of a schema using this notation, we need to check that the type of the referenced target property matches the type of the constraint.

For example, maxItems must match type: integer and minimum: 0, so the referenced property should at least satisfy these constraints.

When the reference is not available during consistency validation, the constraint is not applied and, as with $ref, a warning or error can highlight the issue.
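A minimal sketch of this consistency check, assuming a flat object schema and a reference of the form `#/properties/<name>`. The `KEYWORD_REQUIREMENTS` table and the `check_ref_consistency` function are hypothetical names used for illustration only; they are not part of any specification or library.

```python
# Hypothetical table: what each keyword requires of its own value.
KEYWORD_REQUIREMENTS = {
    "maxItems": {"type": "integer", "minimum": 0},
    "minItems": {"type": "integer", "minimum": 0},
}


def check_ref_consistency(schema, keyword, ref):
    """Return 'ok', or a warning/error string, for a $refValue used on `keyword`."""
    prefix = "#/properties/"
    name = ref[len(prefix):] if ref.startswith(prefix) else None
    target = schema.get("properties", {}).get(name)
    if target is None:
        # Reference not available: the constraint is simply not applied.
        return "warning: target not found, constraint not applied"
    required = KEYWORD_REQUIREMENTS[keyword]
    if target.get("type") != required["type"]:
        return "error: referenced property has the wrong type"
    # A missing minimum is treated as "no lower bound".
    if target.get("minimum", float("-inf")) < required["minimum"]:
        return "error: referenced property allows values below the keyword's minimum"
    return "ok"


SCHEMA = {
    "type": "object",
    "properties": {
        "max_size": {"type": "integer", "minimum": 1},
        "label": {"type": "string"},
    },
}
```

With this schema, `check_ref_consistency(SCHEMA, "maxItems", "#/properties/max_size")` reports `ok`, while pointing at `label` or at a missing property reports an error or a warning respectively.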

JSON validation against Schema

When $refValue is parsed by the validator, it will:

  1. Check if the target is available
    • SUCCESS: go to the next step
    • FAILURE: error message that indicates that ref is not available
  2. Check that the target field is valid based on its own schema constraints (in the example, max_size should have type: integer and minimum: 1)
    • SUCCESS: go to the next step
    • FAILURE: error message that the referenced value does not match its own constraints
  3. Check that the target field is valid based on constraints of the field to set (in the example, maxItems should have type: integer and minimum: 0)
    • SUCCESS: go to the next step
    • FAILURE: error message that the referenced field's constraints are not at least as strict as those of the max*/min*... field
  4. Replace maxItems value with the target value (in the example, replace {"$refValue": "#/properties/max_size"} by 3)
    • SUCCESS: go to the next step
  5. Apply the validation as it is done currently
    • SUCCESS: go to the next token
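The steps above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: it handles a flat object, supports references of the form `#/properties/<name>`, checks only `type: integer` and `minimum`, and performs step 3 on the resolved value rather than on the referenced schema. All function names are hypothetical.

```python
def resolve_ref_value(schema, instance, ref):
    """Step 1: map '#/properties/<name>' to (property subschema, instance value)."""
    prefix = "#/properties/"
    if not ref.startswith(prefix):
        raise ValueError(f"unsupported $refValue: {ref}")
    name = ref[len(prefix):]
    if name not in schema.get("properties", {}) or name not in instance:
        raise ValueError(f"$refValue target not available: {ref}")
    return schema["properties"][name], instance[name]


def is_int_at_least(value, minimum):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= minimum


def check_max_items(schema, instance, prop):
    """Check instance[prop] against maxItems given as a literal or a $refValue."""
    max_items = schema["properties"][prop]["maxItems"]
    if isinstance(max_items, dict) and "$refValue" in max_items:
        target_schema, value = resolve_ref_value(schema, instance, max_items["$refValue"])
        # Step 2: the target must satisfy its own constraints.
        if not is_int_at_least(value, target_schema.get("minimum", float("-inf"))):
            raise ValueError("referenced value does not match its own constraints")
        # Step 3: the target must be acceptable as a maxItems value.
        if not is_int_at_least(value, 0):
            raise ValueError("referenced value is not a valid maxItems")
        # Step 4: substitute the resolved value.
        max_items = value
    # Step 5: apply the validation as usual.
    return len(instance[prop]) <= max_items


SCHEMA = {
    "type": "object",
    "properties": {
        "max_size": {"type": "integer", "minimum": 1},
        "data": {
            "type": "array",
            "items": {"type": "string"},
            "maxItems": {"$refValue": "#/properties/max_size"},
        },
    },
}
```

Applied to the two instances from the example, `check_max_items(SCHEMA, {"max_size": 3, "data": ["a", "b"]}, "data")` accepts the data and the same call with `"max_size": 1` rejects it.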

Security consideration

Because the final schema is not known until the JSON to validate is provided, bad data could end up being used as a constraint value.
This is why, for now, I have limited this suggestion to leaf-level keywords in the JSON tree, with strong constraints on them.

Be more generic?

This suggestion could be extended to a more generic implementation where $refValue is available anywhere, like $ref. But I think this should be discussed in a separate topic, as it requires a lot of thought and raises security concerns.


Let me know what you think about this. Thank you.

gregsdennis commented 3 months ago

This use case (and the proposal) actually has a long history (see issues with the $data label).

The primary problem with this approach is validation of the schema itself.

{
  "maxItems": { "$refValue": "#/properties/max_size" }
}

isn't a valid schema because maxItems MUST be a non-negative integer.

Yeah, we could make it so that it should be a non-negative integer OR this reference object (similar to how OpenAPI uses $ref), but we'd need to apply that to literally everything, and that gets messy pretty quickly.


However, this functionality can be achieved without breaking schema validity by using my data vocabulary. Here are a couple of examples: https://docs.json-everything.net/schema/examples/data-ref/.

The data keyword is a separate keyword that dynamically builds a subschema that applies to the instance.

So in your case, you'd have

type: object
properties:
  max_size:
    type: integer
    minimum: 1
  data:               # this is your property
    type: array
    items:
      type: string
    data:             # this is my keyword
      maxItems: '/max_size'   # JSON Pointer into the instance, not the schema

This dynamically builds a secondary schema with {"maxItems": <the data value (which we hope is an int)> } and then evaluates the instance against that, effectively adding the new constraints.

Because all of this resides in a keyword that validates as "an object with string values" and resolves at evaluation time, it passes schema validation. If the pointer resolves to a value that isn't acceptable to the keyword, then evaluation halts.

I haven't seen any implementations of this outside my own JsonSchema.Net, which is (probably obviously) a .NET library.
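To make the mechanism concrete, here is a rough Python sketch of how such a keyword could build its secondary schema. It is not the JsonSchema.Net implementation. It assumes each value is a plain JSON Pointer (RFC 6901) resolved against the root of the instance, so the pointer is written as `/max_size`; the function names are hypothetical and only `maxItems` is evaluated.

```python
def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against a parsed JSON document."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in the order required by RFC 6901: ~1 first, then ~0.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


def build_data_subschema(data_keyword, root_instance):
    """Turn {"keyword": "/pointer"} into {"keyword": <value found in the instance>}."""
    return {kw: resolve_pointer(root_instance, ptr) for kw, ptr in data_keyword.items()}


def evaluate_max_items(subschema, array):
    """Evaluate only maxItems; halt if the resolved value is not acceptable."""
    limit = subschema["maxItems"]
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 0:
        raise ValueError("resolved maxItems value is not a non-negative integer")
    return len(array) <= limit


instance = {"max_size": 3, "data": ["a", "b"]}
subschema = build_data_subschema({"maxItems": "/max_size"}, instance)
result = evaluate_max_items(subschema, instance["data"])
```

Here `subschema` becomes `{"maxItems": 3}` and the two-element array passes; in a real validator the generated subschema would be handed to the normal evaluation machinery instead of a hand-written check.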

0x522D43 commented 3 months ago

Thank you for the links and examples. I will close this issue.