voxpupuli / json-schema

Ruby JSON Schema Validator
MIT License
1.53k stars 242 forks source link

new feature :require_all #296

Closed pkarman closed 7 years ago

pkarman commented 8 years ago

This PR makes the :strict validation feature, optionally, a little less strict.

If the :require_all option flag is set explicitly to false and strict:true, then the current strict behavior of requiring all properties be present is turned off, and the schema definition of required is preferred.

This allows for testing additionalProperties schema feature in isolation.

pkarman commented 8 years ago

@iainbeeston any sense of whether this feature is acceptable?

RST-J commented 8 years ago

General thoughts: First, I prefer the idea of strict mode to be auto-additionalProperties: false because requiring certain properties while others may be optional is something that should definitely be defined explicitly. I don't know the reasoning behind this aspect of strict mode when it was originally invented. So, I'm in favour of getting this feature in, I'd turn it on in some of my projects immediately.

However, we are discussing how to simplify things in json-schema for quite a while now (well, actually some time ago and then in the recent past not so much happened at all). And one of my concerns in this respect is the amount of flags we have. Not the flags themselves but to have an intuitive idea of what is the consequence of all the combinations (maybe that is not actually that bad and just my subjective impression).

That said, I think, require_all should be a nested option of strict as it doesn't have effects without it, right? So you could have strict => :true and the long form strict => {require_all => false} or to enable your feature strict => {require_all => true}.

Thoughts on this?

cec commented 8 years ago

In JSON Schema 4, we can't combine schemas without removing additionalProperties: false:

{
"definitions" : {
  "base" : {
     "title" : "Base Schema",
     "properties" : {
       "a" : {"type" : "integer},
     },
     "required" : ["a"],
     "additionalProperties" : false
  }
  "extension" : {
    "title" : "An extension"
    "allOf" : [
      {"$ref" : "#/definitions/base"},
      {
        "properties" : {
          "b" : {"type" : "string"}
        }
      }
    ]
  }
 }
}

The above schema would fail.

This is a JSON Schema limitation that draft 5 is meant to address.

@RST-J @pkarman I think we all agree that a validation mode failing when unexpected properties are present is necessary.

additionalProperties ensures that props not described in the schema lead to validation failure. required is instead used to make a set of properties mandatory.

At the moment the strict mode makes every property mandatory, but JSON Schema already provides an option for this: the required property. People using strict mode instead of using required do not have portable schemas as validating them with another validator would generate false-positives.

Thus, if strict applies required_all by default, I believe that usage of strict mode should be discouraged.

IMHO the clearest interface would be:

To conclude:

  1. there is great need for this "strict-not-required" mode
  2. changing the interface could improve the API, but the sooner this is integrated, the better.

Sorry for the long post, I look forward to your comments and to a later integration!

RST-J commented 8 years ago

I agree and remember to have been running into this very problem which will be solved by version 5. This we we'd have a way to emulate this with older versions already. And that it is not compatible with other validators is not so much an issue I guess. Because what it actually provides is confidence about clean data (not getting or sending any additional stuff - OK, I see I have to relax my statement, sending additional stuff may leak unwanted information, however getting more than expected may pollute your database but won't break your functionality).

Did I get that right @cec , you propose three independent flags instead of one that affects the behaviour of another? I think this is the better approach. I haven't looked into the PR yet, but assumed this is how it currently would work, :require_all only works together with strict.

@iainbeeston @pd @hoxworth What do you think? What about 'reduced complexity' vs. 'comfort' thing, how rigorous do we want to be here? I see the benefit especially of globally forbidding additionals on its own.

cec commented 8 years ago

@RST-J what I meant about breaking compatibility was meant for users of the gem, not between an APIs described with schemas and their clients (if that is what you mean by "sending additional stuff").

I agree with you and yes you got my proposal right. Maybe using additional_properties instead of allow_additions would be better as it mimics that behaviour? Anyway this is just a detail.

pkarman commented 8 years ago

Here's a full example illustrating the problem. The schema has been cut down to a minimal case. The full schema is here: https://github.com/18F/e-manifest/blob/master/public/schemas/post-manifest.json

require 'json-schema'  # version 2.6.0

schema = {
  '$schema' => "http://json-schema.org/draft-04/schema#",
   id: "https://e-manifest.18f.gov/schemas/post-manifest.json",
   method: 'POST',
   type: "object",
   required: [ "generator" ],
   properties: {
    generator: {
      type: "object",
      required: ["manifest_tracking_number"],
      properties: {
        name: {
          description: "(5) Generator company name as presented in RCRAInfo",
          type: "string",
          maxLength: 80
        },
        manifest_tracking_number: {
          description: "(4) Unique tracking number assigned to each manifest...Starts with 9 numbers and ends with 3 Letters",
          type: "string",
          maxLength: 12,
          minLength: 12,
          pattern: "^[0-9]{9}[A-Za-z]{3}$"
        }
      }
    }
  }
}

strict_schema = schema.dup
strict_schema['additionalProperties'] = false

valid_example = {
  generator: { manifest_tracking_number: '987654321abc' }
}

invalid_example = { 
  generator: { name: 'foobar' },
  foo: 'bar'
}

opt_sets = [ {}, { strict: true } ]

opt_sets.each do |opts|
  puts '='*80
  errors = JSON::Validator.fully_validate(schema, valid_example, opts)
  puts "Valid example with schema with opts #{opts.inspect} : #{errors.inspect}"
  puts '-'*80
  errors = JSON::Validator.fully_validate(schema, invalid_example, opts)
  puts "Invalid example with schema with opts #{opts.inspect} : #{errors.inspect}"
  puts '-'*80
  errors = JSON::Validator.fully_validate(strict_schema, valid_example, opts)
  puts "Valid example with strict_schema with opts #{opts.inspect} : #{errors.inspect}"
  puts '-'*80
  errors = JSON::Validator.fully_validate(strict_schema, invalid_example, opts)
  puts "Invalid example with strict_schema with opts #{opts.inspect} : #{errors.inspect}"
end

When I run that script, I get:

===============================================================================
Valid example with schema with opts {:record_errors=>true} : []
--------------------------------------------------------------------------------
Invalid example with schema with opts {:record_errors=>true} : ["The property '#/generator' did not contain a required property of 'manifest_tracking_number' in schema https://e-manifest.18f.gov/schemas/post-manifest.json"]
--------------------------------------------------------------------------------
Valid example with strict_schema with opts {:record_errors=>true} : []
--------------------------------------------------------------------------------
Invalid example with strict_schema with opts {:record_errors=>true} : ["The property '#/generator' did not contain a required property of 'manifest_tracking_number' in schema https://e-manifest.18f.gov/schemas/post-manifest.json", "The property '#/' contains additional properties [\"foo\"] outside of the schema when none are allowed in schema https://e-manifest.18f.gov/schemas/post-manifest.json"]
===============================================================================
Valid example with schema with opts {:strict=>true, :record_errors=>true} : ["The property '#/generator' did not contain a required property of 'name' in schema https://e-manifest.18f.gov/schemas/post-manifest.json"]
--------------------------------------------------------------------------------
Invalid example with schema with opts {:strict=>true, :record_errors=>true} : ["The property '#/generator' did not contain a required property of 'manifest_tracking_number' in schema https://e-manifest.18f.gov/schemas/post-manifest.json", "The property '#/generator' did not contain a required property of 'manifest_tracking_number' in schema https://e-manifest.18f.gov/schemas/post-manifest.json", "The property '#/' contained undefined properties: 'foo' in schema https://e-manifest.18f.gov/schemas/post-manifest.json"]
--------------------------------------------------------------------------------
Valid example with strict_schema with opts {:strict=>true, :record_errors=>true} : ["The property '#/generator' did not contain a required property of 'name' in schema https://e-manifest.18f.gov/schemas/post-manifest.json"]
--------------------------------------------------------------------------------
Invalid example with strict_schema with opts {:strict=>true, :record_errors=>true} : ["The property '#/generator' did not contain a required property of 'manifest_tracking_number' in schema https://e-manifest.18f.gov/schemas/post-manifest.json", "The property '#/generator' did not contain a required property of 'manifest_tracking_number' in schema https://e-manifest.18f.gov/schemas/post-manifest.json", "The property '#/' contains additional properties [\"foo\"] outside of the schema when none are allowed in schema https://e-manifest.18f.gov/schemas/post-manifest.json"]

What that is telling me is that strict actually doesn't matter if additionalProperties and required are used appropriately in the schema. My dilemma came up when I was trying to use the same validator config options for different schemas, and when I was still developing my schema. It seems to me that, at least for JSON Schema v4, the strict option is unnecessary if the schema is defined correctly, and leads to false-positive errors if used with a correct schema. The documentation states:

With the :strict option, validation fails when an object contains properties that are not defined in the schema’s property list or doesn’t match the additionalProperties property. Furthermore, all properties are treated as required regardless of required properties set in the schema.

It's the "furthermore" part that I wanted to be able to turn off. I like "strict"-ness in my testing, so I turned it on for all my tests. Then I found that the feature went beyond what was actually defined in my schema, and that's when I hacked in the the require_all feature, quite ignorant of the history of that feature request on this repo.

My implementation tried to meet 3 criteria:

  1. minimally invasive
  2. preserve existing behavior by default
  3. satisfy my own test cases

As it stands now, if the require_all feature is not merged, I will probably just turn off strict in all my tests and write additional tests to verify that my schemas themselves have the appropriate required and additionalProperties attribute values.