Optionally add default values to json being validated when the field is missing

anthonytw commented 6 years ago

It would be really neat if, in the process of validation, the validator could (optionally) fill in missing values with their defaults if provided in the schema. The logic might look something like this:

IF element missing
    IF required THEN throw error
    ELSE IF default value exists THEN
        ADD the element with the specified default value
    ELSE skip to next element
END IF

VALIDATE element

This logic validates the element when it is entered or when it's replaced by a default value (unless it's either required or there is no default) which lets you validate your schema default values as well.

(Alternatively, default value validation could take place when the schema is added to the validator object, which is probably a good idea, but more work than using the above logic as a stop-gap that lets library users validate their default values by writing a test that simply does not supply them and expects the validation to pass.)

ibc commented 4 years ago

AFAIU this is not about asking for features to this library but about implementing the standard this library is based on. Also, AFAIS there is already support for "default" values even in the version 7 of the draft:

   {
       "title": "Feature list",
       "type": "array",
       "items": [
           {
               "title": "Feature A",
               "properties": {
                   "enabled": {
                       "$ref": "#/$defs/enabledToggle",
                       "default": true
                   }
               }
           },
           {
               "title": "Feature B",
               "properties": {
                   "enabled": {
                       "description": "If set to null, Feature B
                                       inherits the enabled
                                       value from Feature A",
                       "$ref": "#/$defs/enabledToggle"
                   }
               }
           }
       ],
       "$defs": {
           "enabledToggle": {
               "title": "Enabled",
               "description": "Whether the feature is enabled (true),
                               disabled (false), or under
                               automatic control (null)",
               "type": ["boolean", "null"],
               "default": null
           }
       }
   }

Search for "default" in https://tools.ietf.org/html/draft-handrews-json-schema-02#section-7.7.1.

BTW no idea if this library implements it (it should according to the spec).

pboettch commented 4 years ago

Yes, the default field is existing in the standard. It is considered as an annotation and implies no special behavior for validation.

Creating an instance based on the default values in the schema can be interesting though, however it might be out of scope of the validator class.

ibc commented 4 years ago

IMHO default values should imply insertion of them into the given JSON object. Otherwise the app should double check whether the field exists before using it, which somehow invalidates the task of the schema validator by forcing manual checks as if there was not validator.

pboettch commented 4 years ago

The validator is not supposed to change the validation-data-instance so it could use the default-value from the schema, but then what?

Again, what could be done is to create a new instance based on the schemas default values - if present, but this would be a new feature to be developed and independent of the validation process.

anthonytw commented 4 years ago

Philosophically, it's nice to separate validation from the separate process of filling in an incomplete but valid instance. However, practically speaking, it makes much more pragmatic sense to fill in the default values as empty spaces are encountered, because otherwise you either have to:

Rewrite the same code to walk through the schema and the instance jointly to locate the blanks and fill them in (redundant) or
Write new code to fill in the optional values with the defaults (replicating the default values = error prone)

Neither case is ideal: I don't want to rewrite the schema walk-through, that's why I'm using this library; I don't want to write additional code to check the optional fields and fill in with the default values, because then I not only have to manage the default fields in two locations, I also have to synchronize default values in two locations.

It would be useful to at least offer some option like an "empty field with default callback" so I can make the change myself, if desired, but ideally it would just be an option (or separate function call) to fill in the defaults. One could describe this in a way that is still within the philosophical bounds of a validator: an instance is not valid unless all required and optional fields with defaults are filled in, thus the question becomes not "is the schema valid" but rather "can a valid schema be generated from this potentially incomplete instance" (which would mean optional with default fields filled in) in which case it is more natural to expect a filled-in instance as a returned object (indicating what that valid schema is).

But forget philosophy, practically speaking the library is much less useful if I still have to write additional code to check the optional fields manually when it's so naturally handled by the validator. It's still useful, don't get me wrong, but much less so for very large schemas with potentially many (and changing) schemas, especially when multiple schemas can be used with the same application (i.e., not built in, i.e., now I have to write code to re-parse the schema).

Just some suggestions! I would be happy with the callback idea, but I can't think of anything anyone would do with the callback other than fill in the default, so why over-complicate things?

pboettch commented 4 years ago

I see your points.

Did you already implemented some use-cases?

When should a default value be taken into consideration? If a field is missing, but required or even when not required but a default-value is given in the schema? Should it be replaced when it is present but not valid? Other conditions?

Wouldn't it be enough to create a method in the validator returning a JSON-instance containing all default-fields (recursively filled) and then let the user "merge" it?

Or a two step approach: before calling validate() a new call of populate_with_default() is called where the validator would only insert missing fields with its defaults?

anthonytw commented 4 years ago

My use-cases are always handled the second (quicker) method I mentioned, namely, I validate the schema then I generally instantiate some objects with the contents of the instance, manually handling default values. So this is not general at all.

As I would interpret it, anything field with a default value should have said value substituted if it's missing in the instance. Required + default, in my mind, isn't meaningful in this context. In other words:

required without default means the instance must provide something
required with default will be filled with a default if no value is provided, required is somewhat unnecessary when a default is available
optional without default means, if not provided, it will stay missing
optional with default means, if not provided, the default will be used

To me that's more meaningful than having to mark optional fields as required to have them filled in. However, you could supply three operational modes:

Validate but don't complete instance
Validate and fill only empty required fields that have defaults available
Validate and fill any empty field with a default available

I'm not sure why anyone would ever want 2 though.

I'm personally against the "let the user merge" option. The validator is already recursing through the object, why rewrite that for a merge? I might as well recourse through the schema and fill in default values directly, in which case I'm back to square one.

If it were me, I would offer two functions:

validate which takes in a const instance and throws a validation error like it does now (or returns an error list, which would be nice, but the subject of a different feature request)
validate_and_fill which validates and fills in defaults along the way

You can have a separate fill function instead, but it's probably easier to code if you fill while you validate instead of having to abstract the code to either fill or validate. Not sure, haven't looked at the code.

Thanks for the consideration!

pboettch commented 4 years ago

Do you have a concrete use-case as an example for these 4 options?

anthonytw commented 4 years ago

Two of the cases (without defaults) are already covered by what the library currently does (the instance isn't changed). The use case for the other two (with defaults) is what we've discussed here: fill in missing values with the default. This is exactly what anyone would realistically want to do (currently manually) if they were specifying defaults in the schema. Otherwise, why specify a default?

pboettch commented 4 years ago

What I meant with concrete examples is real world examples.

Finkman commented 4 years ago

My proposal:

The validator [or maybe another instance] may create a JSON Patch that gives the operations needed to fill-up the original json which default values. It could be the return value of the validate call.

Then, the user can feel free to apply that patch or simply inspect the patch.

In general, my use case for the default values would be, that the json holds a configuration and the possibility to get the default values would be an easy way to create a config from the scratch.

Finkman commented 4 years ago

One more idea:

Adding a kind of default generator in the way that it creates a json from the scratch using the schema with its defaults. After having a default json the user could diff it with a different json (in my case the config). The resulting patch may contain add operations. Those add operations are the ones that fill up the json (my config) with default values.

Doing this way would not need to change the interface of validate and only needs the default generator.

Finkman commented 4 years ago

Take a look at:

{
  "type": "object",
  "properties": {
    "street_address": {
      "type": "string"
    },
    "country": {
      "enum": ["United States of America", "Canada"],
      "default": "Canada"
    }
  },
  "if": {
    "properties": { "country": { "const": "United States of America" } }
  },
  "then": {
    "properties": { "postal_code": { "pattern": "[0-9]{5}(-[0-9]{4})?" } }
  },
  "else": {
    "properties": { "postal_code": { "pattern": "[A-Z][0-9][A-Z] [0-9][A-Z][0-9]" } }
  }
}

In this schema, the default value has an direct impact on the validation of postal codes. I think, for that reasons, the validator should take the default value into account.

Finkman commented 4 years ago

In this schema, the default value has an direct impact on the validation of postal codes. I think, for that reasons, the validator should take the default value into account.

Even it would make sense to me, it is not specified that way. For the validator, country actually defaults to null.

Finkman commented 4 years ago

I expected this issue to be closed by #89 :thinking:

pboettch commented 4 years ago

To have github close an issue automatically I think it needs to be in the commit's subject "Fix #25". As I squashed all of your commits, this info was lost.

Let's close it manually then.

pboettch / json-schema-validator

Optionally add default values to json being validated when the field is missing #25