pboettch / json-schema-validator

JSON schema validator for JSON for Modern C++

Defaults not generated for object whose properties have defaults defined #206

Open ChristianReese opened 1 year ago

ChristianReese commented 1 year ago

The "Default value processing" example snippet in the README gives me the expected output of {"height":10,"width":20}. However, when I modify the schema to put the properties in a "dimensions" object, all of a sudden I get the following output:

Validation of schema failed: [json.exception.parse_error.104] parse error: JSON patch must be an array of objects

Whereas I would expect the output to be:

{"dimensions":{"height":10,"width":20}}

Here's the code that produces the error:

#include <iostream>
#include <nlohmann/json-schema.hpp>

using nlohmann::json;
using nlohmann::json_schema::json_validator;

static const json rectangle_schema = R"(
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "A rectangle",
    "properties": {
        "dimensions": {
            "type": "object",
            "properties": {
                "width": {
                    "$ref": "#/definitions/length",
                    "default": 20
                },
                "height": {
                    "$ref": "#/definitions/length"
                }
            }
        }
    },
    "definitions": {
        "length": {
            "type": "integer",
            "minimum": 1,
            "default": 10
        }
    }
})"_json;

int main()
{
  try {
    json_validator validator{ rectangle_schema };
    /* validate empty json -> will be expanded by the default values defined in the schema */
    json rectangle = "{}"_json;
    const auto default_patch = validator.validate(rectangle);
    rectangle = rectangle.patch(default_patch);
    std::cout << rectangle.dump() << std::endl; // {"dimensions":{"height":10,"width":20}}
  }
  catch (const std::exception& e) {
    std::cerr << "Validation of schema failed: " << e.what() << "\n";
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}

I would expect that the defaults could still be generated even if I nested the properties in several more layers of objects.

JSON validator commit used: c6cb3d4c2d022faba5187f66a8ac61fb554d7d92
nlohmann::json version used: 3.10.5

pboettch commented 1 year ago

Default values are not recursively evaluated, it seems. What do you think, @martin-koch?

pboettch commented 1 year ago

@ChristianReese I tried your code and see the same thing. One solution could be to automatically build a tree of default values and thereby create nested defaults, but then the patch might become huge for big schemas with a lot of non-required keywords.

Creating the whole default-value tree with all literal leaves and then copying the branches to create the patch would be a solution, but it would consume a lot of memory.

This is maybe why we didn't do it and stuck to one-level default values, i.e. not recursing into the schema to extract the leaves.

martin-koch commented 1 year ago

Default values are not recursively evaluated, it seems. What do you think, @martin-koch?

The problem is that "dimensions" itself has no default definition. If it does not occur in the provided JSON document, it will not be implicitly created.

@ChristianReese if you try to validate {"dimensions":{}} instead, you will get the expected output. Another solution is to set a default value at the level of the dimensions object: "default": {"width":20, "height":10}. Both will lead to an output of {"dimensions":{"height":10,"width":20}}.
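
For reference, the object-level default would look roughly like this in the schema from the original post (a sketch; only the "dimensions" property is shown, the rest of the schema stays unchanged):

"dimensions": {
    "type": "object",
    "default": { "width": 20, "height": 10 },
    "properties": {
        "width": {
            "$ref": "#/definitions/length",
            "default": 20
        },
        "height": {
            "$ref": "#/definitions/length"
        }
    }
}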

I think it is reasonable to expect the output in the provided example. On the other hand, it may also be reasonable that the "dimensions" object is not created in this case, so neither the one nor the other behavior is "wrong". I also found a discussion about default value processing where @pboettch was involved as well. Especially note the following comment: https://github.com/json-schema-org/json-schema-spec/issues/200#issuecomment-269404355

pboettch commented 1 year ago

I also found a discussion about default value processing where @pboettch was involved as well. Especially note the following comment: json-schema-org/json-schema-spec#200 (comment)

I didn't remember this discussion; it dates back to when I wrote the first version of the validator.

Back to the question: if the OP found this behavior to be a bug or unexpected, then there is a problem, at least in the documentation, or at worst in the code.

I think @ChristianReese wants nested instances to be filled down to their leaves even if they don't have a default value themselves.

Are these the only two strategies we could use for default-value patches: one-level or fully nested?

ChristianReese commented 1 year ago

Thinking about what @martin-koch said, it makes sense that if no default is specified for the dimensions object itself, the default is assumed to be no dimensions object at all. Otherwise the validator would be assuming that the object should exist by default even though that was never specified in the schema. The defaults of the inner properties presuppose that a dimensions object has been created to begin with.

This leads me to the question of how to define the default for the dimensions object. As suggested, I can specify "default": {"width":20, "height":10} and that works. But what if the object has 100+ properties, each of which has already defined its own default? It seems redundant to repeat all of those established defaults again just to say "default to creating this object with the defaults defined by each individual property". I tried using "default": {} for the dimensions object, but that results in {"dimensions":{}}, which I suppose is correct since I'm telling it to default to an empty object. At the same time, each individual property has a default, which implies that the properties exist by default. I need to look into whether there's a shorter way than "default": {"width":20, "height":10} to say "default to creating this object using the defaults already established by the schemas of the individual properties" in the JSON Schema spec. If there is, does the schema validator support it?

EDIT: Forgot to mention that, as @martin-koch said, I could also validate {"dimensions":{}} to get the defaults. My question above was more focused on whether there's a shorter way to write into the schema itself what the default for the object is, based on the defaults of its properties, so that I could still validate {} and get the defaults back that way. It feels like the JSON Schema spec may have a way of communicating this; I need to look into it (feel free to post if anyone already knows).

ChristianReese commented 1 year ago

After a bit of research I found the relevant part of the spec: https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-validation-01#section-10.2

So apparently there aren't any rules for what "default" can be set to; it's implementation-specific. I haven't found any settled-upon convention for a value that would populate the defaults of all properties, other than spelling them out as already described. I'm beginning to think that the best way to bring this knowledge into the schema without redundancy is to have the schema define the defaults at the object level instead of the property level. Or perhaps an even better path with this library specifically is to pass the objects you want to exist into the document being validated, and then have the defaults fill in. The latter approach is likely the one I'll take for now, so maybe this issue will just result in some new/clarified documentation.
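
For completeness, the latter approach would look roughly like this with the original schema (a sketch of just the main() body; everything else stays as in the code above):

json_validator validator{ rectangle_schema };
/* seed the document with the sub-objects that should exist,
   then let the validator fill in their defaults */
json rectangle = R"({"dimensions":{}})"_json;
const auto default_patch = validator.validate(rectangle);
rectangle = rectangle.patch(default_patch);
std::cout << rectangle.dump() << std::endl; // {"dimensions":{"height":10,"width":20}}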

martin-koch commented 1 year ago

This leads me to the question of how to define the default for the dimensions object. As suggested, I can specify "default": {"width":20, "height":10} and that works. But what if the object has 100+ properties, each of which has already defined its own default? It seems redundant to repeat all of those established defaults again just to say "default to creating this object with the defaults defined by each individual property". I tried using "default": {} for the dimensions object, but that results in {"dimensions":{}}, which I suppose is correct since I'm telling it to default to an empty object. At the same time, each individual property has a default, which implies that the properties exist by default. I need to look into whether there's a shorter way than "default": {"width":20, "height":10} to say "default to creating this object using the defaults already established by the schemas of the individual properties" in the JSON Schema spec. If there is, does the schema validator support it?

@ChristianReese Just thinking out loud: what if you specify "default": {} for the dimensions object and all child objects? When validating, call validate, apply the patch, and repeat until the patch is empty:

json_validator validator{rectangle_schema};
/* validate empty json -> will be expanded by the default values defined in the schema */
json rectangle = "{}"_json;
/* repeat until the validator produces no further default patches, so that
   sub-objects created in one round get their own defaults filled in the next */
while (true) {
    const auto default_patch = validator.validate(rectangle);
    if (default_patch.empty()) {
        break;
    }
    rectangle = rectangle.patch(default_patch);
}
std::cout << rectangle.dump() << std::endl; // {"dimensions":{"height":10,"width":20}}

I know this is not ideal in terms of runtime, but at least it should get the job done.

ChristianReese commented 1 year ago

@martin-koch That's a really good idea. The whole design pattern here is to take an empty object and populate it with defaults, so this just makes that recursive: empty sub-objects are generated as defaults and fed back in to populate their own property defaults.

It sounds like the resolution for this issue, then, is probably just documenting that approach as a way of getting deep defaults, or potentially adding an option that runs that recursive routine within the library. I don't feel strongly either way, up to you guys.

EDIT: Thinking about it more, I might actually be inclined not to put this into the library, so that the library stays focused on the more atomic operation of generating defaults and doesn't need to operate at a higher abstraction level than that. As long as this use case is documented it should be good. Again, up to you guys.
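
If it ends up in the documentation, the approach could be captured as a small free function, something like the sketch below (apply_defaults is just an illustrative name, not part of the library's API, and it assumes the nested objects declare "default": {} as suggested above):

#include <nlohmann/json-schema.hpp>

using nlohmann::json;
using nlohmann::json_schema::json_validator;

// Validate and apply the default patch repeatedly until no further defaults
// are produced, so that defaults inside sub-objects created in an earlier
// round get filled in as well.
json apply_defaults(json_validator &validator, json document)
{
    while (true) {
        const auto default_patch = validator.validate(document);
        if (default_patch.empty())
            break;
        document = document.patch(default_patch);
    }
    return document;
}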

Finkman commented 1 year ago

I was on this topic as well, especially when I added #244. IMHO, the current feature set is perhaps the maximum possible for a validator. For a validator, supplying the defaults is a nice side effect of the validation process.

That said, it may need a different kind of library / software component which is more of a JSON preparer.

The reason for this is that, when traversing the JSON schema and the JSON data, the validator has to decide between different paths. @pboettch did nice work modeling the schema as a hierarchy of single schema instances that each do their local testing, but they are not aware of hypothetical values that might be present if the whole thing took default values into account as well.

This is always an issue when there are conditional paths with oneOf, anyOf, etc. It will always be unclear which path to choose when it comes to determining the defaults. And because of the complex nature of the schemas, validation may fail after adding the defaults and validating again, since different paths could then be taken into account.
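
An illustrative (made-up) schema fragment for this situation:

{
    "anyOf": [
        { "properties": { "mode": { "const": "metric" },   "unit": { "default": "cm" } } },
        { "properties": { "mode": { "const": "imperial" }, "unit": { "default": "inch" } } }
    ]
}

For an empty document both branches match, so it is unclear whether the default for "unit" should be "cm" or "inch".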

The next issue is that if a path is required but not present, validation will stop there with an error message. What should happen in this case if the schema provides a default for that property? I could imagine remembering the error in that case, but continuing to process as if the default value were present, while adding it to the defaults patch. I guess this would take a lot of brain-work to implement and would require a big rewrite of the entire library. And even with my previous argument, it would not lead to a fully satisfying feature implementation, since the way a validator processes data is not the same as what a preparer would do.