networknt / json-schema-validator

A fast Java JSON schema validator that supports draft V4, V6, V7, V2019-09 and V2020-12
Apache License 2.0
826 stars 323 forks source link

Inconsistent validation error since 1.0.82 #824

Closed eggilbert closed 3 months ago

eggilbert commented 1 year ago

Greetings - Similar to the use case described in https://github.com/networknt/json-schema-validator/issues/285, we're looking to validate a schema against the specification meta schema to ensure it is syntactically valid. And then later on, different JSON payloads are validated against the first schema to ensure they conform.

Based on the comments of https://github.com/networknt/json-schema-validator/issues/313, I see there may be some issues validating with the 2019-09 spec that we're using. That said, we've noticed some strange behavior since version 1.0.82 that seems like a different problem.

I've made an example test case here with the results from different versions of the library. I think due to changes in how the anyOf keyword is handled, different error messages are returned across the library versions. But in 1.0.82 and later, we're getting an inconsistent error message back. The first time a schema is validated returns one error, and then if the same schema is validated again the error response is different which is not what was expected or happening previously.

The first call returns $.type: string found, array expected and subsequent calls return $.type: should be valid to any of the schemas array instead.

@Test
void validate() throws JsonProcessingException {
    final var v201909SpecSchema = JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V201909)
        .getSchema(URI.create(JsonMetaSchema.getV201909().getUri()), new SchemaValidatorsConfig());

    final var invalidSchema = new ObjectMapper().readTree("""
        {
            "$schema": "https://json-schema.org/draft/2019-09/schema",
            "type": "cat"
        }
        """);

    // Validate same JSON schema against v2019-09 spec schema twice
    final var validationErrors1 = v201909SpecSchema.validate(invalidSchema);
    final var validationErrors2 = v201909SpecSchema.validate(invalidSchema);

    System.out.println(validationErrors1);
    System.out.println(validationErrors2);

    // Validation errors should be the same
    assertEquals(validationErrors1, validationErrors2);

    // Results
    //
    // 1.0.73
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: should be valid to any of the schemas array]
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: should be valid to any of the schemas array]
    //
    // 1.0.74
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: string found, array expected]
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: string found, array expected]
    //
    // 1.0.78
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: should be valid to any of the schemas array]
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: should be valid to any of the schemas array]
    //
    // >= 1.0.82
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: string found, array expected]
    // [$.type: does not have a value in the enumeration [array, boolean, integer, null, number, object, string], $.type: should be valid to any of the schemas array]
    //
    // ?????
}
fdutton commented 1 year ago

The validators get loaded into a TreeMap and sorted based on the type of validation performed. This is done because the standard requires that some validations be performed before others. For example, properties must be validated before unevaluatedProperties can be checked. I suspect this is the source of the non-determinism. I even have a comment in the code to investigate this.

TODO: This smells. We are performing a lexicographical ordering of paths of unknown depth.

justin-tay commented 8 months ago

I have a pull request that fixes this.

The issue in this particular case isn't the order of the validators, because even after ensuring a deterministic order, this still happens.

The primary cause is these lines

https://github.com/networknt/json-schema-validator/blob/0c14a7a8664334fd10108895f84d8232f6c34019/src/main/java/com/networknt/schema/AnyOfValidator.java#L75-L83

The main reason is that schema.hasTypeValidator() and schema.getTypeValidator() depends on the typeValidator field that is only lazily set once getTypeValidator() is first called. To make it consistent, the schema.getTypeValidator() call should ensure that the validators are already loaded.

This means that on the first run, hasTypeValidator() returns false. The schema validation happens, and the type validator in the schema runs and generates the type validation error. As getValidators() is called, the typeValidator field is now set.

In the second run, hasTypeValidator() returns true and the anyOf validator generates the should be valid to any of the schemas array message instead.

Another issue is that actually the error message built should be that of the type validator, ie. the type validator should be called with validate to generate the actual message instead of using the message of the anyOf validator as the should be valid to any of the schemas array message doesn't actually make sense.

There are actually other issues for instance https://github.com/networknt/json-schema-validator/blob/0c14a7a8664334fd10108895f84d8232f6c34019/src/main/java/com/networknt/schema/AnyOfValidator.java#L127

Where collect(Collectors.toSet()) is used which will collect to a HashSet where are LinkedHashSet should be used to enforce ordering.