networknt / json-schema-validator

A fast Java JSON schema validator that supports draft V4, V6, V7, V2019-09 and V2020-12
Apache License 2.0

Validate JSON Schema itself? #996

Closed jksevend-trimble closed 4 months ago

jksevend-trimble commented 4 months ago

Hello, This is not really an issue with the library, but a question.

Background

I have a POST endpoint to which users can send their schema, which is then saved to a database. A requirement is that the schema sent to the endpoint must itself be validated, to prevent random JSON objects from being saved to the database as a "valid schema".

Question

Is it somehow possible to validate the JSON Schema itself? What I have tried is something like this:

private boolean checkIsSchemaValid(final String schemaContent) {
    final JsonSchemaFactory schemaFactory = JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V202012);
    try {
        schemaFactory.getSchema(schemaContent);
        return true;
    }
    catch (final Exception e) {
        return false;
    }
}

Now when a random string is sent, an exception is thrown as expected. But if I send a random JSON like

{
    "lol": 420
}

no exception is thrown. The exception is only thrown when JsonSchema#validate is called with a JSON object I want to validate, which happens at a later point in time and at a different endpoint.

Would it somehow be possible to validate the schema itself before we try to validate a JSON object?

Best, Julian

justin-tay commented 4 months ago

That actually depends on what you define to be a valid schema.

You would typically validate a schema against its meta-schema. This only validates its structure, though: for a $ref, for instance, it would only check that the value is a valid uri-reference format. To figure out whether the $ref can resolve, you would need to actually load the schema.
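To make the difference between a format check and resolution concrete, here is a minimal sketch that is not part of json-schema-validator. It assumes the schema has already been parsed into nested java.util.Map objects; the class and method names are hypothetical, java.net.URI only stands in for the uri-reference format check, and only local JSON Pointer references ("#/...") through objects are handled.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;

// Hypothetical helper for illustration, not part of json-schema-validator.
class RefCheckSketch {

    // Syntax only: roughly what a format check on a $ref value can tell you.
    static boolean isSyntacticallyValid(String ref) {
        try {
            new URI(ref);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Resolution of local refs such as "#/$defs/name" by walking the parsed schema.
    // Assumption: remote refs, arrays and percent-encoded fragments are out of scope.
    static boolean resolvesLocally(Map<String, Object> root, String ref) {
        if (ref.equals("#")) {
            return true;
        }
        if (!ref.startsWith("#/")) {
            return false;
        }
        Object node = root;
        for (String token : ref.substring(2).split("/", -1)) {
            // Undo JSON Pointer escaping (~1 is "/", ~0 is "~").
            String key = token.replace("~1", "/").replace("~0", "~");
            if (!(node instanceof Map) || !((Map<?, ?>) node).containsKey(key)) {
                return false;
            }
            node = ((Map<?, ?>) node).get(key);
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> schema = Map.of(
                "$defs", Map.of("name", Map.of("type", "string")));
        String dangling = "#/$defs/missing";
        System.out.println("syntactically valid: " + isSyntacticallyValid(dangling));
        System.out.println("resolves locally: " + resolvesLocally(schema, dangling));
    }
}
```

A reference like "#/$defs/missing" passes the first check but fails the second, which is the gap that meta-schema validation alone leaves open.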

Here, however, from a meta-schema perspective your example schema is valid, because the meta-schema allows unknown keywords. You may consider a custom meta-schema, e.g. using unevaluatedProperties: false or something to that effect, or you could add extra restrictions when loading the schema.

The following validates the schema according to the meta-schema, and it's valid due to how the meta-schema is defined.

package com.networknt.schema;

import org.junit.jupiter.api.Test;

import com.networknt.schema.SpecVersion.VersionFlag;

public class Issue996Test {
    @Test
    void metaSchemaValidation() {
        String instanceData = "{\r\n"
                + "    \"lol\": 420\r\n"
                + "}";
        SchemaValidatorsConfig config = new SchemaValidatorsConfig();
        config.setPathType(PathType.JSON_POINTER);
        config.setFormatAssertionsEnabled(true);
        JsonSchema schema = JsonSchemaFactory.getInstance(VersionFlag.V202012).getSchema(SchemaLocation.of(SchemaId.V202012), config);
        System.out.println(schema.validate(instanceData, InputFormat.JSON, OutputFormat.HIERARCHICAL, executionContext -> {
            executionContext.getExecutionConfig().setAnnotationCollectionEnabled(true);
            executionContext.getExecutionConfig().setAnnotationCollectionFilter(keyword -> true);
        }));
    }
}

You can also load the schema with additional restrictions after validating against the meta-schema. This, for instance, doesn't allow unknown keywords or unknown formats. You need to call initializeValidators to check whether every $ref resolves.

package com.networknt.schema;

import org.junit.jupiter.api.Test;

import com.networknt.schema.SpecVersion.VersionFlag;

public class Issue996Test {
    @Test
    void test() {
        String instanceData = "{\r\n"
                + "    \"lol\": 420\r\n"
                + "}";
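        // Start from the standard 2020-12 meta-schema, but reject unknown keywords.
        JsonMetaSchema metaSchema = JsonMetaSchema.builder(JsonMetaSchema.getV202012())
                .unknownKeywordFactory(DisallowUnknownKeywordFactory.getInstance())
                .build();
        SchemaValidatorsConfig config = new SchemaValidatorsConfig();
        // Strict handling for the format keyword, so unknown formats are rejected.
        config.setStrict("format", true);
        JsonSchemaFactory factory = JsonSchemaFactory.getInstance(VersionFlag.V202012, builder -> builder.metaSchema(metaSchema));
        // Despite its name, instanceData holds the schema under test.
        JsonSchema schema = factory.getSchema(instanceData, config);
        // Build the validators eagerly so that unresolvable $ref values surface here.
        schema.initializeValidators();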
    }
}

jksevend-trimble commented 4 months ago

This is really helpful, @justin-tay. Thank you very much. I will try it.

jksevend-trimble commented 4 months ago

Hello again, Your second solution is really great. I can catch the exception and retrieve a message like

Keyword '$ids' is unknown and must be configured on the meta-schema or vocabulary.

Now I do not want to get picky here, but in case of multiple typos like "$ids" and "$schemas" I would only get the first key that does not conform to the meta-schema. Could there be a way to "collect" all errors?

justin-tay commented 4 months ago

That's not really possible with the current code. It's not really designed as a linter. You can send a PR if you think you can add this without affecting the validation performance.
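Outside the library, a small pre-pass over the parsed schema could collect every unrecognized keyword before the schema is loaded. The following is only a sketch of that idea under stated assumptions: the schema is already parsed into nested java.util.Map objects, the class name and the KNOWN set are hypothetical, the set is an illustrative subset rather than the full 2020-12 keyword list, and only properties, $defs and items are descended into.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical pre-check for illustration, not part of json-schema-validator.
class UnknownKeywordLint {

    // Assumption: illustrative subset, NOT the full 2020-12 keyword list.
    static final Set<String> KNOWN = Set.of(
            "$schema", "$id", "$ref", "$defs", "type", "properties",
            "items", "required", "minProperties", "maxProperties");

    // Returns the location of every unknown keyword, sorted alphabetically.
    static List<String> findUnknownKeywords(Map<String, Object> schema) {
        Set<String> unknown = new TreeSet<>();
        collect(schema, "", unknown);
        return new ArrayList<>(unknown);
    }

    @SuppressWarnings("unchecked")
    private static void collect(Map<String, Object> schema, String path, Set<String> out) {
        for (Map.Entry<String, Object> entry : schema.entrySet()) {
            String key = entry.getKey();
            Object value = entry.getValue();
            String here = path + "/" + key;
            if (!KNOWN.contains(key)) {
                out.add(here); // record and keep going instead of failing fast
            } else if ((key.equals("properties") || key.equals("$defs")) && value instanceof Map) {
                // These keywords map names to subschemas.
                for (Map.Entry<String, Object> child : ((Map<String, Object>) value).entrySet()) {
                    if (child.getValue() instanceof Map) {
                        collect((Map<String, Object>) child.getValue(),
                                here + "/" + child.getKey(), out);
                    }
                }
            } else if (key.equals("items") && value instanceof Map) {
                collect((Map<String, Object>) value, here, out);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> schema = Map.of(
                "$ids", "https://example.com/schema",
                "$schemas", "https://json-schema.org/draft/2020-12/schema",
                "type", "object");
        System.out.println(findUnknownKeywords(schema));
    }
}
```

Because this runs before the schema is handed to the factory, it does not touch the validation path, so it would not affect validation performance.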

There are many reasons why a schema may be potentially invalid. According to the spec, unknown keywords are completely acceptable; they are just treated as annotation keywords, so you can see that what is valid depends on interpretation. You might also want, for instance, to restrict schemas to particular draft versions.

You would likely need to design a custom meta-schema implementation with custom validators if you want to do this. You would need custom validators because JSON Schema doesn't actually have built-in keywords for comparing values within the data: for instance, checking that minProperties is always less than maxProperties can't be done with any standard keyword. Also, validating whether a $ref is resolvable is non-trivial without doing the load.
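As a rough illustration of the kind of cross-keyword check meant here, the sketch below compares minProperties and maxProperties in plain Java. It is not a json-schema-validator validator: the class name is hypothetical, the schema is assumed to be parsed into a java.util.Map already, only the top level is inspected, and equal bounds are treated as acceptable.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical check for illustration, not part of json-schema-validator.
class PropertyBoundsCheck {

    // Returns a message when minProperties exceeds maxProperties, otherwise empty.
    static Optional<String> check(Map<String, Object> schema) {
        Object min = schema.get("minProperties");
        Object max = schema.get("maxProperties");
        // Only compare when both keywords are present and numeric.
        if (min instanceof Number && max instanceof Number) {
            long lower = ((Number) min).longValue();
            long upper = ((Number) max).longValue();
            if (lower > upper) {
                return Optional.of("minProperties (" + lower
                        + ") exceeds maxProperties (" + upper + ")");
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Object> schema = Map.of("minProperties", 5, "maxProperties", 2);
        System.out.println(check(schema).orElse("bounds are consistent"));
    }
}
```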

jksevend-trimble commented 4 months ago

Okay, I think this is not really a big issue for us now. It is okay as it is. I just wanted to actively ask.

Thank you for all your help!