pwall567 / json-kotlin-schema

Kotlin implementation of JSON Schema (Draft 07)
MIT License

Performance advice #4

Closed mjm918 closed 2 years ago

mjm918 commented 3 years ago

Hi, thanks for the great work!

I'm trying to validate a schema, and it takes 120ms. I'm running the script on the latest M1 Mac. I'm not sure whether that is good or not, because if I try to validate, for example, 100,000 schemas, it takes a very long time. Is there any way to shorten the time?

{
  "$schema": "http://json-schema.org/draft/2019-09/schema",
  "$id": "esbridge-log-store",
  "title": "esbridge-log",
  "description": "Keep log of console dump",
  "type": "object",
  "required": ["store","schemas"],
  "properties": {
    "store": {
      "type": "string"
    },
    "schemas": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["name","fields"],
        "properties": {
          "name": {
            "type": "string",
            "description": "name of schema"
          },
          "fields": {
            "type": "array",
            "items": {
              "type": "object",
              "required": ["name","data_type"],
              "properties": {
                "name": {
                  "type": "string"
                },
                "data_type": {
                  "oneOf": [
                    {
                      "$ref": "#/definitions/data-type-string"
                    },
                    {
                      "$ref": "#/definitions/data-type-number"
                    },
                    {
                      "$ref": "#/definitions/data-type-array"
                    },
                    {
                      "$ref": "#/definitions/data-type-object"
                    },
                    {
                      "$ref": "#/definitions/data-type-boolean"
                    }
                  ]
                }
              }
            }
          },
          "indexes": {
            "type": "array",
            "items": {
              "type": "string"
            }
          }
        }
      }
    }
  },
  "definitions": {
    "data-type-string": {
      "type": "string"
    },
    "data-type-array": {
      "type": "array"
    },
    "data-type-object": {
      "type": "object"
    },
    "data-type-number": {
      "type": "number"
    },
    "data-type-boolean": {
      "type": "boolean"
    }
  }
}

I tried removing $schema and parsing, but it still takes 100ms.

Please advise. Thank you.

pwall567 commented 3 years ago

Hi - thanks for taking an interest in the project, and for taking the time to contact me.

The first thing that stands out about your schema is the use of oneOf. This means exactly what it says - it tests all the possibilities and confirms that one and only one of them is true. If the options are mutually exclusive, as they are in this case, then anyOf accepts and rejects exactly the same values, and it is much quicker because it stops as soon as it finds a true case. And if you order your cases so that the most common case is first, followed by the second most common and so on, the number of tests actually performed will be greatly reduced.
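As a rough sketch of that change, the data_type property from the schema above might be rewritten as follows. The ordering is only an assumption for illustration (strings first, then numbers, and so on); the right order depends on which types actually occur most often in the data being validated.

```json
"data_type": {
  "anyOf": [
    { "$ref": "#/definitions/data-type-string" },
    { "$ref": "#/definitions/data-type-number" },
    { "$ref": "#/definitions/data-type-boolean" },
    { "$ref": "#/definitions/data-type-array" },
    { "$ref": "#/definitions/data-type-object" }
  ]
}
```

With this ordering, a string value matches the first branch and the remaining four are never tested, whereas oneOf has to test all five to be sure that only one matches.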

Then, looking further into the tests in the oneOf, you are testing only the type of the property and that can be achieved much more easily if you replace the entire oneOf by:

    "type": [ "string", "array", "object", "number", "boolean" ]

And since that is the entire range of possible types apart from null (integer is covered by number), what you are in fact checking for is that the property is present and is not null. If you are prepared to accept a null property, then the required entry alone will confirm that the property is present (with any value at all).
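If a null value is acceptable, a minimal sketch of the items schema for fields could drop the data_type constraint entirely and rely on required alone. This is only an illustration of the point above, not a recommendation for every case:

```json
"items": {
  "type": "object",
  "required": ["name", "data_type"],
  "properties": {
    "name": {
      "type": "string"
    }
  }
}
```

Here data_type must still be present because it is listed in required, but its value is not tested at all, and the definitions block at the end of the schema is no longer referenced.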

I hope this helps - I like to think that my library will be useful, and I would be very concerned if performance issues were limiting its usefulness.

pwall567 commented 2 years ago

I am going to close this issue on the assumption that the above advice has solved the problem, or has helped, at least.

Feel free to open another issue if you're still having problems.