python-openapi / openapi-spec-validator

OpenAPI Spec Validator is a CLI, pre-commit hook and python package that validates OpenAPI Specs against the OpenAPI 2.0 (aka Swagger), OpenAPI 3.0 and OpenAPI 3.1 specifications.
Apache License 2.0

Slow validation of schemas with lots of $refs with version 0.6.0 #260

Closed: miikka closed this issue 11 months ago

miikka commented 12 months ago

Hi! We're using openapi-spec-validator to validate OpenAPI 3 spec files in YAML with thousands of references to a small number of other files. There seems to have been a big performance regression between versions 0.5.7 and 0.6.0 for this use case.

To demonstrate, I've uploaded a large spec consisting of two files, schema.yaml and path.yaml, as a gist here. schema.yaml contains 3000 path entries, each referring to path.yaml, like this:

openapi: "3.0.0"
info:
  version: 1.0.0
  title: Example
  description: Trying to demonstrate a performance regression
paths:
  /path0:
    $ref: "path.yaml"

  /path1:
    $ref: "path.yaml"

  # ...

  /path2999:
    $ref: "path.yaml"
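The gist itself isn't reproduced here. Assuming path.yaml sits next to schema.yaml, a schema.yaml of the same shape can be regenerated with a short script. The `make_schema` function and its `count` and `ref` parameters are hypothetical, not part of the original gist:

```python
def make_schema(count: int = 3000, ref: str = "path.yaml") -> str:
    """Build the text of a spec with `count` path items, each a $ref to
    the same external file (hypothetical helper mirroring the example)."""
    lines = [
        'openapi: "3.0.0"',
        "info:",
        "  version: 1.0.0",
        "  title: Example",
        "  description: Trying to demonstrate a performance regression",
        "paths:",
    ]
    for i in range(count):
        # Every path item points at the same external document.
        lines.append(f"  /path{i}:")
        lines.append(f'    $ref: "{ref}"')
    return "\n".join(lines) + "\n"
```

Writing it out is then a one-liner such as `pathlib.Path("schema.yaml").write_text(make_schema())`.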

We're using openapi-spec-validator as a library, but the CLI tool shows the problem as well. Here's a quick benchmark with hyperfine:

pipx install --suffix '==0.5.7' 'openapi-spec-validator==0.5.7'
pipx install --suffix '==0.6.0' 'openapi-spec-validator==0.6.0'
hyperfine 'openapi-spec-validator==0.5.7 schema.yaml' 'openapi-spec-validator==0.6.0 schema.yaml'
Benchmark 1: openapi-spec-validator==0.5.7 schema.yaml
  Time (mean ± σ):     693.2 ms ±  20.8 ms    [User: 560.7 ms, System: 73.7 ms]
  Range (min … max):   663.3 ms … 718.0 ms    10 runs

Benchmark 2: openapi-spec-validator==0.6.0 schema.yaml
  Time (mean ± σ):     25.114 s ±  6.647 s    [User: 13.150 s, System: 2.636 s]
  Range (min … max):   14.676 s … 30.386 s    10 runs

Summary
  'openapi-spec-validator==0.5.7 schema.yaml' ran
   36.23 ± 9.65 times faster than 'openapi-spec-validator==0.6.0 schema.yaml'
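The summary figure follows from the two means: 25.114 s / 693.2 ms ≈ 36.23. The ± 9.65 is consistent with adding the two relative standard deviations in quadrature, which appears to be how hyperfine computes it. The sketch below reproduces both numbers from the reported means and standard deviations (the `relative_speed` name is ours, not hyperfine's):

```python
import math


def relative_speed(fast_mean, fast_sd, slow_mean, slow_sd):
    """Ratio of the slow mean to the fast mean, with its uncertainty
    propagated by combining relative standard deviations in quadrature."""
    ratio = slow_mean / fast_mean
    ratio_sd = ratio * math.sqrt(
        (fast_sd / fast_mean) ** 2 + (slow_sd / slow_mean) ** 2
    )
    return ratio, ratio_sd


# Means and standard deviations from the hyperfine output above, in seconds.
ratio, sd = relative_speed(0.6932, 0.0208, 25.114, 6.647)
print(f"{ratio:.2f} ± {sd:.2f}")  # 36.23 ± 9.65
```

The large uncertainty comes almost entirely from the 0.6.0 runs, whose standard deviation is about 26% of their mean.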

So 0.5.7 took roughly 700 ms to validate the schema and 0.6.0 took 25 s. Our real-world schema is bigger and more complex and openapi-spec-validator 0.6.0 is too slow to use in practice, but 0.5.7 handled it just fine.

Speculation on the cause: I haven't dug into it in detail, so this might be off! But my guess is this: one of the dependencies, jsonschema-spec, was updated, and it looks to me like it keeps re-reading and re-parsing the same file, path.yaml, every time it encounters a reference, even though the result could be cached. The older version of jsonschema-spec used a different resolver implementation; maybe it had caching?
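To illustrate the caching idea: if every `$ref` to path.yaml triggers a fresh read and parse, the work grows with the number of references (3000 here). Caching by URI brings it down to one read per distinct file. The following is a minimal stdlib sketch of that idea, not jsonschema-spec's code; `load_document` is a hypothetical stand-in for the real read-and-parse step:

```python
import functools

# Counts how often the (hypothetical) expensive loader actually runs.
calls = {"load": 0}


def load_document(uri: str) -> dict:
    """Stand-in for reading and parsing a referenced file such as path.yaml."""
    calls["load"] += 1
    return {"uri": uri, "get": {"responses": {"200": {"description": "OK"}}}}


@functools.lru_cache(maxsize=None)
def load_document_cached(uri: str) -> dict:
    """Same loader, memoized per URI. Note: every caller gets the same
    dict object back, so the cached documents must not be mutated."""
    return load_document(uri)


def resolve_all(refs, loader):
    # Resolve every reference with the given loader.
    return [loader(ref) for ref in refs]


refs = ["path.yaml"] * 3000
resolve_all(refs, load_document)
print(calls["load"])  # 3000: one read/parse per reference
calls["load"] = 0
resolve_all(refs, load_document_cached)
print(calls["load"])  # 1: later references hit the cache
```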


Python 3.11.4, macOS 13.5

p1c2u commented 11 months ago

Hi @miikka

Thanks for the report and the working example. I profiled it and can see where the issue is. Indeed, it's caused by jsonschema-spec not storing the resolved registry. The fix should be simple.
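"Not storing the resolved registry" matters when the registry is immutable. Assume a design like the `referencing` library's, where a lookup returns the resolved contents together with a new registry that now also holds the retrieved file. Under that design, a caller that throws the new registry away has to retrieve the same file again on the next lookup. The stdlib-only sketch below contrasts the two behaviours. `Registry`, `lookup`, `retrieve` and the two `resolve_*` functions are hypothetical names, not the API of jsonschema-spec or `referencing`:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Counts calls to the (hypothetical) expensive retrieval step.
retrievals = {"count": 0}


def retrieve(uri: str) -> dict:
    """Stand-in for reading and parsing the file behind `uri`."""
    retrievals["count"] += 1
    return {"uri": uri}


@dataclass(frozen=True)
class Registry:
    """URI -> document mapping that is never modified in place."""
    resources: Dict[str, dict] = field(default_factory=dict)

    def lookup(self, uri: str) -> Tuple[dict, "Registry"]:
        if uri in self.resources:
            return self.resources[uri], self
        doc = retrieve(uri)
        # Return a new registry that includes the retrieved document.
        return doc, Registry({**self.resources, uri: doc})


def resolve_discarding(refs: List[str]) -> Registry:
    registry = Registry()
    for ref in refs:
        _doc, _updated = registry.lookup(ref)  # updated registry thrown away
    return registry


def resolve_storing(refs: List[str]) -> Registry:
    registry = Registry()
    for ref in refs:
        _doc, registry = registry.lookup(ref)  # keep the updated registry
    return registry


refs = ["path.yaml"] * 3000
resolve_discarding(refs)
print(retrievals["count"])  # 3000
retrievals["count"] = 0
resolve_storing(refs)
print(retrievals["count"])  # 1
```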

p1c2u commented 11 months ago

The issue was fixed with https://github.com/p1c2u/jsonschema-spec/releases/tag/0.2.4 hence closing.