Open nathan-stender opened 2 months ago
Hey there, I'm happy to have a look at this at some point, but is there a reason you're benchmarking against such an old version? Lots has changed since 4.18, so it'd be good if you shared numbers which were on 4.23.
Sorry, I didn't mention that I tested on every version between 4.18 and 4.23 to see if any had better performance. None of the versions past 4.18 improve the performance noticeably.
On 4.23, the results are actually a bit worse:
For the single test: 55s
For the 26 tests: 6m33s
We have also experienced a similar performance issue in one of our tools after switching from RefResolver to this library. This is the relevant pull request in our library: https://github.com/PolusAI/workflow-inference-compiler/pull/287
Hello!
I have a library that formats scientific data into a JSON schema called the Allotrope Standard Model (ASM). The validation schemas are fairly large and complicated compared to other schemas I've seen in discussion boards, and are very modular, meaning there are a lot of references. In `allotropy` we store the ASM schemas directly, and remove all remote references, replacing them with local references under `$defs`.

We are finding that validating against the schemas using `jsonschema` version 4.18.0 takes ~20x longer than 4.17.0.

As a concrete example:
Validating this data: https://raw.githubusercontent.com/Benchling-Open-Source/allotropy/refs/heads/main/tests/parsers/moldev_softmax_pro/testdata/MD_SMP_luminescence_endpoint_example08.json
Against this schema: https://github.com/Benchling-Open-Source/allotropy/blob/main/src/allotropy/allotrope/schemas/adm/plate-reader/REC/2024/06/plate-reader.schema.json
takes ~3.5s on 4.17.0 and ~55s on 4.18.0.

This translates to a runtime for all 26 tests in `tests/parsers/moldev_softmax_pro` going from ~30s in 4.17.0 to ~6m in 4.18.0.