python-jsonschema / check-jsonschema

A CLI and set of pre-commit hooks for jsonschema validation with built-in support for GitHub Workflows, Renovate, Azure Pipelines, and more!
https://check-jsonschema.readthedocs.io/en/stable

Design a solution for caching downloads of `$refs` in order to improve performance in cases with many remote refs #452

Open sirosen opened 3 days ago

sirosen commented 3 days ago

Original use-case sourced from this PR: #451

The current caching capability significantly improves runtimes for remote schemas when there is a single remote file to download, but does nothing for schemas that contain remote refs to resolve. Refs are cached in memory by the `referencing` library, but that cache is discarded between runs.

For faster runs, check-jsonschema should cache resolved refs on disk as well.
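As a rough illustration of what an on-disk ref cache could look like, here is a minimal sketch. It is not check-jsonschema's implementation: `ref_cache_path`, `load_ref`, the `fetch` callable, and the hashed-filename layout are all hypothetical names and choices made for the example, and it deliberately ignores expiry and invalidation.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def ref_cache_path(cache_dir: Path, ref_url: str) -> Path:
    """Map a ref URL to a file inside cache_dir (hypothetical layout)."""
    # Hash the full URL so any ref maps to a safe, fixed-length filename.
    digest = hashlib.sha256(ref_url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.json"


def load_ref(ref_url: str, cache_dir: Path, fetch: Callable[[str], Any]) -> Any:
    """Return the document for ref_url, downloading only on a cache miss.

    `fetch` is an assumed callable that takes a URL and returns the
    parsed JSON document; it stands in for the real download step.
    """
    path = ref_cache_path(cache_dir, ref_url)
    if path.exists():
        # Cache hit: reuse the copy written by an earlier run.
        return json.loads(path.read_text(encoding="utf-8"))

    # Cache miss: download, then persist for later runs.
    document = fetch(ref_url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(document), encoding="utf-8")
    return document
```

With something like this, a second run that resolves the same ref would read the file from disk instead of hitting the network. How long an entry stays valid is left open here and would need to be part of the real design.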

Some basic requirements:

> [!NOTE]
> A friend of mine suggested putting cache data into a DB (e.g. sqlite) when we talked about this, so that it could be annotated with richer metadata and structure. Although that might be a good idea longer term, I don't want to reach for that quite yet -- I think this can be solved with a good dir structure for now.

Here's one initial idea, for evaluation: