python-jsonschema / check-jsonschema

A CLI and set of pre-commit hooks for jsonschema validation with built-in support for GitHub Workflows, Renovate, Azure Pipelines, and more!
https://check-jsonschema.readthedocs.io/en/stable
Other
191 stars 39 forks source link

Remote file caching improvement is destroyed by use of parse as a validation callback #453

Closed sirosen closed 3 days ago

sirosen commented 3 days ago

I realized this just now in the course of discussing #451. The validation callback parses response data, which results in a download.

The fix will require re-thinking how retries are managed within the CacheDownloader. It also reveals a gap between unit and acceptance testing, but I'll not worry about that at the moment.

sirosen commented 3 days ago

The right way to do this is to take the retries out of the streaming request helper and to put them into a broader context. Basically, the validation callback should only fire if we have a cache miss. And if the validation callback fails, we start doing retries in a tight loop with the validation callback in order to avoid poisoning the cache with unparseable data.