voxpupuli / json-schema

Ruby JSON Schema Validator
MIT License
1.52k stars 242 forks source link

Made sure clear_cache also clears the cache of parsed uris #361

Closed iainbeeston closed 7 years ago

iainbeeston commented 7 years ago

There are two main caches in json-schema. One is for schemas, one is for parsed uris. Previously, setting :clear_cache as a validation option would clear the schema cache but not the uri cache. This has been reported to cause a memory leak (see issue #329) as over time the uri cache gets bigger and bigger and is never cleared.

This change makes sure that whenever the schema cache is cleared, the uri cache is also cleared.

Incidentally this change is not thread safe, and multithreaded applications that clear the cache may see issues. But until we can refactor to either disable caching or get rid of class variables, I don't believe there's much we can (realistically) do about that. This is still an improvement on what's gone before.

RST-J commented 7 years ago

My brain sometimes tickles me with the idea of a major restructuring of json-schema. Basically to extract functionality according to the separation of concerns principle. The validator base class does really, really much and I still have not the feeling to be at home there (which I consider a bad sign). I have no clear idea whether and how to do that. I also haven't spent too much thoughts about it as this only makes sense, if we a) agree this is a good idea and b) then agree on how this then considered good idea should be carried out.

iainbeeston commented 7 years ago

Yes I agree. I can see a way forward though by extracting out functionality one object at a time. There is clearly an object for URI parsing, and one for loading data from a file or uri. We should not be using class variables, but instead keep a set of objects internal to each validator object, and allow the user to reuse validator objects if they want to cache previously used data.

Another option that I have considered is extracting out some of the functionality into separate gems. For example, a gem could be made that just evaluates json references. Json parsing is already handled by the json gem. The json schema spec was split into three parts for draft 4 and these could possibly be represented by 3 gems? (Though I'm not sure if that would work?)

Regarding this PR, I think we should possibly have a single object that handles URI parsing, and (for now) keep an instance of that object in a class variable in Validator. That object could keep the URI caches internal to itself. This would be similar to the Reader objects we already have. Although I would argue that what is already in this pull request is good enough for now and we can improve the object design in a new pull request.

iainbeeston commented 7 years ago

Oh, and another thing - I've been trying hard (with this and other pull requests) to improve the reliability and coverage of the test suite, so that if we do choose to refactor we can be more confident that nothing will break. Although we have a lot of tests some areas do not even have a single test that covers them! In other cases we have tests that only pass if they are run in the correct order!

RST-J commented 7 years ago

Fine. I agree that we should take small steps - one aspect at a time. Actually I think this is the only realistic way to go. And I'm also in favour of improving the test suite itself and extending coverage along the way - wrt testing I'm rather the 'more is better' kind of guy, even more as this is a library.

What three parts are you referring to? I currently do not really have the spec (details) present in my mind - I'm probably kind of asked to much of by the 2nd paragraph in your first comment. ;) In general I'd say we shouldn't make things too complex. Managing two dependencies to achieve one thing - validating json-schema is what users want to do - could introduce additional complexity. But if they make sense on their own, it could also be a good decision, although my gut feeling right now suggests otherwise.

And finally, yes, lets merge this. Its a small step.

iainbeeston commented 7 years ago

Ah the three json schema drafts are the core, validation and hyper schema drafts. I suspect it was split up in draft4 because it was becoming too long...