voxpupuli / json-schema

Ruby JSON Schema Validator
MIT License
1.54k stars 243 forks source link

Speed up JSON::Validator.validate #255

Closed mjc closed 8 years ago

mjc commented 9 years ago

Addressible's URI parsing shows up at the top of the profiler for our use case on JRuby, as well as YAML dumping.

This patch

  1. adds a cache to JSON::Util::URI.normalized_uri
  2. adds a cache for Addressable::URI.parse in JSON::Util::URI.parse
  3. prevents clearing an already cleared URI fragment in JSON::Schema#initialize
  4. uses Hash#hash for calculating uuids instead of serializing the hash.

There are many further gains to be had still, especially if we can find a way to remove the mutex around initializing a schema, as that is proving to be terrible for performance on JRuby in a multi-threaded environment.

I'm pretty much finished with this pass - almost 8x is a pretty nice gain.

benchmark: https://github.com/mjc/json-schema-perf

MRI 2.2.2 results: 2.89x faster from file, 5.55x faster from hash.

JRuby 1.7.20 results: 2.38x faster from file, 7.86x faster from hash.

MRI 2.2.2 before

Calculating -------------------------------------
json-schema from file
                       181.000  i/100ms
json-schema from hash
                        87.000  i/100ms
-------------------------------------------------
json-schema from file
                          1.962k (± 6.3%) i/s -     58.644k
json-schema from hash
                        938.211  (± 6.2%) i/s -     28.101k

Comparison:
json-schema from file:     1961.8 i/s
json-schema from hash:      938.2 i/s - 2.09x slower

MRI 2.2.2 after

json-schema from file
                          5.668k (± 7.4%) i/s -    169.162k
json-schema from hash
                          5.210k (± 7.1%) i/s -    155.696k

Comparison:
json-schema from file:     5668.0 i/s
json-schema from hash:     5210.2 i/s - 1.09x slower

JRuby 1.7.20 before

Calculating -------------------------------------
json-schema from file
                       200.000  i/100ms
json-schema from hash
                       138.000  i/100ms
-------------------------------------------------
json-schema from file
                          2.075k (± 6.1%) i/s -     62.000k
json-schema from hash
                          1.401k (± 5.4%) i/s -     41.952k

Comparison:
json-schema from file:     2074.9 i/s
json-schema from hash:     1401.0 i/s - 1.48x slower

JRuby 1.7.20 after

Calculating -------------------------------------
json-schema from file
                       474.000  i/100ms
json-schema from hash
                         1.002k i/100ms
-------------------------------------------------
json-schema from file
                          4.942k (± 5.7%) i/s -    147.888k
json-schema from hash
                         10.773k (± 6.5%) i/s -    322.644k

Comparison:
json-schema from hash:    10773.1 i/s
json-schema from file:     4942.2 i/s - 2.18x slower
mjc commented 9 years ago

I also need to research finding ways to test concurrency and remove the mutex around schema initialization safely.

iainbeeston commented 9 years ago

With regards to the mutex, I'm not even sure what it's purpose is. So far as I can tell it ensures that only one validator can be initialised at once. But it's not protecting access to any class variables, and it's not protecting a validator from being used by several threads concurrently and getting into an invalid state.

It's possible the answer is in the git history, though I'm not in front of a computer right now.

@hoxworth Do you remember why the mutex was introduced?

mjc commented 9 years ago

I'm pretty much done with this pass if we can't remove the mutex.

RST-J commented 9 years ago

To me that looks OK and I guess using some more memory is a reasonable trade for increased performance.

iainbeeston commented 9 years ago

I've added a couple more questions, but overall this looks great and I'll be happy to merge once I hear back from @mjc. I especially appreciate the benchmark times that accompany the changes (thanks @mjc!)

mjc commented 9 years ago

@iainbeeston no problem! Happy to help. Trying to improve performance if you don't know where the bottlenecks are and whether you made a useful difference is silly.

mjc commented 9 years ago

Weird, I think maybe the .parse cache just has to go. :(

iainbeeston commented 9 years ago

So it works when the key is a uri object but not when it is a string?

mjc commented 9 years ago

It looks like the objects get modified in incompatible ways sometimes? The error on Travis is proving hard to track down

On Thu, Jun 4, 2015 at 3:19 PM Iain Beeston notifications@github.com wrote:

So it works when the key is a uri object but not when it is a string?

— Reply to this email directly or view it on GitHub https://github.com/ruby-json-schema/json-schema/pull/255#issuecomment-109055101 .

mjc commented 9 years ago

Do you think it'd be possible to use the stdlib's URI stuff? http://ruby-doc.org/stdlib-2.2.2/libdoc/uri/rdoc/URI.html On Thu, Jun 4, 2015 at 7:36 PM Michael Cohen michael.joseph.cohen@gmail.com wrote:

It looks like the objects get modified in incompatible ways sometimes? The error on Travis is proving hard to track down

On Thu, Jun 4, 2015 at 3:19 PM Iain Beeston notifications@github.com wrote:

So it works when the key is a uri object but not when it is a string?

— Reply to this email directly or view it on GitHub https://github.com/ruby-json-schema/json-schema/pull/255#issuecomment-109055101 .

iainbeeston commented 9 years ago

Is it possible to freeze the URI object? Would that help?

The gem used to do that but although it's faster it has trouble with unusual URIs (file URIs I think are an example of that, but generally addressable is more capable) On Fri, 5 Jun 2015 at 2:37 am, Michael J. Cohen notifications@github.com wrote:

Do you think it'd be possible to use the stdlib's URI stuff? http://ruby-doc.org/stdlib-2.2.2/libdoc/uri/rdoc/URI.html On Thu, Jun 4, 2015 at 7:36 PM Michael Cohen < michael.joseph.cohen@gmail.com> wrote:

It looks like the objects get modified in incompatible ways sometimes? The error on Travis is proving hard to track down

On Thu, Jun 4, 2015 at 3:19 PM Iain Beeston notifications@github.com wrote:

So it works when the key is a uri object but not when it is a string?

— Reply to this email directly or view it on GitHub < https://github.com/ruby-json-schema/json-schema/pull/255#issuecomment-109055101

.

— Reply to this email directly or view it on GitHub https://github.com/ruby-json-schema/json-schema/pull/255#issuecomment-109121084 .

iainbeeston commented 8 years ago

@RST-J and @mjc Could you please take a look at #285 , which is my attempt to salvage this and make it releasable?

iainbeeston commented 8 years ago

I'm closing this for now. I've merged most of the changes with the exception of using has to generate uuids (which gave a good speed improvement but feels risky to me)