Closed by casperisfine 3 weeks ago
This seems a bit unfortunate performance-wise.
I wonder why EscapedString inherits from String; if it didn't, it would just work without special care, no?
Or the escaping could be done eagerly, directly in jsonify.
I think the performance cost is very minor: just an extra pointer comparison on the class, which is tagged as unlikely, so it is probably predicted away.
I didn't see much change on the benchmarks.
In lib/json/pure/generator.rb, generate_json gains quite a bit of extra code and checks, which seems likely to hurt dump perf quite a bit.
My assumption is that the pure generator is only really used with Truffle, or in contexts where perf is disregarded. Have you measured the perf degradation on Truffle? Is it really substantial? I'd expect Truffle to be able to optimize most of that out.
I'll check it out later.
Before this PR (6d3b3ac):
$ ruby benchmark/standalone.rb dump pure
JSON::Pure::Generator
truffleruby 24.1.1, like ruby 3.2.4, Oracle GraalVM JVM [x86_64-linux]
Warming up --------------------------------------
JSON.dump(obj) 350.000 i/100ms
JSON.dump(obj) 745.000 i/100ms
JSON.dump(obj) 829.000 i/100ms
JSON.dump(obj) 799.000 i/100ms
JSON.dump(obj) 836.000 i/100ms
Calculating -------------------------------------
JSON.dump(obj) 8.152k (± 9.8%) i/s (122.66 μs/i) - 40.128k in 5.008035s
JSON.dump(obj) 8.166k (± 9.3%) i/s (122.46 μs/i) - 40.964k in 5.084684s
JSON.dump(obj) 8.180k (± 6.6%) i/s (122.26 μs/i) - 40.964k in 5.038216s
JSON.dump(obj) 8.101k (± 9.2%) i/s (123.43 μs/i) - 40.128k in 5.020143s
JSON.dump(obj) 8.156k (± 7.0%) i/s (122.60 μs/i) - 40.964k in 5.056036s
$ ruby benchmark/standalone.rb dump pure
JSON::Pure::Generator
truffleruby 24.1.1, like ruby 3.2.4, Oracle GraalVM JVM [x86_64-linux]
Warming up --------------------------------------
JSON.dump(obj) 336.000 i/100ms
JSON.dump(obj) 793.000 i/100ms
JSON.dump(obj) 827.000 i/100ms
JSON.dump(obj) 828.000 i/100ms
JSON.dump(obj) 788.000 i/100ms
Calculating -------------------------------------
JSON.dump(obj) 8.074k (± 8.6%) i/s (123.85 μs/i) - 40.188k in 5.037323s
JSON.dump(obj) 7.993k (± 9.8%) i/s (125.11 μs/i) - 39.400k in 5.004141s
JSON.dump(obj) 8.032k (± 9.8%) i/s (124.51 μs/i) - 40.188k in 5.079140s
JSON.dump(obj) 8.013k (± 8.8%) i/s (124.79 μs/i) - 40.188k in 5.073014s
JSON.dump(obj) 8.098k (± 9.2%) i/s (123.49 μs/i) - 40.188k in 5.038510s
After this PR (96397cf0903d4321263243feb8fb182e4698b6d9):
$ ruby benchmark/standalone.rb dump pure
JSON::Pure::Generator
truffleruby 24.1.1, like ruby 3.2.4, Oracle GraalVM JVM [x86_64-linux]
Warming up --------------------------------------
JSON.dump(obj) 363.000 i/100ms
JSON.dump(obj) 557.000 i/100ms
JSON.dump(obj) 647.000 i/100ms
JSON.dump(obj) 605.000 i/100ms
JSON.dump(obj) 580.000 i/100ms
Calculating -------------------------------------
JSON.dump(obj) 6.405k (± 7.1%) i/s (156.14 μs/i) - 31.900k in 5.016689s
JSON.dump(obj) 6.387k (± 9.6%) i/s (156.57 μs/i) - 31.900k in 5.077313s
JSON.dump(obj) 6.437k (± 5.6%) i/s (155.34 μs/i) - 32.480k in 5.067145s
JSON.dump(obj) 6.394k (± 8.5%) i/s (156.40 μs/i) - 31.900k in 5.045133s
JSON.dump(obj) 6.402k (± 7.7%) i/s (156.20 μs/i) - 31.900k in 5.022605s
$ ruby benchmark/standalone.rb dump pure
JSON::Pure::Generator
truffleruby 24.1.1, like ruby 3.2.4, Oracle GraalVM JVM [x86_64-linux]
Warming up --------------------------------------
JSON.dump(obj) 311.000 i/100ms
JSON.dump(obj) 557.000 i/100ms
JSON.dump(obj) 617.000 i/100ms
JSON.dump(obj) 579.000 i/100ms
JSON.dump(obj) 589.000 i/100ms
Calculating -------------------------------------
JSON.dump(obj) 6.444k (± 7.8%) i/s (155.18 μs/i) - 32.395k in 5.069218s
JSON.dump(obj) 6.436k (± 7.9%) i/s (155.37 μs/i) - 32.395k in 5.073131s
JSON.dump(obj) 6.521k (± 6.3%) i/s (153.36 μs/i) - 32.984k in 5.083403s
JSON.dump(obj) 6.534k (± 5.7%) i/s (153.04 μs/i) - 32.984k in 5.071127s
JSON.dump(obj) 6.483k (±10.8%) i/s (154.24 μs/i) - 31.806k in 5.023239s
I tried undoing the changes incrementally, and the main slowdown seems to come from using:
klass = obj.class
if klass == Hash
elsif klass == Array
elsif klass == String
...
(I also tried reversing that == check, but it didn't seem to change perf much; the reasoning being that klass->klass is polymorphic, since it's different singleton classes)
vs
case obj
when Hash
when Array
when String
...
I wonder if maybe it's simply obj.class which is expensive, because it's called on objects of various classes, and the method lookup needs to get the obj->klass and caches on that, so that will be a polymorphic inline cache.
OTOH Module#===, as in when Hash, doesn't have that polymorphism, because it's called on a constant module (there are still some branches to handle primitives like int/boolean/double, but for all non-primitives it's just a field read).
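To make the comparison above concrete, here is a minimal sketch of the two dispatch styles as standalone methods, with a crude timing loop. The method names (kind_by_class, kind_by_case, measure) and the MyString subclass are hypothetical and are not taken from lib/json/pure/generator.rb; the timings it prints are only indicative and say nothing about the real generator.

```ruby
# Hypothetical helpers for illustration only; not the generator's code.

# Style 1: read the class once, then compare it with ==.
# This matches exact classes only, so subclasses fall through.
def kind_by_class(obj)
  klass = obj.class
  if klass == Hash
    :hash
  elsif klass == Array
    :array
  elsif klass == String
    :string
  else
    :other
  end
end

# Style 2: case/when, which calls Module#=== on each constant.
# This also matches instances of subclasses.
def kind_by_case(obj)
  case obj
  when Hash then :hash
  when Array then :array
  when String then :string
  else :other
  end
end

# Hypothetical String subclass, to show where the two styles differ.
class MyString < String; end

# Crude wall-clock timing; a real comparison needs proper warmup.
def measure(label, objects, iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { objects.each { |o| yield o } }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  puts format("%-12s %.3fs", label, elapsed)
  elapsed
end

# Receivers of several classes, so obj.class sees a mix of types.
objects = [{}, [], "str", 1, nil]
measure("class ==", objects, 100_000) { |o| kind_by_class(o) }
measure("case/when", objects, 100_000) { |o| kind_by_case(o) }
```

Note that the two styles are not interchangeable: the exact-class comparison does not match a String subclass such as MyString, while when String does, so any swap between them has to keep the subclass handling intact.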
And how fast is the C implementation on Truffle?
Because on my machine:
$ ruby --yjit -Ilib:ext benchmark/standalone.rb dump
JSON::Ext::Generator
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
JSON.dump(obj) 3.295k i/100ms
Calculating -------------------------------------
JSON.dump(obj) 33.306k (± 0.6%) i/s (30.02 μs/i) - 168.045k in 5.045675s
So maybe Truffle should just revert back to use the C extension?
Unfortunately the C extension is still much slower on TruffleRuby and needs some investigation and profiling (using https://github.com/ruby/json/compare/master...eregon:json:truffleruby-use-generator-cext):
$ ruby --experimental-options --cexts-panama -Ilib:ext benchmark/standalone.rb dump ext
JSON::Ext::Generator
truffleruby 24.2.0-dev-07b978e4, like ruby 3.2.4, Oracle GraalVM JVM [x86_64-linux]
Warming up --------------------------------------
JSON.dump(obj) 46.000 i/100ms
JSON.dump(obj) 55.000 i/100ms
JSON.dump(obj) 56.000 i/100ms
JSON.dump(obj) 56.000 i/100ms
JSON.dump(obj) 55.000 i/100ms
Calculating -------------------------------------
JSON.dump(obj) 562.849 (± 1.1%) i/s (1.78 ms/i) - 2.860k in 5.081942s
JSON.dump(obj) 562.589 (± 0.5%) i/s (1.78 ms/i) - 2.860k in 5.083810s
JSON.dump(obj) 562.587 (± 0.7%) i/s (1.78 ms/i) - 2.860k in 5.083864s
JSON.dump(obj) 563.702 (± 0.5%) i/s (1.77 ms/i) - 2.860k in 5.073772s
JSON.dump(obj) 561.796 (± 1.4%) i/s (1.78 ms/i) - 2.860k in 5.091997s
Compared to after https://github.com/ruby/json/pull/674:
$ ruby -Ilib:ext benchmark/standalone.rb dump pure
JSON::Pure::Generator
truffleruby 24.2.0-dev-07b978e4, like ruby 3.2.4, Oracle GraalVM JVM [x86_64-linux]
Calculating -------------------------------------
JSON.dump(obj) 8.237k (± 5.0%) i/s (121.41 μs/i) - 41.600k in 5.069507s
JSON.dump(obj) 8.179k (± 5.1%) i/s (122.26 μs/i) - 40.768k in 5.002035s
JSON.dump(obj) 8.147k (± 7.9%) i/s (122.74 μs/i) - 40.768k in 5.044840s
JSON.dump(obj) 8.137k (± 6.9%) i/s (122.90 μs/i) - 40.768k in 5.048690s
JSON.dump(obj) 8.112k (±10.2%) i/s (123.27 μs/i) - 39.936k in 5.023502s
To have a baseline:
$ ruby --yjit -Ilib:ext benchmark/standalone.rb dump
JSON::Ext::Generator
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) +YJIT [x86_64-linux]
Warming up --------------------------------------
JSON.dump(obj) 1.918k i/100ms
Calculating -------------------------------------
JSON.dump(obj) 19.110k (± 1.1%) i/s (52.33 μs/i) - 95.900k in 5.018895s
Fix: https://github.com/ruby/json/issues/667
This is yet another behavior on which the various implementations differed, but the C implementation used to call to_json on String subclasses used as keys. This was optimized out in e125072130229e54a651f7b11d7d5a782ae7fb65, but there is an Active Support test case for it, so it's best to make all 3 implementations respect this behavior.
FYI: @mtasaka