Closed bkuhlmann closed 10 months ago
To consider this a defect worth fixing (possibly at the cost of a backward-incompatible change to the API), there would need to be real-world data from YARD usage rather than theoretical concerns. As shown later in this post, I cannot reproduce such a slowdown. It's worth noting that the slow OpenStruct behavior is limited to the first write of an attribute on a newly allocated struct object. Below is a much more comprehensive benchmark of the behavior than the ones linked above, and it illustrates the real issue.
In practice, YARD doesn't really go through this "slow" code path often because most OpenStruct objects are long-lived and shared, so first-write of attributes is fairly rare (we use structs to store shared state while parsing entire source trees, creating maybe only a handful of structs). There are a few edge cases here: if you use a lot of macros / directives, for example, you could see an impact on performance.
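To make the first-write cost concrete, here is a minimal sketch. It assumes, based on how the `ostruct` library is generally understood to work, that the first write of an attribute defines accessor methods on that one object. The attribute name `title` and the run count are arbitrary choices for illustration, not values taken from YARD.

```ruby
require "benchmark"
require "ostruct"

FIRST_WRITE_RUNS = 5_000 # arbitrary count chosen for illustration

# A fresh OpenStruct has no per-object accessor methods yet.
fresh = OpenStruct.new
methods_before = fresh.singleton_methods

# The first write of an attribute defines accessors on this one object.
fresh.title = "first write"
methods_after = fresh.singleton_methods

puts "before: #{methods_before.inspect}"
puts "after:  #{methods_after.inspect}"

# Pattern A: allocate a new struct and perform a first write every time.
fresh_each_time = Benchmark.realtime do
  FIRST_WRITE_RUNS.times { OpenStruct.new.title = "x" }
end

# Pattern B: reuse one long-lived struct, so only the very first write
# has to set anything up.
shared = OpenStruct.new
shared_reused = Benchmark.realtime do
  FIRST_WRITE_RUNS.times { shared.title = "x" }
end

puts format("fresh struct per write: %.4fs", fresh_each_time)
puts format("shared struct reused:   %.4fs", shared_reused)
```

Pattern B is the closer match to the long-lived, shared usage described above; the absolute timings will vary by Ruby version and machine.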
That said, changing this would be a breaking change to YARD's API, since OpenStructs are part of the public extension API in order to allow developers to provide custom context while parsing. It is not possible to exchange OpenStruct with Struct/Data; they simply do not do the same thing. In order to provide similar (but not exactly equal) support to OpenStruct, we would need to implement a custom OpenStruct class that basically does the same thing as OpenStruct itself. I wrote up a quick example of this as `YARD::OpenStruct` and added it to YARD's codebase (with passing tests), and the change was not detectable in a full run of `yard` on YARD's own directory structure:
# CURRENT BEHAVIOR
$ time ruby bin/yard
Files: 186
Modules: 49 ( 6 undocumented)
Classes: 209 ( 41 undocumented)
Constants: 98 ( 29 undocumented)
Attributes: 179 ( 0 undocumented)
Methods: 997 ( 136 undocumented)
86.16% documented
real 0m15.722s
user 0m0.015s
sys 0m0.000s
# AFTER YARD::OpenStruct patch:
$ time ruby bin/yard
Files: 186
Modules: 49 ( 6 undocumented)
Classes: 209 ( 41 undocumented)
Constants: 98 ( 29 undocumented)
Attributes: 179 ( 0 undocumented)
Methods: 997 ( 136 undocumented)
86.16% documented
real 0m15.653s
user 0m0.000s
sys 0m0.000s
Note that the above is just one sample. The averages range across the board, and the tl;dr is that this change would not improve performance in any noticeable way. For reference, the YARD::OpenStruct implementation is provided below with benchmarks to compare to OpenStruct/Struct:
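The implementation and benchmark output referred to here are not reproduced in this copy of the thread. Purely as an illustration, and not the code that was actually posted, a hash-backed replacement could look something like the sketch below. The class name `SketchOpenStruct` and its method set are assumptions; it covers only attribute reads, attribute writes, `[]`, `[]=` and `to_h`, which is exactly why such a class is not a drop-in replacement for OpenStruct.

```ruby
# Hypothetical sketch, NOT the YARD::OpenStruct from the thread.
# Attributes live in a plain Hash and are resolved through method_missing,
# so no per-object accessor methods are ever defined.
class SketchOpenStruct
  def initialize(hash = {})
    @table = {}
    hash.each_pair { |key, value| @table[key.to_sym] = value }
  end

  def [](name)
    @table[name.to_sym]
  end

  def []=(name, value)
    @table[name.to_sym] = value
  end

  def to_h
    @table.dup
  end

  # Report known attributes (and their writers) as supported.
  def respond_to_missing?(name, include_private = false)
    @table.key?(name.to_s.chomp("=").to_sym) || super
  end

  private

  def method_missing(name, *args)
    writer = /\A(\w+)=\z/.match(name.to_s)
    if writer && args.size == 1
      @table[writer[1].to_sym] = args.first # state.foo = value
    elsif writer.nil? && args.empty?
      @table[name]                          # state.foo (nil when unset)
    else
      super
    end
  end
end

state = SketchOpenStruct.new(visibility: :public)
state.scope = :instance
puts state.visibility.inspect
puts state.scope.inspect
puts state.to_h.inspect
```

Anything beyond this small surface (`each_pair`, `dig`, `delete_field`, equality, and so on) would have to be added by hand to keep existing extensions working.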
As you can see, the naive implementation I threw together in a few minutes fixes the fundamental allocation speed issue, but it changes the API. YARD extension authors who were relying on various OpenStruct methods would start seeing failures, and in order to make this backward compatible, YARD would be on the hook for implementing a drop-in replacement for OpenStruct just to fix an allocation performance issue which could have been patched in OpenStruct itself.
This gets to the heart of the issue: the idea that OpenStruct "is slow" seems like an entirely bogus claim based on bad data. I could be off base here since I haven't investigated the current OpenStruct implementation too closely, but given that I was able to throw together a reasonably performant replacement in a few minutes, it seems entirely probable that OpenStruct's own allocation issue could have been fixed instead of providing the warning that the Ruby core team chose to add.
I would say that this should be marked as an upstream defect. Why would we just fix this in YARD when the Ruby team could be addressing the performance issue and fixing it for everybody? Especially given that in practice, this does not affect actual performance in YARD, but likely does in other Ruby applications.
Thanks, and apologies for not clarifying the kind of benchmarks I used (I have fixed this and will be publishing additional details in the next release of my Benchmarks project). Regardless, the difference between what I reported and what you reported comes down to whether YJIT is enabled or disabled. Using a modified version of what you originally published, the following highlights the difference:
The key differences between the two benchmarks are:
- YJIT is disabled (`--yjit-disable`) in the second benchmark.
- `Struct` is not subclassed since the Ruby core team does not promote this usage since it's not performant (more here).
- `OpenStruct.new` (blank) performance is terrible with YJIT enabled (i.e. `16.681001`) but is `0.014673` when YJIT is disabled.

> In practice, YARD doesn't really go through this "slow" code path often because most OpenStruct objects are long-lived and shared, so first-write of attributes is fairly rare
Fair. Yet, the Ruby core team recommends not using `OpenStruct` at all, and with the new performance warning category in Ruby 3.3.0 they are making this even more apparent by raising these performance warnings now. I never use `OpenStruct` in production code for this very reason because it's too easy to forget this fact. That said, there are a couple of potential alternatives:
- `Hash`: While not having the same Object API as `OpenStruct`, and at the cost of primitive obsession, you could provide a highly performant alternative.
- … `OpenStruct`.

> Why would we just fix this in YARD when the Ruby team could be addressing the performance issue and fixing it for everybody?
Fair, except the Ruby core team hasn't been promoting the use of `OpenStruct` for multiple years, and with the release of Ruby 3.3.0 performance warnings, they are getting more aggressive about this. I don't see the Ruby core team addressing or wanting to fix this since they are trying to steer folks away from `OpenStruct` usage.
Anyway, these are some thoughts that I wanted to bring to your attention for consideration. :bow:
I cannot reproduce the same results with blank OpenStruct.new allocation in Ruby 3.2.2:
▶ bundle exec ruby --yjit bench.rb
ruby 3.2.2 (2023-03-30 revision e51014f9c0) +YJIT [arm64-darwin22]
...
user system total real
OpenStruct.new(args) 3.331943 33.654262 36.986205 ( 37.526594)
Struct.new(args) 0.012822 0.000427 0.013249 ( 0.013376)
OpenStruct.new (blank) 0.011470 0.000457 0.011927 ( 0.012022)
OpenStruct (assign) 0.009909 0.000621 0.010530 ( 0.010548)
Struct (assign) 0.003504 0.000481 0.003985 ( 0.004172)
OpenStruct (read) 0.007868 0.000574 0.008442 ( 0.008476)
Struct (read) 0.002935 0.000368 0.003303 ( 0.003367)
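The `bench.rb` script itself is not shown in the thread. A script that produces rows with the same labels could be sketched as follows; the iteration count, the attribute set, and the names `BenchStruct`, `BENCH_ARGS` and `BENCH_ITERATIONS` are all assumptions, so the numbers it prints are not comparable to the figures above.

```ruby
require "benchmark"
require "ostruct"

BENCH_ITERATIONS = 10_000         # assumption: the original count is not shown
BENCH_ARGS = { a: 1, b: 2, c: 3 } # assumption: hypothetical attribute set
BenchStruct = Struct.new(*BENCH_ARGS.keys)

# Long-lived objects used by the assign/read cases.
open_struct = OpenStruct.new(BENCH_ARGS)
struct = BenchStruct.new(*BENCH_ARGS.values)

# bmbm runs a rehearsal pass first, then the measured pass.
Benchmark.bmbm(24) do |x|
  x.report("OpenStruct.new(args)")   { BENCH_ITERATIONS.times { OpenStruct.new(BENCH_ARGS) } }
  x.report("Struct.new(args)")       { BENCH_ITERATIONS.times { BenchStruct.new(1, 2, 3) } }
  x.report("OpenStruct.new (blank)") { BENCH_ITERATIONS.times { OpenStruct.new } }
  x.report("OpenStruct (assign)")    { BENCH_ITERATIONS.times { open_struct.a = 2 } }
  x.report("Struct (assign)")        { BENCH_ITERATIONS.times { struct.a = 2 } }
  x.report("OpenStruct (read)")      { BENCH_ITERATIONS.times { open_struct.a } }
  x.report("Struct (read)")          { BENCH_ITERATIONS.times { struct.a } }
end
```

Running the same file once with `ruby --yjit` and once without it is the simplest way to check whether the blank `OpenStruct.new` case differs between the two modes on a given Ruby build.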
Seems to me as though you've found a regression in YJIT that should be reported to the Ruby team. The fact that an arg-less call to OpenStruct.new does absolutely nothing magical and performs as badly as you've shown should be a big hint that the issue is not with OpenStruct, but with YJIT itself.
This is confirmed by testing the benchmark via JRuby (with JIT) which performs absolutely normally here, indicating that there's nothing inherent to OpenStruct's implementation that should be incompatible with JIT, and, frankly, nothing besides initial dynamic access should be slow with or without JIT.
▶ jruby bench.rb
jruby 9.4.5.0 (3.1.4) 2023-11-02 1abae2700f OpenJDK 64-Bit Server VM 21.0.1 on 21.0.1 +jit [arm64-darwin]
Rehearsal ----------------------------------------------------------
OpenStruct.new(args) 2.200000 0.040000 2.240000 ( 0.719111)
Struct.new(args) 0.110000 0.010000 0.120000 ( 0.034137)
OpenStruct.new (blank) 0.050000 0.000000 0.050000 ( 0.017374)
OpenStruct (assign) 0.100000 0.000000 0.100000 ( 0.031827)
Struct (assign) 0.020000 0.000000 0.020000 ( 0.006671)
OpenStruct (read) 0.060000 0.000000 0.060000 ( 0.018253)
Struct (read) 0.020000 0.000000 0.020000 ( 0.005145)
------------------------------------------------- total: 2.610000sec
user system total real
OpenStruct.new(args) 0.650000 0.020000 0.670000 ( 0.351861)
Struct.new(args) 0.130000 0.000000 0.130000 ( 0.044659)
OpenStruct.new (blank) 0.020000 0.000000 0.020000 ( 0.008069)
OpenStruct (assign) 0.020000 0.010000 0.030000 ( 0.007613)
Struct (assign) 0.020000 0.000000 0.020000 ( 0.005670)
OpenStruct (read) 0.020000 0.000000 0.020000 ( 0.007196)
Struct (read) 0.010000 0.000000 0.010000 ( 0.001456)
JRuby's JIT can handle this library just fine. Looks like YJIT has a long way to go.
Going to mark this as closed since any changes would likely create backward incompatible API changes to YARD, and the usage patterns YARD uses with OpenStruct should not cause any serious performance concerns unless you are using a defective version of YJIT.
Hello. :wave: I'm seeing performance warnings show up when using YARD in Ruby 3.3.0 due to heavy use of `OpenStruct`, which the Ruby core team no longer recommends using.
Steps to reproduce
You'll want to ensure that performance warnings are enabled. Example:
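The original example is not shown here. As one possible setup, assuming Ruby 3.3 or newer, the `performance` warning category can be switched on from Ruby code with `Warning[:performance] = true` (the `-W:performance` command-line option has the same effect). The helper name `enable_performance_warnings` is hypothetical:

```ruby
require "ostruct"

# Hypothetical helper: turns on the :performance warning category.
# Assumption: Ruby 3.3+; older Rubies do not know this category and
# raise ArgumentError, in which case the helper returns false.
def enable_performance_warnings
  Warning[:performance] = true
  true
rescue ArgumentError
  false
end

performance_warnings_enabled = enable_performance_warnings
puts "performance warnings enabled: #{performance_warnings_enabled}"

# With the category enabled, allocating an OpenStruct is expected to print
# the "OpenStruct use is discouraged" warning on ostruct versions that have it.
OpenStruct.new
```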
With the above in place, now you can recreate using either of the following:
`yard doc --no-private`
```
/Users/bkuhlmann/.cache/frum/versions/3.3.0/lib/ruby/gems/3.3.0/gems/yard-0.9.34/lib/yard/templates/template_options.rb:34: warning: OpenStruct use is discouraged for performance reasons
/Users/bkuhlmann/.cache/frum/versions/3.3.0/lib/ruby/gems/3.3.0/gems/yard-0.9.34/lib/yard/parser/source_parser.rb:365: warning: OpenStruct use is discouraged for performance reasons
/Users/bkuhlmann/.cache/frum/versions/3.3.0/lib/ruby/gems/3.3.0/gems/yard-0.9.34/lib/yard/templates/template_options.rb:34: warning: OpenStruct use is discouraged for performance reasons
Files: 0
Modules: 0 ( 0 undocumented)
Classes: 0 ( 0 undocumented)
Constants: 0 ( 0 undocumented)
Attributes: 0 ( 0 undocumented)
Methods: 0 ( 0 undocumented)
100.00% documented
```
Notice the performance warnings shown above.
`gem install faker`
```
Successfully installed faker-3.2.2
Building YARD (yri) index for faker-3.2.2...
/Users/bkuhlmann/.cache/frum/versions/3.3.0/lib/ruby/gems/3.3.0/gems/yard-0.9.34/lib/yard/templates/template_options.rb:34: warning: OpenStruct use is discouraged for performance reasons
/Users/bkuhlmann/.cache/frum/versions/3.3.0/lib/ruby/gems/3.3.0/gems/yard-0.9.34/lib/yard/parser/source_parser.rb:365: warning: OpenStruct use is discouraged for performance reasons
/Users/bkuhlmann/.cache/frum/versions/3.3.0/lib/ruby/gems/3.3.0/gems/yard-0.9.34/lib/yard/parser/source_parser.rb:365: warning: OpenStruct use is discouraged for performance reasons
/Users/bkuhlmann/.cache/frum/versions/3.3.0/lib/ruby/gems/3.3.0/gems/yard-0.9.34/lib/yard/handlers/processor.rb:101: warning: OpenStruct use is discouraged for performance reasons
/Users/bkuhlmann/.cache/frum/versions/3.3.0/lib/ruby/gems/3.3.0/gems/yard-0.9.34/lib/yard/docstring_parser.rb:90: warning: OpenStruct use is discouraged for performance reasons
```
Notice the performance warnings displayed when installing the gem. What's shown above is only a small example, this actually repeats for several pages.
Actual Output
See above.
Expected Output
I would expect to see no performance warnings due to the removal of `OpenStruct` since it has terrible performance. If you need benchmarks, check out how `OpenStruct` performs ~4,000 times worse than a `Struct` or `Data` object in these benchmarks.
Environment details: