ruby / psych

A libyaml wrapper for Ruby
MIT License
566 stars 206 forks source link

Eagerly require `date` #695

Closed tdeo closed 3 days ago

tdeo commented 1 week ago

Fixes https://github.com/ruby/psych/issues/694.

When picking up only the test changes (and not require 'date' explicitly anymore), such failures start to appear:

Error: test_spec_sequence_key_shortcut(Psych_Unit_Tests): NameError: uninitialized constant Psych_Unit_Tests::Date
/Users/thierry/workspace_personal/psych/test/psych/test_yaml.rb:299:in `test_spec_sequence_key_shortcut'
     296:     def test_spec_sequence_key_shortcut
     297:         # Shortcut sequence map
     298:         assert_parse_only(
  => 299:           { 'invoice' => 34843, 'date' => Date.new( 2001, 1, 23 ),
     300:             'bill-to' => 'Chris Dumars', 'product' =>
     301:             [ { 'item' => 'Super Hoop', 'quantity' => 1 },
     302:               { 'item' => 'Basketball', 'quantity' => 4 },
=======================================================================================================================
Finished in 0.390079 seconds.
-----------------------------------------------------------------------------------------------------------------------
609 tests, 1481 assertions, 0 failures, 21 errors, 0 pendings, 0 omissions, 0 notifications
96.5517% passed

The autoload :Date statement gets moved to a more appropriate file (the one actually loading the constant), and moved to the top-level so it requires ::Date and not try to require Psych::ClassLoader::Date

Performance-wise, reusing the benchmark from https://github.com/ruby/psych/commit/06db36fa3a849f229e2cbc7b495d142ff58a1f89#diff-6a459e056cadf37665f54005bd2dde09d9ba8e66c9807eb0dc67145f9b841771R7, it seems to further improve performance quite unsignificantly:

With those changes:

$ ruby /tmp/bench-yaml.rb
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [arm64-darwin23]
Warming up --------------------------------------
           100 dates   464.000 i/100ms
Calculating -------------------------------------
           100 dates      4.605k (± 2.3%) i/s  (217.18 μs/i) -     23.200k in   5.041382s

On current master:

$ ruby /tmp/bench-yaml.rb
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [arm64-darwin23]
Warming up --------------------------------------
           100 dates   439.000 i/100ms
Calculating -------------------------------------
           100 dates      4.515k (± 1.9%) i/s  (221.49 μs/i) -     22.828k in   5.058050s
tdeo commented 4 days ago

I don't see any test, so it's hard to confirm it's indeed the cause.

I tried to write some tests but unfortunately Kernel.remove_const(:Date) is impossible because Date is a built-in. I meant in the PR description that if you try to run the test suite without the autoload statement, you'll get a failure. You would also get failures if that statement is within Psych::ScalarScanner, in which case it's trying to autoload Psych::ScalarScanner::Date instead of ::Date

byroot commented 4 days ago

You can probably use a subprocess.

tdeo commented 4 days ago

You can probably use a subprocess.

I was able indeed to add a test case with this technique, thanks for the suggestion

byroot commented 4 days ago

There's a test helper for that: assert_separatly.

But no worries, I'll take care of it. I need to run some errand but I'll fix this in a couple hours.

Also I think we should just stop lazy requiring libraries like that, it's stupid.

byroot commented 4 days ago

I updated your PR.

cc @tenderlove what do you think?

tdeo commented 4 days ago

Thanks for wrapping it up

byroot commented 3 days ago

Right, sorry Aaron, I should have explained.

In YAMLTree#visit_String, there's this condition to figure if a string should be quoted:

        elsif not String === @ss.tokenize(o) or /\A0[0-7]*[89]/.match?(o)
          style = Nodes::Scalar::SINGLE_QUOTED

So we end up in:

class_loader.date.strptime(string, '%F', Date::GREGORIAN)

Which further end up trying ClassLoader#resolve of "Date", and this uses path2class, so it's resolved as essentially ::Date, so the scoped autoload I introduced isn't triggered in this case.

Hence why the initial version of this PR was to move the autoload at the top level, but at that point I think we should just eager load.

tenderlove commented 3 days ago

@byroot ah, gotcha. I somewhat worry this might have downstream effects (maybe on Bundler or RubyGems). But I like the change so I'll merge it.