oracle / truffleruby

A high performance implementation of the Ruby programming language, built on GraalVM.
https://www.graalvm.org/ruby/
Other
3.02k stars 185 forks source link

New test failure in the Zeitwerk test suite #2431

Open fxn opened 3 years ago

fxn commented 3 years ago

Hi!

The suite of Zeitwerk has this test to check autoloading blocks context switching (autoloading is thread-safe):

test "constant definition is synchronized" do
  files = [["m.rb", <<-EOS]]
    module M
      sleep 0.5

      def self.works?
        true
      end
    end
  EOS
  with_setup(files) do
    t = Thread.new { M }
    assert M.works?
    t.join
  end
end

It has passed regularly for a long time, but it started to fail in a flaky way both in truffleruby and in truffleruby-head. See for example this build, or this build.

Does it ring a bell?

fxn commented 3 years ago

This is a shell script that essentially does the same as the flaky test:

#!/bin/sh

cat <<EOS > m.rb
module M
  sleep 0.5

  def self.works?
    true
  end
end
EOS

ruby -I. <<EOS
autoload :M, "m"
t = Thread.new { M }
p M.works?
EOS

rm m.rb

If it passes, you see "true" printed. Otherwise, M does not respond to works? (NoMethodError) because there was a context switch before the require triggered by autoload returned.

fxn commented 3 years ago

With CRuby, that passes even with sleep 60.

eregon commented 3 years ago

Thanks a lot for the report. This is an area where the semantics of CRuby seems unclear, e.g. does it make other threads wait until the end of module M or the end of the file until the constant M is "published"? What happens if the autoloaded file creates threads, do they see M?

In short this is not yet implemented in TruffleRuby (IIRC), because of unclear semantics. Currently there is a lock on who wins to autoload a constant, but currently no lock for "wait some other thread has finished autoloading", since the constant is published as soon as assigned. Preventing thread switching itself seems difficult on JVM, and would feel like a hack at best + potential for deadlocks. I guess we should look at what CRuby does here but the autoload logic in CRuby seems particularly unreadable.

Re Zeitwerk CI, could you exclude this test for TruffleRuby and keep running other tests? If it passed before I would think it was just lucky timings.

eregon commented 3 years ago

BTW to improve the reliability of that test I believe you would need a sleep (smaller, e.g. of 0.1) between the Thread.new and assert M.works?, otherwise it's likely the M.works? runs first and then the other thread will see the constant is autoloading and wait, and anyway that thread does not check what is defined on M. For the test to fail it needs to be the Thread.new running first and defining the constant but not yet the method, before the main thread keeps running and call the method.

fxn commented 3 years ago

Problem is, this is a fundamental property of Zeitwerk, which is a fundamental property of CRuby: You won't see a class half-defined due to context switching while autoloading. Meaning, what the test does (of course, a class may be reopened elsewhere later, not in a generic sense).

BTW to improve the reliability of that test I believe you would need a sleep (smaller, e.g. of 0.1) between the Thread.new and assert M.works?

You're right.

fxn commented 3 years ago

Test revised here: https://github.com/fxn/zeitwerk/commit/016404cc94c9940e31cf057955d7ecdd43f70f54. Thanks for the comment about that :).

fxn commented 3 years ago

Hey! I have documented this in the README. I believe it is accurate, but let me know if it should word it in a different way :).

eregon commented 3 years ago

That looks accurate, I'll try to fix this issue soon.

eregon commented 2 years ago

Some more details about how CRuby handles this: https://github.com/ruby/ruby/pull/5788 And: (from @ioquatix)

No, it's literally cached, there is an autoload cache Instead of setting the constant on the module, it's stashed into an autoload save area once the autoload is done, all the constants are added to the module and all waiting threads are unblocked

ioquatix commented 2 years ago

I'm slowly figuring out how it works, I guess it's not a bad implementation, but I think variable.c could do with a rewrite. lol. https://github.com/ruby/ruby/pull/5898 fyi

eregon commented 2 years ago

FWIW, I found https://bugs.ruby-lang.org/issues/15663#change-77214 where I asked some autoload questions and which makes it somewhat clear it's so complex nobody understands it fully. I think it shouldn't be too bad to implement in TruffleRuby because we already have a lock per autoload constant. But potentially deadlocks is an issue to consider, i.e., what if one thread is loading A and that needs B, and another thread is loading B and that needs A? Deadlock or the "isolation" of autoload is broken in that case?

fxn commented 2 years ago

I am not familiar with the implementation, but isn't require already synchronized to avoid seeing partially evaluated files elsewhere? Why aren't these autoloads inheriting or relying on this property?

eregon commented 2 years ago

@fxn require only synchronizes per file, i.e., to guarantee a file is only loaded once (that's easy enough). That doesn't prevent that e.g. class C defined in c.rb is seen partially defined (e.g. some methods defined, some not, only part of c.rb was evaluated) by file use.rb being loaded in another thread and e.g. doing C.new.foo -> potential NoMethodError.

autoload makes it possible to know some of those dependencies, but nested autoloads cannot be known in advance. I think I'll need to write a test script to figure what CRuby does there and if/how it deals with those potential deadlocks.

The general strategy for autoload seems to not publish a constant's value while the file is being loaded (so other threads cannot see it's set and continue executing, they have to block and wait), and instead the autoload logic stashes the value somewhere so only that thread can see it, and then publish the constant at once when the file has been loaded. Not sure about other constants defined in the file, I'd think there is no special behavior there.

fxn commented 2 years ago

@eregon But if user.rb uses C, it should require 'c' before at some point in the code path, no?

fxn commented 2 years ago

Something that could be relevant in @ioquatix foobar.rb example (just a remark, I don't know from the top of my head) is the constant lookup that happens in module Foo; end may trigger the autoload for Foo, depending on load order. Same for the constant lookup that happens in module Bar; end. So, one of them is trying to require a file that is being required by the other one simultaneously.

fxn commented 2 years ago

So, let me repeat the question.

The example sets autoloads for two constants, pointing to the same file. Since that is synchronized, why aren't the autoloads?

I'd expect the second one to trigger a require that blocks, waits, and when it returns, the constant is defined and all continues.

This argument is incorrect, as shown by the tests, but in what sense?