Open fxn opened 3 years ago
This is a shell script that essentially does the same as the flaky test:
#!/bin/sh
cat <<EOS > m.rb
module M
sleep 0.5
def self.works?
true
end
end
EOS
ruby -I. <<EOS
autoload :M, "m"
t = Thread.new { M }
p M.works?
EOS
rm m.rb
If it passes, you see "true" printed. Otherwise, M
does not respond to works?
(NoMethodError
) because there was a context switch before the require
triggered by autoload
returned.
With CRuby, that passes even with sleep 60
.
Thanks a lot for the report.
This is an area where the semantics of CRuby seems unclear, e.g. does it make other threads wait until the end of module M
or the end of the file until the constant M is "published"? What happens if the autoloaded file creates threads, do they see M?
In short this is not yet implemented in TruffleRuby (IIRC), because of unclear semantics. Currently there is a lock on who wins to autoload a constant, but currently no lock for "wait some other thread has finished autoloading", since the constant is published as soon as assigned. Preventing thread switching itself seems difficult on JVM, and would feel like a hack at best + potential for deadlocks. I guess we should look at what CRuby does here but the autoload logic in CRuby seems particularly unreadable.
Re Zeitwerk CI, could you exclude this test for TruffleRuby and keep running other tests? If it passed before I would think it was just lucky timings.
BTW to improve the reliability of that test I believe you would need a sleep (smaller, e.g. of 0.1) between the Thread.new and assert M.works?
, otherwise it's likely the M.works? runs first and then the other thread will see the constant is autoloading and wait, and anyway that thread does not check what is defined on M. For the test to fail it needs to be the Thread.new running first and defining the constant but not yet the method, before the main thread keeps running and call the method.
Problem is, this is a fundamental property of Zeitwerk, which is a fundamental property of CRuby: You won't see a class half-defined due to context switching while autoloading. Meaning, what the test does (of course, a class may be reopened elsewhere later, not in a generic sense).
BTW to improve the reliability of that test I believe you would need a sleep (smaller, e.g. of 0.1) between the Thread.new and assert M.works?
You're right.
Test revised here: https://github.com/fxn/zeitwerk/commit/016404cc94c9940e31cf057955d7ecdd43f70f54. Thanks for the comment about that :).
Hey! I have documented this in the README. I believe it is accurate, but let me know if it should word it in a different way :).
That looks accurate, I'll try to fix this issue soon.
Some more details about how CRuby handles this: https://github.com/ruby/ruby/pull/5788 And: (from @ioquatix)
No, it's literally cached, there is an autoload cache Instead of setting the constant on the module, it's stashed into an autoload save area once the autoload is done, all the constants are added to the module and all waiting threads are unblocked
I'm slowly figuring out how it works, I guess it's not a bad implementation, but I think variable.c
could do with a rewrite. lol. https://github.com/ruby/ruby/pull/5898 fyi
FWIW, I found https://bugs.ruby-lang.org/issues/15663#change-77214 where I asked some autoload questions and which makes it somewhat clear it's so complex nobody understands it fully.
I think it shouldn't be too bad to implement in TruffleRuby because we already have a lock per autoload constant.
But potentially deadlocks is an issue to consider, i.e., what if one thread is loading A and that needs B, and another thread is loading B and that needs A? Deadlock or the "isolation" of autoload
is broken in that case?
I am not familiar with the implementation, but isn't require
already synchronized to avoid seeing partially evaluated files elsewhere? Why aren't these autoloads inheriting or relying on this property?
@fxn require
only synchronizes per file, i.e., to guarantee a file is only loaded once (that's easy enough).
That doesn't prevent that e.g. class C
defined in c.rb
is seen partially defined (e.g. some methods defined, some not, only part of c.rb
was evaluated) by file use.rb
being loaded in another thread and e.g. doing C.new.foo
-> potential NoMethodError.
autoload
makes it possible to know some of those dependencies, but nested autoloads cannot be known in advance.
I think I'll need to write a test script to figure what CRuby does there and if/how it deals with those potential deadlocks.
The general strategy for autoload seems to not publish a constant's value while the file is being loaded (so other threads cannot see it's set and continue executing, they have to block and wait), and instead the autoload logic stashes the value somewhere so only that thread can see it, and then publish the constant at once when the file has been loaded. Not sure about other constants defined in the file, I'd think there is no special behavior there.
@eregon But if user.rb
uses C
, it should require 'c'
before at some point in the code path, no?
Something that could be relevant in @ioquatix foobar.rb
example (just a remark, I don't know from the top of my head) is the constant lookup that happens in module Foo; end
may trigger the autoload for Foo
, depending on load order. Same for the constant lookup that happens in module Bar; end
. So, one of them is trying to require
a file that is being require
d by the other one simultaneously.
So, let me repeat the question.
The example sets autoloads for two constants, pointing to the same file. Since that is synchronized, why aren't the autoloads?
I'd expect the second one to trigger a require
that blocks, waits, and when it returns, the constant is defined and all continues.
This argument is incorrect, as shown by the tests, but in what sense?
Hi!
The suite of Zeitwerk has this test to check autoloading blocks context switching (autoloading is thread-safe):
It has passed regularly for a long time, but it started to fail in a flaky way both in
truffleruby
and intruffleruby-head
. See for example this build, or this build.Does it ring a bell?