oracle / truffleruby

A high performance implementation of the Ruby programming language, built on GraalVM.
https://www.graalvm.org/ruby/

"An exception escaped out of the interpreter" when not connected to a tty #2422

Open noahgibbs opened 3 years ago

noahgibbs commented 3 years ago

When I attempt to run large batch jobs with "nohup" with several different Rubies, TruffleRuby crashes because it can't correctly hook up STDERR. Specifically, what I see on all the Truffle benchmarks is:

Exception while loading core library:
Error while formatting Ruby exception:
<internal:core> core/exception.rb:159:in `const_missing': uninitialized constant Exception::STDERR (NameError)
        from <internal:core> core/exception.rb:159:in `to_tty?'
        from <internal:core> core/exception.rb:96:in `full_message'
        from <internal:core> core/truffle/exception_operations.rb:180:in `get_formatted_backtrace'
Original Ruby exception:
<internal:core> core/io.rb:938:in `setup': Invalid argument - Invalid new mode for existing descriptor 0 (Errno::EINVAL)
        from <internal:core> core/io.rb:972:in `initialize'
        from <internal:core> core/post.rb:38:in `<top (required)>'

truffleruby: an exception escaped out of the interpreter - this is an implementation bug
org.graalvm.polyglot.PolyglotException: com.oracle.truffle.api.CompilerDirectives$ShouldNotReachHere: couldn't load the core library
        at org.graalvm.truffle/com.oracle.truffle.api.CompilerDirectives.shouldNotReachHere(CompilerDirectives.java:557)
        at org.truffleruby.core.CoreLibrary.loadRubyCoreLibraryAndPostBoot(CoreLibrary.java:784)
        at org.truffleruby.RubyContext.initialize(RubyContext.java:254)
        at org.truffleruby.RubyLanguage.initializeContext(RubyLanguage.java:385)
        at org.truffleruby.RubyLanguage.initializeContext(RubyLanguage.java:120)
        at org.graalvm.truffle/com.oracle.truffle.api.TruffleLanguage$Env.postInit(TruffleLanguage.java:3640)
        at org.graalvm.truffle/com.oracle.truffle.api.LanguageAccessor$LanguageImpl.postInitEnv(LanguageAccessor.java:300)
        at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotLanguageContext.ensureInitialized(PolyglotLanguageContext.java:638)
        at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextImpl.eval(PolyglotContextImpl.java:1346)
        at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextDispatch.eval(PolyglotContextDispatch.java:62)
        at org.graalvm.sdk/org.graalvm.polyglot.Context.eval(Context.java:375)
        at org.truffleruby.launcher.RubyLauncher.runRubyMain(RubyLauncher.java:232)
        at org.truffleruby.launcher.RubyLauncher.launch(RubyLauncher.java:128)
        at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:124)
        at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:71)
        at org.truffleruby.launcher.RubyLauncher.main(RubyLauncher.java:38)
Caused by: org.truffleruby.language.control.RaiseException: Invalid argument - Invalid new mode for existing descriptor 0 (Errno::EINVAL)
Internal GraalVM error, please report at https://github.com/oracle/graal/issues/.

eregon commented 3 years ago

Thanks for the report and sorry for the late reply. Standard streams not being a TTY works fine; the issue is that fcntl(0=stdin, F_GETFL) returns 1=WRONLY, but we expect 0=RDONLY. So I guess nohup opens a WRONLY file as fd=0=stdin as its way of implementing the documented behavior "If standard input is a terminal, redirect it from an unreadable file."

Actually on Linux, fcntl(0 or 1 or 2, F_GETFL) returns O_RDWR by default, and CRuby just overrides that at the Ruby level to be the expected RDONLY, WRONLY, WRONLY (IIRC).

So somehow we need to have special initialization for those streams so they either use the underlying fd mode (unless it's O_RDWR), or ignore the mismatch.
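To make the mismatch concrete, here is a minimal Ruby sketch of how the access mode of a descriptor can be inspected with fcntl. The helper names `access_mode` and `standard_stream_modes` are made up for illustration and are not part of TruffleRuby or CRuby; the sketch only reports what the OS returns and does not show how either implementation initializes its streams.

```ruby
require "fcntl"

# Hypothetical helper: returns the access mode the OS reports for the
# descriptor behind `io`, as :rdonly, :wronly or :rdwr.
def access_mode(io)
  flags = io.fcntl(Fcntl::F_GETFL)
  # F_GETFL also returns status flags (O_APPEND, O_NONBLOCK, ...),
  # so mask everything except the access mode bits.
  case flags & Fcntl::O_ACCMODE
  when Fcntl::O_RDONLY then :rdonly
  when Fcntl::O_WRONLY then :wronly
  when Fcntl::O_RDWR   then :rdwr
  end
end

# Hypothetical helper: access modes of the three standard streams.
def standard_stream_modes
  { stdin: access_mode(STDIN), stdout: access_mode(STDOUT), stderr: access_mode(STDERR) }
end
```

Calling `p standard_stream_modes` from a script started in a terminal and again under nohup should show the difference described above, assuming the Ruby in use starts up at all in that situation.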

FWIW, this is what happens on CRuby, which seems to ignore the mismatch:

$ nohup ruby -e 'p STDIN.read'
nohup: ignoring input and appending output to 'nohup.out'
$ cat nohup.out
-e:1:in `read': Bad file descriptor @ io_fread - <STDIN> (Errno::EBADF)
    from -e:1:in `<main>'

Not exactly a nice error. And writing is not allowed at the Ruby level:

$ nohup ruby -e 'STDIN.puts "abc"' 
$ cat nohup.out 
-e:1:in `write': not opened for writing (IOError)
    from -e:1:in `puts'
    from -e:1:in `<main>'

Using screen/tmux or ruby ... < /dev/null are workarounds which are less dangerous than having a non-readable writable STDIN.

noahgibbs commented 3 years ago

Huh. I wonder if there's something odd about -e with CRuby. I actually use nohup quite routinely with large CRuby batch jobs, and it writes to file as you'd expect. But maybe I'm doing something odd that it handles differently than TruffleRuby.

eregon commented 3 years ago

Writing is OK, it just redirects STDOUT & STDERR to some file, same as >file 2>&1 in the shell.

What's hacky is that it sets a non-readable (but writable) file for STDIN. In this case that breaks TruffleRuby's initialization of the standard streams (I have a fix), but it could also easily break any Ruby program that ever tries to read from STDIN (the failure is a low-level EBADF rather than something like an IOError, which could be expected). IMHO /dev/null as STDIN would have been a much saner choice for nohup, but we can't change that. As an anecdote, I know of another hack done in the JVM for signals just to accommodate nohup. Needless to say, I feel nohup is super hacky, but anyway I'll fix this issue.

eregon commented 2 years ago

Fixing this without breaking tests turns out to be rather complicated, and writing a portable test for it seems pretty hard since nohup behaves differently on different OSes (some of them do not set STDIN as WRONLY). I might need to shim nohup as a Ruby script or so to replicate the problematic conditions.
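
As a rough illustration of what such a shim could look like (a sketch only, not the test that ended up in TruffleRuby), the following starts a child process with fd 0 opened write-only on the null device, which is the condition described earlier in this thread. The helper name `run_with_wronly_stdin` and the probe one-liner are hypothetical; the redirection uses the `[path, flags]` form of `Process.spawn`.

```ruby
require "fcntl"
require "rbconfig"

# Hypothetical helper: runs `argv` with stdin (fd 0) opened write-only on the
# null device and returns everything the child wrote to stdout.
def run_with_wronly_stdin(*argv)
  reader, writer = IO.pipe
  pid = Process.spawn(*argv, in: [File::NULL, File::WRONLY], out: writer)
  writer.close            # the parent keeps only the read end
  output = reader.read    # read until the child closes its stdout
  reader.close
  Process.wait(pid)
  output
end

# Probe executed in the child: prints the access mode bits of its fd 0.
PROBE = 'require "fcntl"; print STDIN.fcntl(Fcntl::F_GETFL) & Fcntl::O_ACCMODE'

# Hypothetical helper: access mode bits the child observes for its stdin.
def child_stdin_mode
  run_with_wronly_stdin(RbConfig.ruby, "-e", PROBE).to_i
end
```

Because the redirection is done by the parent rather than by nohup, the setup does not depend on how the platform's nohup behaves. Note that `RbConfig.ruby` starts the same interpreter as the parent, so on a TruffleRuby build without the fix the child would be expected to fail at startup instead of printing a value.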