Open wildmaples opened 3 years ago
I have a different scenario that leads to the same stacktrace, involving only Sulong, not TruffleRuby.
I compile Ruby MRI 2.7.2 with the bundled clang 10 in the latest graal, and when I run ruby or ruby --version I see something like:
I have provided a gist showing a Dockerfile to reproduce the issue (files copied into the image were obtained from their official release page)
https://gist.github.com/jeshan/6c70fd0b94b6e521a54b6c71454ebd4c/revisions
The gist v1 is based on the official ruby docker image and v2 shows my changes that reproduce the issue.
It's GR-23843 internally.
@jeshan Please create a separate issue at https://github.com/oracle/graal, I'd like to focus this one on getting grpc to load on TruffleRuby.
I dug into this some time ago with @norswap. I can reproduce on Linux with:
$ gem i grpc
Fetching google-protobuf-3.14.0.gem
Fetching googleapis-common-protos-types-1.0.6.gem
Fetching grpc-1.35.0.gem
Building native extensions. This could take a while...
Successfully installed google-protobuf-3.14.0
Successfully installed googleapis-common-protos-types-1.0.6
Building native extensions. This could take a while...
Successfully installed grpc-1.35.0
3 gems installed
$ ruby -rgrpc -e0
Invalid ElementType of Vector: VariableBitWidthType (java.lang.AssertionError)
from com.oracle.truffle.llvm.runtime.types.VectorType.setElementType(VectorType.java:80)
from com.oracle.truffle.llvm.parser.listeners.Types.setType(Types.java:246)
from com.oracle.truffle.llvm.parser.listeners.Types.record(Types.java:171)
from com.oracle.truffle.llvm.parser.scanner.LLVMScanner.passRecordToParser(LLVMScanner.java:434)
...
The error comes from a vendored version of boringssl being included by default.
The PR at https://github.com/grpc/grpc/pull/24632/files#diff-fc6f1e850a88ea978d6788c2b825d7feb1dfc2d22e572638ec9ad5061595d245R71 actually changes the extconf.rb to not include boringssl on TruffleRuby.
It's generally not a good idea to run SSL libraries and constant-time functions on top of a JIT like Sulong.
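As a rough illustration of that kind of extconf.rb change, the check below decides whether to build a vendored SSL library based on the running Ruby engine. This is a minimal sketch and not the code from the PR: the helper name embed_vendored_ssl? is hypothetical, and the only fact relied on is that RUBY_ENGINE is "truffleruby" on TruffleRuby and "ruby" on CRuby.

```ruby
# Hypothetical helper, NOT taken from the grpc PR: decide whether an
# extconf.rb should compile a vendored SSL library for the given engine.
def embed_vendored_ssl?(engine = RUBY_ENGINE)
  # RUBY_ENGINE is "ruby" on CRuby and "truffleruby" on TruffleRuby.
  engine != 'truffleruby'
end

puts "embed vendored SSL on this engine: #{embed_vendored_ssl?}"
```

Passing the engine as an argument keeps the decision easy to test without switching Ruby implementations.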
With that PR:
git clone https://github.com/norswap/grpc.git
cd grpc
git submodule update --init
git checkout truffleruby-build-compat
truffleruby -v # make sure TruffleRuby is in PATH
bundle install
bundle exec rake build
gem uni grpc
gem i -V pkg/grpc-*.dev.gem
$ ruby -rgrpc -e 'p GRPC'
GRPC # works to require it
If we try the examples:
$ cd examples/ruby
$ ruby greeter_server.rb
/home/eregon/.rubies/truffleruby-dev/lib/truffle/truffle/cext.rb:1201:in `__allocate__': TruffleRuby doesn't have a case for the com.oracle.truffle.llvm.runtime.nodes.cast.LLVMToVectorNodeFactory$LLVMSignedCastToI64VectorNodeGen node with values of type com.oracle.truffle.llvm.runtime.vector.LLVMPointerVector (TypeError)
from com.oracle.truffle.llvm.runtime.nodes.cast.LLVMToVectorNodeFactory$LLVMSignedCastToI64VectorNodeGen.executeAndSpecialize(LLVMToVectorNodeFactory.java:575)
from com.oracle.truffle.llvm.runtime.nodes.cast.LLVMToVectorNodeFactory$LLVMSignedCastToI64VectorNodeGen.executeGeneric(LLVMToVectorNodeFactory.java:535)
from com.oracle.truffle.llvm.runtime.nodes.op.LLVMVectorArithmeticNodeGen.executeGeneric(LLVMVectorArithmeticNodeGen.java:37)
from com.oracle.truffle.llvm.runtime.nodes.vars.LLVMWriteNodeFactory$LLVMWriteVectorNodeGen.execute(LLVMWriteNodeFactory.java:835)
from com.oracle.truffle.llvm.runtime.nodes.base.LLVMBasicBlockNode$InitializedBlockNode.execute(LLVMBasicBlockNode.java:161)
from com.oracle.truffle.llvm.runtime.nodes.control.LLVMDispatchBasicBlockNode.doDispatch(LLVMDispatchBasicBlockNode.java:97)
from com.oracle.truffle.llvm.runtime.nodes.control.LLVMDispatchBasicBlockNodeGen.executeGeneric(LLVMDispatchBasicBlockNodeGen.java:20)
from com.oracle.truffle.llvm.runtime.nodes.control.LLVMFunctionRootNode.doRun(LLVMFunctionRootNode.java:85)
from com.oracle.truffle.llvm.runtime.nodes.control.LLVMFunctionRootNodeGen.executeGeneric(LLVMFunctionRootNodeGen.java:21)
from com.oracle.truffle.llvm.runtime.nodes.func.LLVMFunctionStartNode.execute(LLVMFunctionStartNode.java:88)
from org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:592)
from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/grpc-1.34.0.dev/src/ruby/lib/grpc/generic/rpc_server.rb:234:in `initialize'
from greeter_server.rb:39:in `main'
from greeter_server.rb:48:in `<main>'
I have a fix for that one, I'll try to merge it to Sulong.
With that fix, this happens:
$ ruby greeter_server.rb |& c++filt
/home/eregon/code/truffleruby-ws/truffleruby/mxbuild/truffleruby-jvm/jre/languages/ruby/lib/gems/gems/grpc-1.34.0.dev/src/core/lib/iomgr/exec_ctx.h:223:in `grpc_iomgr_init': \
External LLVMFunction TLS init function for grpc_core::ExecCtx::exec_ctx_ cannot be found. (com.oracle.truffle.llvm.runtime.except.LLVMLinkerException) (RuntimeError)
Translated to internal error
from /home/eregon/code/truffleruby-ws/truffleruby/mxbuild/truffleruby-jvm/jre/languages/ruby/lib/gems/gems/grpc-1.34.0.dev/src/core/lib/surface/init.cc:149:in `grpc_init'
from /home/eregon/code/truffleruby-ws/truffleruby/mxbuild/truffleruby-jvm/jre/languages/ruby/lib/gems/gems/grpc-1.34.0.dev/src/ruby/ext/grpc/rb_grpc.c:285:in `grpc_ruby_init'
from /home/eregon/code/truffleruby-ws/truffleruby/mxbuild/truffleruby-jvm/jre/languages/ruby/lib/gems/gems/grpc-1.34.0.dev/src/ruby/ext/grpc/rb_server.c:131:in `grpc_rb_server_alloc'
from /home/eregon/code/truffleruby-ws/truffleruby/mxbuild/truffleruby-jvm/jre/languages/ruby/lib/truffle/truffle/cext.rb:1202:in `__allocate__'
from /home/eregon/code/truffleruby-ws/truffleruby/mxbuild/truffleruby-jvm/jre/languages/ruby/lib/gems/gems/grpc-1.34.0.dev/src/ruby/lib/grpc/generic/rpc_server.rb:234:in `initialize'
from greeter_server.rb:39:in `main'
from greeter_server.rb:48:in `<main>'
I think this TLS means Thread Local Storage (not TLS related to SSL).
I'll file a Sulong issue for that (GR-29187).
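For readers unfamiliar with the term, thread-local storage gives each thread its own copy of a variable. The error above concerns a C++ thread-local variable in grpc; the Ruby sketch below only illustrates the concept using Thread#thread_variable_set, and the :exec_ctx key is a made-up name for this illustration.

```ruby
# Illustration only: thread-local values in Ruby, not grpc's C++ code.
# The :exec_ctx key is a hypothetical name chosen for this sketch.
Thread.current.thread_variable_set(:exec_ctx, :main_context)

# A new thread starts with its own, empty set of thread variables.
seen_in_other_thread = Thread.new do
  Thread.current.thread_variable_get(:exec_ctx)
end.value

seen_in_main_thread = Thread.current.thread_variable_get(:exec_ctx)
puts "other thread: #{seen_in_other_thread.inspect}, main thread: #{seen_in_main_thread.inspect}"
```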
Most likely related to https://github.com/grpc/grpc/blob/4dc84aea46396cde21d13813efcf8ca3b2fda692/src/core/lib/iomgr/exec_ctx.h#L254
I'd guess this variant is used: https://github.com/grpc/grpc/blob/4dc84aea46396cde21d13813efcf8ca3b2fda692/src/core/lib/gpr/tls_stdcpp.h
out of: https://github.com/grpc/grpc/blob/master/src/core/lib/gpr/tls.h
In any case, require 'grpc' works with https://github.com/grpc/grpc/pull/24632, so I think we need to get the grpc maintainers to merge it.
I've moved my comment to the separate Sulong issue just mentioned.
TruffleRuby doesn't have a case for the LLVMSignedCastToI64VectorNodeGen node with values of type com.oracle.truffle.llvm.runtime.vector.LLVMPointerVector (TypeError) I have a fix for that one, I'll try to merge it to Sulong.
That's now fixed in https://github.com/oracle/graal/commit/9d139ce76a3f15059881e6d3a1386d6ffbe747b8.
Hello, we were able to diagnose the issue in Sulong. The missing symbol is an external weak symbol that is not defined, which is something Sulong didn't support before. I will add support for that now and try to get a PR in soon.
Are there any updates on this issue? @eregon @Palez
The PR for this particular issue has been created, and is in the process of being reviewed and merged. However, there is another issue with the grpc gem regarding pthread, and I'm working towards a fix for that as well.
The external weak symbol issue is fixed in https://github.com/oracle/graal/commit/1ddc1c2f178d652185cb5c20de91b9fe984a77e3, and I'm updating the graal import to pick up that fix.
There is another issue with pthread_{g,s}etname_np in Sulong that @Palez is investigating.
Related: recent grpc/google-protobuf need WeakMap to support primitives (#2267) which is now fixed.
Trying the examples today on Linux, I get:
$ ruby -v greeter_server.rb
truffleruby 21.1.0-dev-fac7597c, like ruby 2.7.2, GraalVM CE Native [x86_64-linux]
java.lang.UnsupportedOperationException: Thread[default-executo,5,main] was not registered
at org.truffleruby.language.SafepointManager.leaveThread(SafepointManager.java:94)
at org.truffleruby.core.thread.ThreadManager.leaveAndEnter(ThreadManager.java:468)
at org.truffleruby.core.fiber.FiberManager.killOtherFibers(FiberManager.java:331)
at org.truffleruby.core.fiber.FiberManager.shutdown(FiberManager.java:359)
at org.truffleruby.core.thread.ThreadManager.cleanup(ThreadManager.java:412)
at org.truffleruby.RubyLanguage.disposeThread(RubyLanguage.java:435)
at org.truffleruby.RubyLanguage.disposeThread(RubyLanguage.java:107)
at com.oracle.truffle.api.LanguageAccessor$LanguageImpl.disposeThread(LanguageAccessor.java:351)
at com.oracle.truffle.polyglot.PolyglotLanguageContext.leaveThread(PolyglotLanguageContext.java:447)
at com.oracle.truffle.polyglot.PolyglotThread$ThreadSpawnRootNode.executeImpl(PolyglotThread.java:146)
at com.oracle.truffle.polyglot.HostToGuestRootNode.execute(HostToGuestRootNode.java:119)
at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:603)
at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.profiledPERoot(OptimizedCallTarget.java:574)
at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:524)
at com.oracle.svm.truffle.api.SubstrateOptimizedCallTarget.invokeCallBoundary(SubstrateOptimizedCallTarget.java:121)
at com.oracle.svm.truffle.api.SubstrateOptimizedCallTargetInstalledCode.doInvoke(SubstrateOptimizedCallTargetInstalledCode.java:164)
at com.oracle.svm.truffle.api.SubstrateOptimizedCallTarget.doInvoke(SubstrateOptimizedCallTarget.java:104)
at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callIndirect(OptimizedCallTarget.java:453)
at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.call(OptimizedCallTarget.java:434)
at com.oracle.truffle.polyglot.PolyglotThread.run(PolyglotThread.java:83)
at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:526)
at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:192)
Caused by: Attached Guest Language Frames (1)
java.lang.UnsupportedOperationException: Thread[resolver-execut,5,main] was not registered
at org.truffleruby.language.SafepointManager.leaveThread(SafepointManager.java:94)
at org.truffleruby.core.thread.ThreadManager.leaveAndEnter(ThreadManager.java:468)
at org.truffleruby.core.fiber.FiberManager.killOtherFibers(FiberManager.java:331)
at org.truffleruby.core.fiber.FiberManager.shutdown(FiberManager.java:359)
at org.truffleruby.core.thread.ThreadManager.cleanup(ThreadManager.java:412)
at org.truffleruby.RubyLanguage.disposeThread(RubyLanguage.java:435)
at org.truffleruby.RubyLanguage.disposeThread(RubyLanguage.java:107)
at com.oracle.truffle.api.LanguageAccessor$LanguageImpl.disposeThread(LanguageAccessor.java:351)
at com.oracle.truffle.polyglot.PolyglotLanguageContext.leaveThread(PolyglotLanguageContext.java:447)
at com.oracle.truffle.polyglot.PolyglotThread$ThreadSpawnRootNode.executeImpl(PolyglotThread.java:146)
at com.oracle.truffle.polyglot.HostToGuestRootNode.execute(HostToGuestRootNode.java:119)
at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:603)
at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.profiledPERoot(OptimizedCallTarget.java:574)
at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:524)
at com.oracle.svm.truffle.api.SubstrateOptimizedCallTarget.invokeCallBoundary(SubstrateOptimizedCallTarget.java:121)
at com.oracle.svm.truffle.api.SubstrateOptimizedCallTargetInstalledCode.doInvoke(SubstrateOptimizedCallTargetInstalledCode.java:164)
at com.oracle.svm.truffle.api.SubstrateOptimizedCallTarget.doInvoke(SubstrateOptimizedCallTarget.java:104)
at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callIndirect(OptimizedCallTarget.java:453)
at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.call(OptimizedCallTarget.java:434)
at com.oracle.truffle.polyglot.PolyglotThread.run(PolyglotThread.java:83)
at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:526)
at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:192)
Caused by: Attached Guest Language Frames (1)
#<Thread:0x6e8@/home/eregon/.rubies/truffleruby-dev/lib/truffle/truffle/cext.rb:1605 run> terminated with exception:
Traceback (most recent call last):
from /home/eregon/.rubies/truffleruby-dev/lib/truffle/truffle/cext.rb:1606:in `block in rb_thread_create'
from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/grpc-1.36.0.dev/src/ruby/ext/grpc/rb_event_thread.c:122:in `grpc_rb_event_thread'
from call.c:149:in `rb_thread_call_without_gvl'
from /home/eregon/.rubies/truffleruby-dev/lib/truffle/truffle/cext.rb:1615:in `rb_thread_call_without_gvl'
/home/eregon/.rubies/truffleruby-dev/lib/truffle/truffle/cext.rb:1615:in `block in rb_thread_call_without_gvl': TruffleRuby doesn't have a case for the com.oracle.truffle.llvm.runtime.nodes.asm.syscall.LLVMAMD64SyscallFutexNodeGen node with values of type com.oracle.truffle.llvm.runtime.pointer.LLVMPointerImpl java.lang.Long=128 java.lang.Long=0 com.oracle.truffle.llvm.runtime.pointer.LLVMPointerImpl java.lang.Long=0 java.lang.Long=0 (TypeError)
from com.oracle.truffle.llvm.runtime.nodes.asm.syscall.LLVMAMD64SyscallFutexNodeGen.executeAndSpecialize(LLVMAMD64SyscallFutexNodeGen.java:90)
from com.oracle.truffle.llvm.runtime.nodes.asm.syscall.LLVMAMD64SyscallFutexNodeGen.execute(LLVMAMD64SyscallFutexNodeGen.java:54)
from com.oracle.truffle.llvm.runtime.nodes.asm.syscall.LLVMSyscallNode.cachedSyscall(LLVMSyscallNode.java:66)
from com.oracle.truffle.llvm.runtime.nodes.asm.syscall.LLVMSyscallNodeGen.executeAndSpecialize(LLVMSyscallNodeGen.java:175)
from com.oracle.truffle.llvm.runtime.nodes.asm.syscall.LLVMSyscallNodeGen.executeGeneric(LLVMSyscallNodeGen.java:84)
from com.oracle.truffle.llvm.runtime.nodes.vars.LLVMWriteNodeFactory$LLVMWriteI64NodeGen.execute_generic1(LLVMWriteNodeFactory.java:365)
from com.oracle.truffle.llvm.runtime.nodes.vars.LLVMWriteNodeFactory$LLVMWriteI64NodeGen.execute(LLVMWriteNodeFactory.java:346)
from com.oracle.truffle.llvm.runtime.nodes.base.LLVMBasicBlockNode$InitializedBlockNode.execute(LLVMBasicBlockNode.java:161)
from com.oracle.truffle.llvm.runtime.nodes.control.LLVMLoopDispatchNode.executeRepeatingWithValue(LLVMLoopDispatchNode.java:105)
from org.graalvm.compiler.truffle.runtime.OptimizedOSRLoopNode.profilingLoop(OptimizedOSRLoopNode.java:165)
from org.graalvm.compiler.truffle.runtime.OptimizedOSRLoopNode.execute(OptimizedOSRLoopNode.java:123)
from com.oracle.truffle.llvm.runtime.nodes.control.LLVMLoopNode$LLVMLoopNodeImpl.loop(LLVMLoopNode.java:80)
from com.oracle.truffle.llvm.runtime.nodes.control.LLVMLoopNodeFactory$LLVMLoopNodeImplNodeGen.executeLoop(LLVMLoopNodeFactory.java:22)
from com.oracle.truffle.llvm.runtime.nodes.control.LLVMDispatchBasicBlockNode.doDispatch(LLVMDispatchBasicBlockNode.java:164)
from com.oracle.truffle.llvm.runtime.nodes.control.LLVMDispatchBasicBlockNodeGen.executeGeneric(LLVMDispatchBasicBlockNodeGen.java:20)
from com.oracle.truffle.llvm.runtime.nodes.control.LLVMFunctionRootNode.doRun(LLVMFunctionRootNode.java:85)
from com.oracle.truffle.llvm.runtime.nodes.control.LLVMFunctionRootNodeGen.executeGeneric(LLVMFunctionRootNodeGen.java:21)
from com.oracle.truffle.llvm.runtime.nodes.func.LLVMFunctionStartNode.execute(LLVMFunctionStartNode.java:91)
from org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:603)
The first error might be solved by adopting Truffle safepoints, I'm not sure though.
The second error comes from Sulong: the LLVMAMD64SyscallFutexNodeGen node seems to need some extra specializations.
With branch https://github.com/eregon/grpc/tree/truffleruby-debug, which has a couple of workarounds, and latest truffleruby-dev, it works for one message ("Greeting: Hello world") and then the client hangs while exiting as it can't interrupt some native call. The server sometimes segfaults in an i64 write (GR-30218).
New PR for grpc, cleaned up and rebased on latest grpc: https://github.com/grpc/grpc/pull/27660
The PR to support building the grpc gem on TruffleRuby has been merged. So now it's about getting the grpc gem to work at runtime, which I'll track in this issue.
https://github.com/cookpad/grpc_kit seems a possible alternative to the grpc gem. It's written in Ruby and uses the google-protobuf gem (which works fine on TruffleRuby). I tried and both the helloworld and routeguide examples work on TruffleRuby!
The test suite also passes, except for 6 failures which are kwargs-related and also happen on CRuby 3.0.3 and one extra failure which is an easy fix:
1) GrpcKit::Session::IO#send_event write data to inner io object
Failure/Error: bytes = @io.write_nonblock(data, exception: false)
ArgumentError:
wrong number of arguments (given 2, expected 1)
# /home/eregon/.rubies/truffleruby-dev/lib/truffle/stringio.rb:103:in `write_nonblock'
# ./lib/grpc_kit/session/io.rb:39:in `send_event'
# ./spec/grpc_kit/session/io_spec.rb:39:in `block (3 levels) in <top (required)>'
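The failing call can be reproduced in isolation. The snippet below is a minimal sketch of the same call shape as in the spec failure, namely StringIO#write_nonblock with the exception: keyword. It assumes CRuby's StringIO, which accepts that keyword; on the TruffleRuby build shown above the same call raised ArgumentError.

```ruby
require 'stringio'

# Same call shape as lib/grpc_kit/session/io.rb in the failure above:
# write_nonblock with a data argument plus the exception: keyword.
io = StringIO.new
bytes = io.write_nonblock('hello', exception: false)

puts "wrote #{bytes} bytes, buffer is now #{io.string.inspect}"
```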
FWIW httpx also ships with a grpc plugin which has been successfully tested against truffleruby for quite a while. It's probably the closest to pure ruby (grpc_kit uses ds9 for http2 parsing, which uses C extensions and nghttp2, last time I checked).
Both have very fringe communities and usage, IME. The grpc gem has a much bigger community of users, and has codegen capabilities which none of the alternatives can match (for GRPC service definitions; all of them can codegen protobufs). Truffleruby should probably ensure compatibility with it, for adoption's sake.
Thanks for the context. grpc seems hard to get working because it's a huge amount of rather messy C++/C code (which notably uses reflection with dlsym() and has multiple implementations of locks, to give an idea) and running all that on Sulong is proving challenging.
We might be able to run some part on Sulong and some part natively, but that would likely need build system changes in grpc and some help from the grpc maintainers, and the grpc Ruby maintainers seem overall not so responsive (typically it takes months to merge PRs).
Hence I am exploring other, lighter-weight options, and I've heard multiple companies sharing similar concerns about the grpc gem when using it with CRuby.
There seems to be a general feeling that grpc/grpc is heavy and hard to maintain, and not only for Ruby. For example, see what https://buf.build/blog/connect-a-better-grpc says about it ("If you're frustrated by the complexity and instability of today's gRPC libraries").
There might be some way in the future to tell Sulong to execute some parts/functions natively, that might help.
@eregon we're also trying to evaluate truffleruby in production and have hit the grpc stumbling block #2697. We need it fixed for development, as many of our devs work on M1 Macs, but it now seems blocked by https://github.com/oracle/graal/issues/4726, which sounds equally difficult to work around. We're forced to use the grpc gem because of https://github.com/googleapis/ruby-spanner-activerecord, which we experimented with, but we found Spanner to be disappointing vs Postgres. So once Google AlloyDB is GA we'll use that, drop Spanner/grpc, and be unblocked to try truffleruby again.
All that said, I imagine grpc for all its problems pushes a lot of boundaries for Truffleruby that other gems may also face and making it work now will fix compatibility also for a number of other gems you haven't yet come across. Assuming your goal is still to be MRI compliant :)
Either way, we're trying to make this work. Also delighted @HoneyryderChuck to hear HTTPX supports truffleruby, we're big fans of your work and use it exclusively in our app.
Right, there are quite a few gems depending on grpc: https://rubygems.org/gems/grpc/reverse_dependencies
We'll look into how feasible it is to run some part natively and some part on Sulong.
I imagine grpc for all its problems pushes a lot of boundaries for Truffleruby that other gems may also face
Actually no, it's literally dozens of complicated fixes which seem to be needed only for grpc (e.g., supporting direct usages of the futex syscall on Linux). I'm aware of no other popular native extension having similar problems on TruffleRuby. Native extensions don't usually include their own "operating system", like their own SSL implementation, custom locks, network stack/layer, etc., as grpc does.
Thank you for the feedback, I'll try to get https://github.com/oracle/graal/issues/4726 prioritized and we'll keep looking at how we can support the grpc gem.
Summary by @eregon:
The grpc gem should install fine. On macOS you might need truffleruby-dev.
At runtime, require 'grpc' works on Linux. Using GRPC functionality does not work yet in general.
Requiring the grpc gem causes an Invalid ElementType of Vector failure. This is affecting the storefront-renderer repository's use of require "semian/grpc".
How to reproduce
Stacktrace
The below issue is resolved by https://github.com/grpc/grpc/pull/24632, but grpc did not merge that PR yet.
As chatted about in our call, it seems like a C language feature that Sulong doesn't support.
Issue about compiling/installing grpc: #1982
General internal issue about grpc: GR-23874