Closed uiur closed 5 years ago
@uiureo Thank you for the detailed information. We will look into this.
I have the same Segmentation fault error. It only occurs when I add the stackdriver
. Our app is running on Puma, with Ruby 2.5.1 on GKE.
Full log here: https://gist.github.com/lucaong/63e815c05b0db59618baebed56b4616b
Interestingly, we run two different kubernetes deployments of the same app, one is the web application and the other is the background job workers. Only the latter seem to be affected by this.
Also, but possibly unrelated, close to the segfault we observe often multiple times these errors too:
E ruby: ../nptl/pthread_mutex_lock.c:352: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed.
Possibly related to https://github.com/protocolbuffers/protobuf/issues/4460
There is no mention of protobuf
in the stacktrace, but the chain of methods that get called when /google-cloud-logging-1.5.4/lib/google/cloud/logging/async_writer.rb:322
calls MonitorMixin::ConditionVariable#broadcast
and wakes up awaiting threads ends up calling this: https://github.com/googleapis/google-cloud-ruby/blob/master/google-cloud-logging/lib/google/cloud/logging/v2/logging_service_v2_client.rb#L395
, which uses protobuf from Google::Gax
same here, on puma with the same ruby and rails versions
@uiureo @lucaong @fruwe I've been asked to look into this more and am wondering if we can get more information about when you see this. Is this segfault happening immediately on app start? Or does it happen after the app has been running? How often do you see this segfault? How heavy is your app being run when you see this? How many threads/workers are you running when you see this?
Any information you can provide is very much appreciated. I haven't reproduced this myself, so providing a small app/docker image that does repro would also help immensely.
@blowmage Thanks a lot. I already removed the gem until it is fixed, but as far as I can remember, the segfault happened on startup and without any load (staging environment). We were using 2 threads and 2 workers
Same here, the segfault was happening both on staging and in production (staging with no load). In our case though, only the background job workers were affected.
The workers are Rails ActiveJob running with Sneakers (RabbitMQ job queues for Ruby) on 4 workers per machine, 10 maximum threads per worker.
We removed the gem and instead wrote a log formatter and exception logger for the Stackdriver JSON format. Removing the gem, the issue is gone, and we simply log to stdout, with logs being collected correctly into Stackdriver by the GKE infrastructure.
Was this using docker? If so, was it using a common or custom image?
In our case, it runs on Docker, on an image derived from the official ruby:2.5.1
image. On top of it, we basically just install NodeJS and build our Rails 5.2 app.
I am also getting random segfaults, cannot always reproduce
Ubuntu 18.04, ruby 2.5.1 using rbenv, Rails 5.2.1, stackdriver gem 0.15.3
require 'stackdriver'
in config/application.rb
opening the Rails console rails c
running an arbitrary command such as Rails.logger
to see that it uses the Google::Cloud::Logging::Logger
exiting with CTRL+D
-> causes segfault
example:
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:171: [BUG] Segmentation fault at 0x0000000000000000
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]
-- Control frame information -----------------------------------------------
c:0010 p:---- s:0056 e:000055 CFUNC :fork
c:0009 p:0240 s:0052 e:000051 METHOD /home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:171
c:0008 p:0065 s:0038 e:000037 BLOCK /home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:141 [FINISH]
c:0007 p:---- s:0035 e:000034 CFUNC :loop
c:0006 p:0020 s:0031 e:000030 METHOD /home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:135
c:0005 p:0153 s:0027 E:000858 TOP /home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application/boot.rb:19 [FINISH]
c:0004 p:---- s:0023 e:000022 CFUNC :require
c:0003 p:0110 s:0018 e:000017 METHOD /home/dirk/.rbenv/versions/2.5.1/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59
c:0002 p:0006 s:0006 e:000005 EVAL -e:1 [FINISH]
c:0001 p:0000 s:0003 E:0023b0 (none) [FINISH]
-- Ruby level backtrace information ----------------------------------------
-e:1:in `<main>'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59:in `require'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59:in `require'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application/boot.rb:19:in `<top (required)>'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:135:in `run'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:135:in `loop'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:141:in `block in run'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:171:in `serve'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:171:in `fork'
-- Machine register context ------------------------------------------------
RIP: 0x00007ff47bd3ffa0 RBP: 0xc0b58948ffffff38 RSP: 0x00007fff2256bf88
RAX: 0x00007ff451d32a00 RBX: 0x00005629e8c959b4 RCX: 0x00005629ed2f6b50
RDX: 0x00007ff47c1550a0 RDI: 0xc0b58948ffffff38 RSI: 0x00007ff47c1550a0
R8: 0x00007ff44413a788 R9: 0x00007ff47c1550a0 R10: 0x0000000000000000
R11: 0x0000000000000000 R12: 0xc0b58948fffffec8 R13: 0x00007ff45fbf98d0
R14: 0x00007ff469c450a0 R15: 0x00007ff469c45188 EFL: 0x0000000000010206
-- C level backtrace information -------------------------------------------
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_vm_bugreport+0x7d3) [0x5629e8d37f73] vm_dump.c:703
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_bug_context+0xd1) [0x5629e8d2b681] error.c:580
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(sigsegv+0x42) [0x5629e8c14482] signal.c:928
/lib/x86_64-linux-gnu/libpthread.so.0(0x7ff47bd48890) [0x7ff47bd48890]
/lib/x86_64-linux-gnu/libpthread.so.0(0x7ff47bd3ffa0) [0x7ff47bd3ffa0]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(gpr_mu_lock+0x9) [0x7ff45fc102b9]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(0x7ff45fd14b8e) [0x7ff45fd14b8e]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(0x7ff45fbe84af) [0x7ff45fbe84af]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(0x7ff45fbf9728) [0x7ff45fbf9728]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(_ZN9grpc_core7ExecCtx5FlushEv+0x51) [0x7ff45fbd2ca1]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(grpc_channel_destroy+0x146) [0x7ff45fbe6e96]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(0x7ff45fbc9831) [0x7ff45fbc9831]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(0x7ff45fbc98a1) [0x7ff45fbc98a1]
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_gc_call_finalizer_at_exit+0x448) [0x5629e8b30c88] gc.c:2831
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(ruby_cleanup+0x254) [0x5629e8b16044] eval.c:133
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(ruby_stop+0x9) [0x5629e8b161d9] eval.c:258
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_f_fork+0x57) [0x5629e8bc3517] process.c:3708
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_call_cfunc+0xe1) [0x5629e8c86941] vm_insnhelper.c:1918
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_call_method+0xd3) [0x5629e8c96433] vm_insnhelper.c:2381
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_exec_core+0x255b) [0x5629e8c8f48b] insns.def:850
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_exec+0x7d) [0x5629e8c9483d] vm.c:1778
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(loop_i+0x539) [0x5629e8c9d009] vm.c:979
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_rescue2+0xcf) [0x5629e8b16c2f] eval.c:896
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_call_cfunc+0xe1) [0x5629e8c86941] vm_insnhelper.c:1918
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_call_method+0xd3) [0x5629e8c96433] vm_insnhelper.c:2381
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_exec_core+0x255b) [0x5629e8c8f48b] insns.def:850
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_exec+0x7d) [0x5629e8c9483d] vm.c:1778
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_load_internal0+0xc1) [0x5629e8b60a41] load.c:611
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_require_internal+0x871) [0x5629e8b63941] load.c:994
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_f_require+0x13) [0x5629e8b639c3] load.c:1040
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_call_cfunc+0xe1) [0x5629e8c86941] vm_insnhelper.c:1918
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_exec_core+0x3136) [0x5629e8c90066] insns.def:915
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_exec+0x7d) [0x5629e8c9483d] vm.c:1778
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(ruby_exec_internal+0xc1) [0x5629e8b11aa1] eval.c:246
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(ruby_run_node+0x2f) [0x5629e8b1627f] eval.c:310
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(main+0x4b) [0x5629e8b117cb] ./include/ruby/intern.h:295
Thanks @dirkjonker, does the following workaround fix this for you?
https://github.com/protocolbuffers/protobuf/issues/4460#issuecomment-434631296
Hi, I'm facing the same segmentation issue on GAE, I'm able to deploy and run my migrations (via the appengine gem) without any issues, but it responds with a 503 when trying to access it.
The app is running on Ruby 2.5.3 and Rails 5.2.0, appengine 0.4.6
Logs and Gemfile.lock are available in this gist: https://gist.github.com/paulpyrb/1161a24ba970391a2f8a94f85207a494
@blowmage I tried the workaround you've linked above by adding BUNDLE_FORCE_RUBY_PLATFORM
as a variable in my app.yaml file, but it didn't fix the issue.
entrypoint: bundle exec rackup --port $PORT
env: flex
runtime: ruby
env_variables:
BUNDLE_FORCE_RUBY_PLATFORM: 1 # segmentation fix
beta_settings:
cloud_sql_instances:
Later edit: Adding require: false
for the appengine gem fixes the segmentation issue for me.
I am using Docker (alpine) and experiencing exactly the same issue; the workaround didn't work for me either.
The log is long, but it starts with this:
/bundle/gems/google-protobuf-3.6.1-x86_64-linux/lib/google/protobuf/any_pb.rb:15: [BUG] Segmentation fault at 0x0000000000016d96
ruby 2.4.4p296 (2018-03-28 revision 63013) [x86_64-linux-musl]
-- Control frame information -----------------------------------------------
c:0053 p:---- s:0259 e:000258 CFUNC :msgclass
c:0052 p:0026 s:0255 e:000254 CLASS /bundle/gems/google-protobuf-3.6.1-x86_64-linux/lib/google/protobuf/any_pb.rb:15
c:0051 p:0011 s:0252 e:000251 CLASS /bundle/gems/google-protobuf-3.6.1-x86_64-linux/lib/google/protobuf/any_pb.rb:14
c:0050 p:0039 s:0249 E:0021e0 TOP /bundle/gems/google-protobuf-3.6.1-x86_64-linux/lib/google/protobuf/any_pb.rb:13 [FINISH]
c:0049 p:---- s:0246 e:000245 CFUNC :require
@axalix, unfortunately yours looks like a separate issue. The segfault is from a different method, and the backtrace shows you are using a pre-compiled gem. You need to install the pure-ruby gem which will compile the C-code.
@blowmage I'm having some issues trying the "fix" as I cannot seem to build V8 from scratch without some effort (unfortunately one of the dependencies is libv8 which will build V8 from source as well)
@axalix this works for me:
apk add build-base
gem install google-cloud-logging --platform ruby
@blowmage I have some issues related to the grpc forking issues. I am loading the stackdriver gem conditionally and it seems to load some grpc component before forking. I'll have to investigate a bit further, I might open a new ticket for it.
@dirkjonker As we’ve been investigating this and other issues, we noticed that the logger is set before the fork. This is not a problem unless something logs before the fork, as this causes grpc to be lazily loaded. We have a fix for this in #2733. We are in a release freeze, but you can try this code by changing your Gemfile to the following:
gem "stackdriver", github: "googleapis/google-cloud-ruby"
Hello, I encountered the following problem for my tests and emulate it in rails (5.1.6) console : ruby 2.5.1
It 'failed' do
(Time.now + 3.033.minutes).to_s
end
ruby-2.5.1/lib/ruby/2.5.0/pp.rb:288: [BUG] Segmentation fault at 0x0000000100000001
or
gems/ruby-2.5.1/gems/activesupport-5.1.6/lib/active_support/core_ext/time/conversions.rb:51:
#[BUG] Segmentation fault at 0x0000000002000001
or
gems/ruby-2.5.1/gems/activesupport-5.1.6/lib/active_support/time_with_zone.rb:219:
#[BUG] Segmentation fault at 0x000000006647491d
It 'seccess' do
(Time.now + 3.minutes).to_s
end
Can someone faced a similar problem?
@Viktor-Ivliev Can you open a new issue with all the relevant details?
@blowmage thanks, the fix seems to work, just submitted #2742 to fix one issue I have with it
@dirkjonker Glad to hear it.
@uiureo I have been unable to reproduce the exact segfault you have reported. I see that you are using Unicorn. Are you seeing the segfault pre or post-fork? Can you try updating to use the code in the repo and see if that helps?
I've tried the latest version of the gem, and it works perfectly in our environment with ruby 2.5.3 + rails 5.2.2. I don't run into this problem anymore.
Thanks!
I got segmentation fault when using stackdriver gem with rails application
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux] rails 5.2.1
The app is deployed on unicorn
full log: https://gist.github.com/uiureo/ba3cd795134dfec41410d92f801323d4