googleapis / google-cloud-ruby

Google Cloud Client Library for Ruby
https://googleapis.github.io/google-cloud-ruby/
Apache License 2.0
1.36k stars 550 forks source link

Segmentation fault on stackdriver + ruby 2.5 + rails #2453

Closed uiur closed 5 years ago

uiur commented 6 years ago

I got segmentation fault when using stackdriver gem with rails application

ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux] rails 5.2.1

The app is deployed on unicorn

/home/admin/.rbenv/versions/2.5.1/lib/ruby/2.5.0/monitor.rb:147: [BUG] Segmentation fault at 0x000000000000003c
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0077 p:---- s:0537 e:000536 CFUNC  :broadcast
c:0076 p:0015 s:0533 e:000532 METHOD /home/admin/.rbenv/versions/2.5.1/lib/ruby/2.5.0/monitor.rb:147
c:0075 p:0007 s:0529 e:000528 BLOCK  /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/google-cloud-logging-1.5.4/lib/google/cloud/logging/async_writer.rb
c:0074 p:0007 s:0526 e:000525 METHOD /home/admin/.rbenv/versions/2.5.1/lib/ruby/2.5.0/monitor.rb:226
c:0073 p:0005 s:0522 e:000521 METHOD /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/google-cloud-logging-1.5.4/lib/google/cloud/logging/async_writer.rb
c:0072 p:0004 s:0518 e:000517 METHOD /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/stackdriver-core-1.3.2/lib/stackdriver/core/async_actor.rb:344
c:0071 p:0060 s:0514 e:000513 BLOCK  /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/stackdriver-core-1.3.2/lib/stackdriver/core/async_actor.rb:312 [FINISH]
c:0070 p:---- s:0511 e:000510 CFUNC  :synchronize
c:0069 p:0022 s:0507 e:000506 METHOD /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/stackdriver-core-1.3.2/lib/stackdriver/core/async_actor.rb:306
c:0068 p:0004 s:0503 e:000502 METHOD /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/google-cloud-logging-1.5.4/lib/google/cloud/logging/async_writer.rb
c:0067 p:0120 s:0494 e:000493 METHOD /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/google-cloud-logging-1.5.4/lib/google/cloud/logging/logger.rb:553
c:0066 p:0079 s:0485 e:000484 METHOD /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/google-cloud-logging-1.5.4/lib/google/cloud/logging/logger.rb:317
c:0065 p:0031 s:0478 e:000477 METHOD /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/google-cloud-logging-1.5.4/lib/google/cloud/logging/logger.rb:211
c:0064 p:0032 s:0472 e:000471 METHOD /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/railties-5.2.1/lib/rails/rack/logger.rb:37
c:0063 p:0009 s:0462 e:000461 BLOCK  /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/railties-5.2.1/lib/rails/rack/logger.rb:26
c:0062 p:0004 s:0459 e:000458 BLOCK  /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/activesupport-5.2.1/lib/active_support/tagged_logging.rb:71
c:0061 p:0013 s:0456 e:000455 METHOD /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/activesupport-5.2.1/lib/active_support/tagged_logging.rb:28
c:0060 p:0012 s:0450 e:000449 METHOD /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/activesupport-5.2.1/lib/active_support/tagged_logging.rb:71
c:0059 p:0041 s:0445 e:000444 METHOD /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/railties-5.2.1/lib/rails/rack/logger.rb:26
c:0058 p:0092 s:0439 e:000438 METHOD /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/newrelic_rpm-5.4.0.347/lib/new_relic/agent/instrumentation/middlewa
c:0057 p:0049 s:0430 e:000429 METHOD /home/admin/rails_app/shared/bundle/ruby/2.5.0/gems/google-cloud-logging-1.5.4/lib/google/cloud/logging/middleware.rb:9

full log: https://gist.github.com/uiureo/ba3cd795134dfec41410d92f801323d4

blowmage commented 6 years ago

@uiureo Thank you for the detailed information. We will look into this.

lucaong commented 6 years ago

I have the same Segmentation fault error. It only occurs when I add the stackdriver. Our app is running on Puma, with Ruby 2.5.1 on GKE.

Full log here: https://gist.github.com/lucaong/63e815c05b0db59618baebed56b4616b

Interestingly, we run two different kubernetes deployments of the same app, one is the web application and the other is the background job workers. Only the latter seem to be affected by this.

Also, but possibly unrelated, close to the segfault we observe often multiple times these errors too:

E  ruby: ../nptl/pthread_mutex_lock.c:352: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed.
lucaong commented 6 years ago

Possibly related to https://github.com/protocolbuffers/protobuf/issues/4460

There is no mention of protobuf in the stacktrace, but the chain of methods that get called when /google-cloud-logging-1.5.4/lib/google/cloud/logging/async_writer.rb:322 calls MonitorMixin::ConditionVariable#broadcast and wakes up awaiting threads ends up calling this: https://github.com/googleapis/google-cloud-ruby/blob/master/google-cloud-logging/lib/google/cloud/logging/v2/logging_service_v2_client.rb#L395, which uses protobuf from Google::Gax

fruwe commented 6 years ago

same here, on puma with the same ruby and rails versions

blowmage commented 6 years ago

@uiureo @lucaong @fruwe I've been asked to look into this more and am wondering if we can get more information about when you see this. Is this segfault happening immediately on app start? Or does it happen after the app has been running? How often do you see this segfault? How heavy is your app being run when you see this? How many threads/workers are you running when you see this?

Any information you can provide is very much appreciated. I haven't reproduced this myself, so providing a small app/docker image that does repro would also help immensely.

fruwe commented 6 years ago

@blowmage Thanks a lot. I already removed the gem until it is fixed, but as far as I can remember, the segfault happened on startup and without any load (staging environment). We were using 2 threads and 2 workers

lucaong commented 6 years ago

Same here, the segfault was happening both on staging and in production (staging with no load). In our case though, only the background job workers were affected.

The workers are Rails ActiveJob running with Sneakers (RabbitMQ job queues for Ruby) on 4 workers per machine, 10 maximum threads per worker.

We removed the gem and instead wrote a log formatter and exception logger for the Stackdriver JSON format. Removing the gem, the issue is gone, and we simply log to stdout, with logs being collected correctly into Stackdriver by the GKE infrastructure.

blowmage commented 6 years ago

Was this using docker? If so, was it using a common or custom image?

lucaong commented 6 years ago

In our case, it runs on Docker, on an image derived from the official ruby:2.5.1 image. On top of it, we basically just install NodeJS and build our Rails 5.2 app.

dirkjonker commented 5 years ago

I am also getting random segfaults, cannot always reproduce Ubuntu 18.04, ruby 2.5.1 using rbenv, Rails 5.2.1, stackdriver gem 0.15.3 require 'stackdriver' in config/application.rb opening the Rails console rails c running an arbitrary command such as Rails.logger to see that it uses the Google::Cloud::Logging::Logger exiting with CTRL+D -> causes segfault

example:

/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:171: [BUG] Segmentation fault at 0x0000000000000000
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0010 p:---- s:0056 e:000055 CFUNC  :fork
c:0009 p:0240 s:0052 e:000051 METHOD /home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:171
c:0008 p:0065 s:0038 e:000037 BLOCK  /home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:141 [FINISH]
c:0007 p:---- s:0035 e:000034 CFUNC  :loop
c:0006 p:0020 s:0031 e:000030 METHOD /home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:135
c:0005 p:0153 s:0027 E:000858 TOP    /home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application/boot.rb:19 [FINISH]
c:0004 p:---- s:0023 e:000022 CFUNC  :require
c:0003 p:0110 s:0018 e:000017 METHOD /home/dirk/.rbenv/versions/2.5.1/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59
c:0002 p:0006 s:0006 e:000005 EVAL   -e:1 [FINISH]
c:0001 p:0000 s:0003 E:0023b0 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
-e:1:in `<main>'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59:in `require'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59:in `require'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application/boot.rb:19:in `<top (required)>'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:135:in `run'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:135:in `loop'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:141:in `block in run'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:171:in `serve'
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/spring-2.0.2/lib/spring/application.rb:171:in `fork'

-- Machine register context ------------------------------------------------
 RIP: 0x00007ff47bd3ffa0 RBP: 0xc0b58948ffffff38 RSP: 0x00007fff2256bf88
 RAX: 0x00007ff451d32a00 RBX: 0x00005629e8c959b4 RCX: 0x00005629ed2f6b50
 RDX: 0x00007ff47c1550a0 RDI: 0xc0b58948ffffff38 RSI: 0x00007ff47c1550a0
  R8: 0x00007ff44413a788  R9: 0x00007ff47c1550a0 R10: 0x0000000000000000
 R11: 0x0000000000000000 R12: 0xc0b58948fffffec8 R13: 0x00007ff45fbf98d0
 R14: 0x00007ff469c450a0 R15: 0x00007ff469c45188 EFL: 0x0000000000010206

-- C level backtrace information -------------------------------------------
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_vm_bugreport+0x7d3) [0x5629e8d37f73] vm_dump.c:703
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_bug_context+0xd1) [0x5629e8d2b681] error.c:580
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(sigsegv+0x42) [0x5629e8c14482] signal.c:928
/lib/x86_64-linux-gnu/libpthread.so.0(0x7ff47bd48890) [0x7ff47bd48890]
/lib/x86_64-linux-gnu/libpthread.so.0(0x7ff47bd3ffa0) [0x7ff47bd3ffa0]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(gpr_mu_lock+0x9) [0x7ff45fc102b9]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(0x7ff45fd14b8e) [0x7ff45fd14b8e]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(0x7ff45fbe84af) [0x7ff45fbe84af]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(0x7ff45fbf9728) [0x7ff45fbf9728]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(_ZN9grpc_core7ExecCtx5FlushEv+0x51) [0x7ff45fbd2ca1]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(grpc_channel_destroy+0x146) [0x7ff45fbe6e96]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(0x7ff45fbc9831) [0x7ff45fbc9831]
/home/dirk/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/grpc-1.17.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so(0x7ff45fbc98a1) [0x7ff45fbc98a1]
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_gc_call_finalizer_at_exit+0x448) [0x5629e8b30c88] gc.c:2831
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(ruby_cleanup+0x254) [0x5629e8b16044] eval.c:133
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(ruby_stop+0x9) [0x5629e8b161d9] eval.c:258
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_f_fork+0x57) [0x5629e8bc3517] process.c:3708
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_call_cfunc+0xe1) [0x5629e8c86941] vm_insnhelper.c:1918
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_call_method+0xd3) [0x5629e8c96433] vm_insnhelper.c:2381
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_exec_core+0x255b) [0x5629e8c8f48b] insns.def:850
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_exec+0x7d) [0x5629e8c9483d] vm.c:1778
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(loop_i+0x539) [0x5629e8c9d009] vm.c:979
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_rescue2+0xcf) [0x5629e8b16c2f] eval.c:896
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_call_cfunc+0xe1) [0x5629e8c86941] vm_insnhelper.c:1918
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_call_method+0xd3) [0x5629e8c96433] vm_insnhelper.c:2381
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_exec_core+0x255b) [0x5629e8c8f48b] insns.def:850
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_exec+0x7d) [0x5629e8c9483d] vm.c:1778
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_load_internal0+0xc1) [0x5629e8b60a41] load.c:611
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_require_internal+0x871) [0x5629e8b63941] load.c:994
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(rb_f_require+0x13) [0x5629e8b639c3] load.c:1040
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_call_cfunc+0xe1) [0x5629e8c86941] vm_insnhelper.c:1918
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_exec_core+0x3136) [0x5629e8c90066] insns.def:915
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(vm_exec+0x7d) [0x5629e8c9483d] vm.c:1778
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(ruby_exec_internal+0xc1) [0x5629e8b11aa1] eval.c:246
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(ruby_run_node+0x2f) [0x5629e8b1627f] eval.c:310
/home/dirk/.rbenv/versions/2.5.1/bin/ruby(main+0x4b) [0x5629e8b117cb] ./include/ruby/intern.h:295
blowmage commented 5 years ago

Thanks @dirkjonker, does the following workaround fix this for you?

https://github.com/protocolbuffers/protobuf/issues/4460#issuecomment-434631296

abpaul commented 5 years ago

Hi, I'm facing the same segmentation issue on GAE, I'm able to deploy and run my migrations (via the appengine gem) without any issues, but it responds with a 503 when trying to access it.

The app is running on Ruby 2.5.3 and Rails 5.2.0, appengine 0.4.6

Logs and Gemfile.lock are available in this gist: https://gist.github.com/paulpyrb/1161a24ba970391a2f8a94f85207a494

@blowmage I tried the workaround you've linked above by adding BUNDLE_FORCE_RUBY_PLATFORM as a variable in my app.yaml file, but it didn't fix the issue.

entrypoint: bundle exec rackup --port $PORT
env: flex
runtime: ruby

env_variables:
  BUNDLE_FORCE_RUBY_PLATFORM: 1 # segmentation fix

beta_settings:
  cloud_sql_instances:

Later edit: Adding require: false for the appengine gem fixes the segmentation issue for me.

axalix commented 5 years ago

I am using Docker (alpine) and experiencing exactly the same issue; the workaround didn't work for me either.

The log is long, but it starts with this:

/bundle/gems/google-protobuf-3.6.1-x86_64-linux/lib/google/protobuf/any_pb.rb:15: [BUG] Segmentation fault at 0x0000000000016d96
ruby 2.4.4p296 (2018-03-28 revision 63013) [x86_64-linux-musl]

-- Control frame information -----------------------------------------------
c:0053 p:---- s:0259 e:000258 CFUNC  :msgclass
c:0052 p:0026 s:0255 e:000254 CLASS  /bundle/gems/google-protobuf-3.6.1-x86_64-linux/lib/google/protobuf/any_pb.rb:15
c:0051 p:0011 s:0252 e:000251 CLASS  /bundle/gems/google-protobuf-3.6.1-x86_64-linux/lib/google/protobuf/any_pb.rb:14
c:0050 p:0039 s:0249 E:0021e0 TOP    /bundle/gems/google-protobuf-3.6.1-x86_64-linux/lib/google/protobuf/any_pb.rb:13 [FINISH]
c:0049 p:---- s:0246 e:000245 CFUNC  :require
blowmage commented 5 years ago

@axalix, unfortunately yours looks like a separate issue. The segfault is from a different method, and the backtrace shows you are using a pre-compiled gem. You need to install the pure-ruby gem which will compile the C-code.

dirkjonker commented 5 years ago

@blowmage I'm having some issues trying the "fix" as I cannot seem to build V8 from scratch without some effort (unfortunately one of the dependencies is libv8 which will build V8 from source as well)

jarv commented 5 years ago

@axalix this works for me:

apk add build-base
gem install google-cloud-logging --platform ruby
dirkjonker commented 5 years ago

@blowmage I have some issues related to the grpc forking issues. I am loading the stackdriver gem conditionally and it seems to load some grpc component before forking. I'll have to investigate a bit further, I might open a new ticket for it.

blowmage commented 5 years ago

@dirkjonker As we’ve been investigating this and other issues, we noticed that the logger is set before the fork. This is not a problem unless something logs before the fork, as this causes grpc to be lazily loaded. We have a fix for this in #2733. We are in a release freeze, but you can try this code by changing your Gemfile to the following:

gem "stackdriver", github: "googleapis/google-cloud-ruby"
Viktor-Ivliev commented 5 years ago

Hello, I encountered the following problem for my tests and emulate it in rails (5.1.6) console : ruby 2.5.1

It 'failed' do
  (Time.now + 3.033.minutes).to_s
end

ruby-2.5.1/lib/ruby/2.5.0/pp.rb:288: [BUG] Segmentation fault at 0x0000000100000001

or

gems/ruby-2.5.1/gems/activesupport-5.1.6/lib/active_support/core_ext/time/conversions.rb:51:

  #[BUG] Segmentation fault at 0x0000000002000001

or

gems/ruby-2.5.1/gems/activesupport-5.1.6/lib/active_support/time_with_zone.rb:219:

  #[BUG] Segmentation fault at 0x000000006647491d
It 'seccess' do
  (Time.now + 3.minutes).to_s
end

Can someone faced a similar problem?

blowmage commented 5 years ago

@Viktor-Ivliev Can you open a new issue with all the relevant details?

dirkjonker commented 5 years ago

@blowmage thanks, the fix seems to work, just submitted #2742 to fix one issue I have with it

blowmage commented 5 years ago

@dirkjonker Glad to hear it.

@uiureo I have been unable to reproduce the exact segfault you have reported. I see that you are using Unicorn. Are you seeing the segfault pre or post-fork? Can you try updating to use the code in the repo and see if that helps?

uiur commented 5 years ago

I've tried the latest version of the gem, and it works perfectly in our environment with ruby 2.5.3 + rails 5.2.2. I don't run into this problem anymore.

Thanks!