kubo / ruby-oci8

Ruby-oci8 - Oracle interface for ruby
Other
169 stars 75 forks source link

Getting OCIBreak: Canceled by user request #86

Closed rporrasluc closed 3 years ago

rporrasluc commented 9 years ago

Good morning lads,

I have been getting this error for a while in a production server.

ruby-oci8-2.1.8/lib/oci8/cursor.rb:538→ __fetch
ruby-oci8-2.1.8/lib/oci8/cursor.rb:538→ fetch_one_row_as_array
ruby-oci8-2.1.8/lib/oci8/cursor.rb:154→ fetch
activerecord-oracle_enhanced-adapter-1.6.1/lib/active_record/connection_adapters/oracle_enhanced/oci_connection.rb:163→ fetch
activerecord-oracle_enhanced-adapter-1.6.1/lib/active_record/connection_adapters/oracle_enhanced/database_statements.rb:53→ block in exec_query
activerecord-4.2.0/lib/active_record/connection_adapters/abstract_adapter.rb:466→ block in log
activesupport-4.2.0/lib/active_support/notifications/instrumenter.rb:20→ instrument
activerecord-4.2.0/lib/active_record/connection_adapters/abstract_adapter.rb:460→ log
activerecord-oracle_enhanced-adapter-1.6.1/lib/active_record/connection_adapters/oracle_enhanced_adapter.rb:1348→ log
activerecord-oracle_enhanced-adapter-1.6.1/lib/active_record/connection_adapters/oracle_enhanced/database_statements.rb:23→ exec_query
activerecord-oracle_enhanced-adapter-1.6.1/lib/active_record/connection_adapters/oracle_enhanced_adapter.rb:1308→ select
activerecord-4.2.0/lib/active_record/connection_adapters/abstract/database_statements.rb:32→ select_all
activerecord-4.2.0/lib/active_record/connection_adapters/abstract/query_cache.rb:68→ block in select_all
activerecord-4.2.0/lib/active_record/connection_adapters/abstract/query_cache.rb:83→ cache_sql
activerecord-4.2.0/lib/active_record/connection_adapters/abstract/query_cache.rb:68→ select_all
activerecord-oracle_enhanced-adapter-1.6.1/lib/active_record/connection_adapters/oracle_enhanced_adapter.rb:1050→ columns_without_cache
activerecord-oracle_enhanced-adapter-1.6.1/lib/active_record/connection_adapters/oracle_enhanced_adapter.rb:1017→ columns
activerecord-4.2.0/lib/active_record/connection_adapters/schema_cache.rb:43→ columns
activerecord-4.2.0/lib/active_record/attributes.rb:91→ columns
activerecord-4.2.0/lib/active_record/attributes.rb:96→ columns_hash
activerecord-4.2.0/lib/active_record/relation/query_methods.rb:968→ block in create_binds
activerecord-4.2.0/lib/active_record/relation/query_methods.rb:965→ each
activerecord-4.2.0/lib/active_record/relation/query_methods.rb:965→ partition
activerecord-4.2.0/lib/active_record/relation/query_methods.rb:965→ create_binds
activerecord-4.2.0/lib/active_record/relation/query_methods.rb:952→ build_where
activerecord-4.2.0/lib/active_record/relation/query_methods.rb:584→ where!
activerecord-4.2.0/lib/active_record/relation/query_methods.rb:574→ where

Application code...

activesupport-4.2.0/lib/active_support/callbacks.rb:427→ block in make_lambda
activesupport-4.2.0/lib/active_support/callbacks.rb:163→ call
activesupport-4.2.0/lib/active_support/callbacks.rb:163→ block in halting
activesupport-4.2.0/lib/active_support/callbacks.rb:92→ call
activesupport-4.2.0/lib/active_support/callbacks.rb:92→ _run_callbacks
activesupport-4.2.0/lib/active_support/callbacks.rb:734→ _run_process_action_callbacks
activesupport-4.2.0/lib/active_support/callbacks.rb:81→ run_callbacks
actionpack-4.2.0/lib/abstract_controller/callbacks.rb:19→ process_action
actionpack-4.2.0/lib/action_controller/metal/rescue.rb:29→ process_action
actionpack-4.2.0/lib/action_controller/metal/instrumentation.rb:31→ block in process_action
activesupport-4.2.0/lib/active_support/notifications.rb:164→ block in instrument
activesupport-4.2.0/lib/active_support/notifications/instrumenter.rb:20→ instrument
activesupport-4.2.0/lib/active_support/notifications.rb:164→ instrument
actionpack-4.2.0/lib/action_controller/metal/instrumentation.rb:30→ process_action
actionpack-4.2.0/lib/action_controller/metal/params_wrapper.rb:250→ process_action
activerecord-4.2.0/lib/active_record/railties/controller_runtime.rb:18→ process_action
actionpack-4.2.0/lib/abstract_controller/base.rb:137→ process
actionview-4.2.0/lib/action_view/rendering.rb:30→ process
actionpack-4.2.0/lib/action_controller/metal.rb:195→ dispatch
actionpack-4.2.0/lib/action_controller/metal/rack_delegation.rb:13→ dispatch
actionpack-4.2.0/lib/action_controller/metal.rb:236→ block in action
actionpack-4.2.0/lib/action_dispatch/routing/route_set.rb:73→ call
actionpack-4.2.0/lib/action_dispatch/routing/route_set.rb:73→ dispatch
actionpack-4.2.0/lib/action_dispatch/routing/route_set.rb:42→ serve
actionpack-4.2.0/lib/action_dispatch/journey/router.rb:43→ block in serve
actionpack-4.2.0/lib/action_dispatch/journey/router.rb:30→ each
actionpack-4.2.0/lib/action_dispatch/journey/router.rb:30→ serve
actionpack-4.2.0/lib/action_dispatch/routing/route_set.rb:802→ call
railties-4.2.0/lib/rails/engine.rb:518→ call
railties-4.2.0/lib/rails/railtie.rb:194→ public_send
railties-4.2.0/lib/rails/railtie.rb:194→ method_missing
actionpack-4.2.0/lib/action_dispatch/routing/mapper.rb:51→ serve
actionpack-4.2.0/lib/action_dispatch/journey/router.rb:43→ block in serve
actionpack-4.2.0/lib/action_dispatch/journey/router.rb:30→ each
actionpack-4.2.0/lib/action_dispatch/journey/router.rb:30→ serve
actionpack-4.2.0/lib/action_dispatch/routing/route_set.rb:802→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
newrelic_rpm-3.12.0.288/lib/new_relic/rack/agent_hooks.rb:30→ traced_call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
newrelic_rpm-3.12.0.288/lib/new_relic/rack/browser_monitoring.rb:32→ traced_call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
rack-1.6.4/lib/rack/etag.rb:24→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
rack-1.6.4/lib/rack/conditionalget.rb:25→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
rack-1.6.4/lib/rack/head.rb:13→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
actionpack-4.2.0/lib/action_dispatch/middleware/params_parser.rb:27→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
actionpack-4.2.0/lib/action_dispatch/middleware/flash.rb:260→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
rack-1.6.4/lib/rack/session/abstract/id.rb:225→ context
rack-1.6.4/lib/rack/session/abstract/id.rb:220→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
actionpack-4.2.0/lib/action_dispatch/middleware/cookies.rb:560→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
activerecord-4.2.0/lib/active_record/query_cache.rb:36→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
activerecord-4.2.0/lib/active_record/connection_adapters/abstract/connection_pool.rb:647→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
actionpack-4.2.0/lib/action_dispatch/middleware/callbacks.rb:29→ block in call
activesupport-4.2.0/lib/active_support/callbacks.rb:88→ call
activesupport-4.2.0/lib/active_support/callbacks.rb:88→ _run_callbacks
activesupport-4.2.0/lib/active_support/callbacks.rb:734→ _run_call_callbacks
activesupport-4.2.0/lib/active_support/callbacks.rb:81→ run_callbacks
actionpack-4.2.0/lib/action_dispatch/middleware/callbacks.rb:27→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
actionpack-4.2.0/lib/action_dispatch/middleware/remote_ip.rb:78→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
actionpack-4.2.0/lib/action_dispatch/middleware/debug_exceptions.rb:17→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
actionpack-4.2.0/lib/action_dispatch/middleware/show_exceptions.rb:30→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
railties-4.2.0/lib/rails/rack/logger.rb:38→ call_app
railties-4.2.0/lib/rails/rack/logger.rb:20→ block in call
activesupport-4.2.0/lib/active_support/tagged_logging.rb:68→ block in tagged
activesupport-4.2.0/lib/active_support/tagged_logging.rb:26→ tagged
activesupport-4.2.0/lib/active_support/tagged_logging.rb:68→ tagged
railties-4.2.0/lib/rails/rack/logger.rb:20→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
actionpack-4.2.0/lib/action_dispatch/middleware/request_id.rb:21→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
rack-1.6.4/lib/rack/methodoverride.rb:22→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
rack-1.6.4/lib/rack/runtime.rb:18→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
rack-1.6.4/lib/rack/sendfile.rb:113→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
railties-4.2.0/lib/rails/engine.rb:518→ call
railties-4.2.0/lib/rails/application.rb:164→ call
newrelic_rpm-3.12.0.288/lib/new_relic/agent/instrumentation/middleware_tracing.rb:67→ call
unicorn-4.9.0/lib/unicorn/http_server.rb:580→ process_client
unicorn-4.9.0/lib/unicorn/http_server.rb:674→ worker_loop
unicorn-4.9.0/lib/unicorn/http_server.rb:529→ spawn_missing_workers
unicorn-4.9.0/lib/unicorn/http_server.rb:540→ maintain_worker_count
unicorn-4.9.0/lib/unicorn/http_server.rb:294→ join
unicorn-4.9.0/bin/unicorn:126→ <top (required)>
/usr/local/bin/unicorn:23→ load
/usr/local/bin/unicorn:23→ <main>

Could anyone please provide some feedback or have some idea of the origin of the issue?

We are using Unicorn, Rails 4.2 and activerecord-oracle_enhanced-adapter 1.6.0.

Very much appreciated.

rporrasluc commented 9 years ago

I suspect that the error can have the origin in the number of workers and the pool size of the database. At the moment we are using 20 worker processes in unicorn.rb config and pool size 10 in database.yml config.

kubo commented 9 years ago

What version of ruby do you use? This may be related to the second issue in #74. Unicorn's process model is preforked single-threaded multi-process model according to http://unicorn.bogomips.org/PHILOSOPHY.html. I guess that rails work on the main thread only with unicorn. If so, the main thread is interrupted when a sub-thread, which was created by someone, terminates.

If you want to ignore interruption, change oci8_unblock_func in ext/oci8/oci8lib.c as follows:

static void oci8_unblock_func(void *user_data)
{
   /* comment out the following two lines. */
   /* oci8_svcctx_t *svcctx = (oci8_svcctx_t *)user_data; */
   /* OCIBreak(svcctx->base.hp.ptr, oci8_errhp); */
}

The downsides are:

kubo commented 9 years ago

If this is same with the second issue of https://github.com/kubo/ruby-oci8/issues/74#issuecomment-100638168, the best way is applying the following patch to ruby itself instead of disabling interruption of ruby-oci8.

https://github.com/ruby/ruby/commit/8ecd3b7114d6c0b81c0165dd0defa8f3df261d0b#diff-4fb69dc1bf667cfbc2b05dc5fd51e674

flash-gordon commented 9 years ago

@rporrasluc how do you monitor your unicorn workers? And what do you do when a worker consumes too much memory? I recently came out to a conclusion that OCI breaks current statement execution if a process receives SIGQUIT (it is used by unicorn for graceful shutdown as well).

holetse commented 9 years ago

I am seeing a very similar issue, with OCIBreak being raised in Resque workers when using the AWS S3 SDK (which is multi-threaded). I don't think it's the same issue as #74 though it does have a very similar smell. Specifically, if the issue was the same as the one fixed by https://github.com/ruby/ruby/commit/8ecd3b7114d6c0b81c0165dd0defa8f3df261d0b#diff-4fb69dc1bf667cfbc2b05dc5fd51e674, I would expect something like Thread.new { while true; sleep 10.years; end} to fix the issue, as rb_thread_alone() would always return false, which should prevent the unblock function from being called. Even with a guard thread running, I still see OCIBreak being raised when the last S3 thread exits while a SQL query is running.

For now I can work around the issue by disabling non-blocking mode in OCI8.

@kubo any ideas?

kubo commented 9 years ago

Could you check the guard thread's exception just in case?

Thread.new {
  begin
    while true; sleep 10.years; end
  rescue Exception => exc
    puts exc  # or output to application log
  end
}

Integer#years is not a standard method. It is defined by active_support/core_ext/integer/time.rb. It may be unavailable there.

If the guard thread doesn't stop, could you make minimum code to reproduce the issue? I cannot fix it without reproducing it.

awaltman commented 8 years ago

I am seeing a similar problem in an application that has multiple threads. In my case the OCIBreak is created by ruby-oci8 when a thread executing a query is told to run by another thread. This is easily reproduced with:

conn = OCI8.new($dbuser, $dbpass, $dbname)
th = Thread.start do
  begin
    sleep 30
    puts 'here'
    conn.exec("BEGIN DBMS_LOCK.SLEEP(30); END;")
  rescue OCIBreak
    puts 'OCIBreak'
  end
end
sleep 1
th.run
sleep 1
th.run

This results in:

here
OCIBreak

The same thing happens if the Thread.wakeup method is used instead.

It appears that this caused because the function rb_threadptr_interrupt is called by the function rb_threadptr_ready in Ruby's thread.c.

To workaround this problem I've changed our code to no longer call Thread.run/wakeup, but this has a slightly undesirable side effect because elsewhere in the thread it sleeps and I'd like to be able to have another thread interrupt the sleep. Now the first thread only resumes once the sleep completes.

I noticed also that the ruby-pg folks reverted their use of a UBF to cancel queries because there were problems using it with signal handlers. See: https://groups.google.com/forum/#!topic/ruby-pg/5_ylGmog1S4

This can also be reproduced with ruby-oci8:

conn = OCI8.new($dbuser, $dbpass, $dbname)
signal_received = false
trap 'USR1' do
  puts 'got USR1'
  signal_received = true
end
th = Thread.start do
  sleep 5
  Process.kill("USR1", Process.pid)
end
begin
  conn.exec("BEGIN DBMS_LOCK.SLEEP(30); END;")
rescue OCIBreak
  puts 'OCIBreak'
end

Which results in:

got USR1
OCIBreak

Both problems happen using either Ruby 2.2 or 2.3.

Hopefully this helps to diagnose the problem for others!

kubo commented 8 years ago

@awaltman Thank you! It is very helpful information!

(This comment was updated to add the third condition which triggers OCIBreak.)

1. Thread#run and Thread#wakeup

You code doesn't generate same results for me. It just printed here only. It gets same results by adding sleep 1 at the end of the script to make time printing OCIBreak.

I know that Thread#wakeup causes OCIBreak but I haven't noticed Thread#run and that the methods are not what I expected. When Thread#run wakes up a thread,

When a thread wakes up by Thread#run, the thread checks current thread status and decides next action. However there is no way to know the thread status outside of ruby core.

Could you use ConditionVariable#wait and ConditionVariable#signal instead of sleep and Thread#run respectively in your application?

require 'thread'

mutex = Mutex.new
condvar = ConditionVariable.new

conn = OCI8.new($dbuser, $dbpass, $dbname)
th = Thread.start do
  begin
    mutex.synchronize { condvar.wait(mutex, 30) }
    puts 'here'
    conn.exec("BEGIN DBMS_LOCK.SLEEP(30); END;")
  rescue OCIBreak
    puts 'OCIBreak'
  end
end
sleep 1
mutex.synchronize { condvar.signal }
sleep 1
mutex.synchronize { condvar.signal }
sleep 1

2. Signal Trap

Thanks. I had not known that case.

When a ruby process receives a signal, the main thread wakes up, executes the signal trap block and re-executes the current code.

There is a workaround as follows. If OCI methods are executed in subthreads, they aren't cancelled by signal handlers.

conn = OCI8.new($dbuser, $dbpass, $dbname)
signal_received = false
trap 'USR1' do
  puts 'got USR1'
  signal_received = true
end
th = Thread.start do
  sleep 5
  Process.kill("USR1", Process.pid)
end
th2 = Thread.start do
  begin
    conn.exec("BEGIN DBMS_LOCK.SLEEP(30); END;")
  rescue OCIBreak
    puts 'OCIBreak'
  end
end
th2.join

3. When a sub-thread exits and the main thread become only live thread

See the second issue of https://github.com/kubo/ruby-oci8/issues/74#issuecomment-100638168.

When a sub-thread exits and the main thread become only live thread, the main thread is interrupted and SQL executions in the main thread are cancelled. This was fixed in ruby 2.3.0. https://github.com/ruby/ruby/commit/8ecd3b7114d6c0b81c0165dd0defa8f3df261d0b

If the ruby version is less than 2.3.0, add the following code for workaround to ensure that one or more sub-threads live.

Thread.start do
   loop do
      sleep
   end
end

4. Summary

There is trade off between above issues and statement cancellation.

If statement cancellation is supported,

If statement cancellation isn't supported,

I prefer statement cancellation as long as there are workarounds. However it depends on applications. Should I add a parameter which enables and disables statement cancellation?

kubo commented 8 years ago

The previous comment was updated to add the third condition which triggers OCIBreak.