rporrasluc closed this issue 3 years ago.
I suspect the error may originate from the number of workers relative to the database pool size. We currently use 20 worker processes in the unicorn.rb config and a pool size of 10 in the database.yml config.
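For reference, here is the arithmetic behind that configuration as a small Ruby sketch. It assumes the usual behaviour that each forked Unicorn worker owns a separate ActiveRecord connection pool. The variable names are illustrative and are not Unicorn or Rails objects.

```ruby
# Values taken from the report above; adjust to your own configuration.
worker_processes = 20 # unicorn.rb: worker_processes
pool_size = 10        # database.yml: pool (per process, not shared)

# Each forked worker builds its own pool, so the theoretical ceiling on
# Oracle sessions is workers * pool, while a single-threaded worker
# normally checks out only one connection at a time.
max_sessions = worker_processes * pool_size
busy_sessions = worker_processes * 1

puts "ceiling: #{max_sessions}, typical: #{busy_sessions}"
```

In other words, pool size bounds connections per worker process. It does not limit how many workers run.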
What version of Ruby do you use? This may be related to the second issue in #74. According to http://unicorn.bogomips.org/PHILOSOPHY.html, Unicorn uses a preforked, single-threaded, multi-process model, so I guess Rails runs only on the main thread under Unicorn. If so, the main thread is interrupted when a sub-thread created by some other code terminates.
If you want to ignore interruptions, change oci8_unblock_func in ext/oci8/oci8lib.c as follows:
static void oci8_unblock_func(void *user_data)
{
    /* Comment out the following two lines. */
    /* oci8_svcctx_t *svcctx = (oci8_svcctx_t *)user_data; */
    /* OCIBreak(svcctx->base.hp.ptr, oci8_errhp); */
}
The downsides are:
- OCI8#break doesn't work.
- The timeout method in timeout.rb doesn't cancel SQL processing on timeout.
If this is the same as the second issue of https://github.com/kubo/ruby-oci8/issues/74#issuecomment-100638168, the best way is to apply the following patch to Ruby itself instead of disabling interruption in ruby-oci8.
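To illustrate the second downside, here is a minimal Ruby sketch (no database needed) of how Timeout.timeout interrupts a pure-Ruby blocking call. The comments about C extensions describe, in simplified form, how Ruby cancels a call that has released the GVL through the call's unblocking function. Treat that as a model, not a description of ruby-oci8's internals.

```ruby
require 'timeout'

# Timeout.timeout raises Timeout::Error in the calling thread once the limit
# expires. A pure-Ruby blocking call such as sleep is interrupted promptly.
# A C extension call that releases the GVL (for example an OCI round trip)
# can only be interrupted through the unblocking function it registered;
# with oci8_unblock_func emptied out, the SQL keeps running until it ends.
result =
  begin
    Timeout.timeout(0.2) do
      sleep 5
      :finished
    end
  rescue Timeout::Error
    :timed_out
  end

puts result
```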
@rporrasluc how do you monitor your Unicorn workers? And what do you do when a worker consumes too much memory? I recently came to the conclusion that OCI breaks the current statement execution if the process receives SIGQUIT (which Unicorn also uses for graceful shutdown).
I am seeing a very similar issue, with OCIBreak being raised in Resque workers when using the AWS S3 SDK (which is multi-threaded). I don't think it's the same issue as #74, though it does have a very similar smell. Specifically, if the issue were the same as the one fixed by https://github.com/ruby/ruby/commit/8ecd3b7114d6c0b81c0165dd0defa8f3df261d0b#diff-4fb69dc1bf667cfbc2b05dc5fd51e674, I would expect something like Thread.new { while true; sleep 10.years; end } to fix it, because rb_thread_alone() would always return false, which should prevent the unblock function from being called. Even with a guard thread running, I still see OCIBreak being raised when the last S3 thread exits while a SQL query is running.
For now I can work around the issue by disabling non-blocking mode in OCI8.
@kubo any ideas?
Could you check whether the guard thread raised an exception, just in case?
Thread.new {
  begin
    while true; sleep 10.years; end
  rescue Exception => exc
    puts exc # or output to the application log
  end
}
Integer#years is not a standard method; it is defined in active_support/core_ext/integer/time.rb, so it may be unavailable there.
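A variant of the guard thread that avoids ActiveSupport's Integer#years and keeps any exception for later inspection could look like the following. It is a sketch: guard_error and the Thread.list check are illustrative additions, and Thread.list.size > 1 is only a rough Ruby-level analogue of rb_thread_alone() returning false.

```ruby
guard_error = nil

guard = Thread.new do
  begin
    loop { sleep } # sleep without an argument blocks until the thread is woken
  rescue Exception => exc # deliberately broad: we want to know why the guard died
    guard_error = exc
    warn "guard thread died: #{exc.class}: #{exc.message}"
  end
end

# Wait until the guard is actually parked in sleep.
sleep 0.01 until guard.status == 'sleep'

puts "guard alive: #{guard.alive?}, live threads: #{Thread.list.size}"
```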
If the guard thread doesn't stop, could you write a minimal script that reproduces the issue? I cannot fix it without reproducing it.
I am seeing a similar problem in an application that has multiple threads. In my case, ruby-oci8 raises OCIBreak when another thread tells the thread executing a query to run. This is easily reproduced with:
require 'oci8'
conn = OCI8.new($dbuser, $dbpass, $dbname)
th = Thread.start do
  begin
    sleep 30
    puts 'here'
    conn.exec("BEGIN DBMS_LOCK.SLEEP(30); END;")
  rescue OCIBreak
    puts 'OCIBreak'
  end
end
sleep 1
th.run # wakes the thread from `sleep 30`
sleep 1
th.run # interrupts the running DBMS_LOCK.SLEEP call
This results in:
here
OCIBreak
The same thing happens if Thread#wakeup is used instead.
It appears that this is caused by rb_threadptr_ready calling rb_threadptr_interrupt in Ruby's thread.c.
To work around this problem, I've changed our code to no longer call Thread#run/wakeup. This has a slightly undesirable side effect: elsewhere the thread sleeps, and I'd like another thread to be able to interrupt that sleep. Now the first thread resumes only once the sleep completes.
I also noticed that the ruby-pg maintainers reverted their use of an unblocking function (UBF) to cancel queries because of problems with signal handlers. See: https://groups.google.com/forum/#!topic/ruby-pg/5_ylGmog1S4
This can also be reproduced with ruby-oci8:
require 'oci8'
conn = OCI8.new($dbuser, $dbpass, $dbname)
signal_received = false
trap 'USR1' do
  puts 'got USR1'
  signal_received = true
end
th = Thread.start do
  sleep 5
  Process.kill("USR1", Process.pid) # signal our own process after 5 seconds
end
begin
  conn.exec("BEGIN DBMS_LOCK.SLEEP(30); END;")
rescue OCIBreak
  puts 'OCIBreak'
end
Which results in:
got USR1
OCIBreak
Both problems occur with Ruby 2.2 and with Ruby 2.3.
Hopefully this helps to diagnose the problem for others!
@awaltman Thank you! It is very helpful information!
(This comment was updated to add the third condition that triggers OCIBreak.)
Your code doesn't produce the same results for me; it printed only "here". I got the same results after adding sleep 1 at the end of the script, which gives the thread time to print OCIBreak.
I knew that Thread#wakeup causes OCIBreak, but I hadn't noticed that Thread#run does too, or that these methods don't behave as I expected.
When Thread#run wakes up a thread:
- sleep returns.
- IO#read keeps waiting.
- OCI methods raise OCIBreak if the thread is sleeping inside them.
When a thread is woken by Thread#run, it checks its current status and decides its next action. However, there is no way to know the thread status outside the Ruby core.
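The first point can be observed without a database. In this sketch, a 30-second sleep stands in for the sleep in the reproduction above, and Thread#run cuts it short.

```ruby
sleeper = Thread.new do
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  sleep 30 # stands in for the `sleep 30` in the reproduction
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Wait until the thread is really blocked in sleep before waking it.
sleep 0.01 until sleeper.status == 'sleep'

sleeper.run             # wakes the thread; its sleep returns early
elapsed = sleeper.value # joins the thread and returns the block's result

puts format('sleep returned after %.2f s', elapsed)
```

A thread blocked inside an OCI call gets the same wake-up, which is why it surfaces there as OCIBreak.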
Could you use ConditionVariable#wait and ConditionVariable#signal instead of sleep and Thread#run, respectively, in your application?
require 'thread'
require 'oci8'
mutex = Mutex.new
condvar = ConditionVariable.new
conn = OCI8.new($dbuser, $dbpass, $dbname)
th = Thread.start do
  begin
    mutex.synchronize { condvar.wait(mutex, 30) } # replaces `sleep 30`
    puts 'here'
    conn.exec("BEGIN DBMS_LOCK.SLEEP(30); END;")
  rescue OCIBreak
    puts 'OCIBreak'
  end
end
sleep 1
mutex.synchronize { condvar.signal } # replaces the first `th.run`
sleep 1
mutex.synchronize { condvar.signal } # no waiter now, so the SQL is not interrupted
sleep 1
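Here is a database-free sketch of the same ConditionVariable pattern, with one addition that is an assumption rather than part of the suggestion above: a signaled flag checked in a loop. Without such a predicate, a signal sent before the waiter reaches condvar.wait is lost, and a spurious wakeup would look like a real one.

```ruby
require 'thread' # harmless on modern Ruby, where Mutex and ConditionVariable are core

mutex = Mutex.new
condvar = ConditionVariable.new
signaled = false # the predicate the waiter checks

waiter = Thread.new do
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  deadline = started + 30 # same 30-second upper bound as above
  mutex.synchronize do
    until signaled
      remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
      break if remaining <= 0
      condvar.wait(mutex, remaining) # releases the mutex while waiting
    end
  end
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

sleep 0.1
mutex.synchronize do
  signaled = true
  condvar.signal
end

elapsed = waiter.value
puts format('waiter resumed after %.2f s', elapsed)
```

Unlike Thread#run, this wake-up only reaches code that is explicitly waiting on the condition variable, so a later OCI call in the same thread is left alone.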
Thanks. I wasn't aware of that case.
When a Ruby process receives a signal, the main thread wakes up, executes the signal trap block, and re-executes the current code.
There is a workaround: OCI methods executed in sub-threads aren't cancelled by signal handlers.
require 'oci8'
conn = OCI8.new($dbuser, $dbpass, $dbname)
signal_received = false
trap 'USR1' do
  puts 'got USR1'
  signal_received = true
end
th = Thread.start do
  sleep 5
  Process.kill("USR1", Process.pid)
end
th2 = Thread.start do
  # The SQL runs in a sub-thread, so the trap handler (which runs on the
  # main thread) doesn't cancel it.
  begin
    conn.exec("BEGIN DBMS_LOCK.SLEEP(30); END;")
  rescue OCIBreak
    puts 'OCIBreak'
  end
end
th2.join
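Why the sub-thread workaround helps can be checked in isolation: Ruby runs trap handlers on the main thread, regardless of which thread sent the signal. This is a minimal sketch, assuming a POSIX platform where SIGUSR1 exists (it does not on Windows).

```ruby
# Record which thread executes the trap handler.
handler_thread = nil
trap('USR1') { handler_thread = Thread.current }

# Send the signal from a sub-thread, as in the reproduction above.
Thread.new { Process.kill('USR1', Process.pid) }.join

# Wait (up to 2 seconds) for the VM to run the handler on the main thread.
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 2
until handler_thread || Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
  sleep 0.01
end

puts(handler_thread.equal?(Thread.main) ? 'handler ran on the main thread' : 'handler did not run on the main thread')
```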
See the second issue of https://github.com/kubo/ruby-oci8/issues/74#issuecomment-100638168.
When a sub-thread exits and the main thread becomes the only live thread, the main thread is interrupted and SQL executions in the main thread are cancelled. This was fixed in Ruby 2.3.0: https://github.com/ruby/ruby/commit/8ecd3b7114d6c0b81c0165dd0defa8f3df261d0b
If your Ruby version is older than 2.3.0, add the following code as a workaround to ensure that at least one sub-thread stays alive:
Thread.start do
  loop do
    sleep
  end
end
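If the same code must run on several Ruby versions, the keep-alive thread can be started only where the bug exists. This is a sketch. NEEDS_GUARD is an illustrative name, and the code assumes RubyGems is loaded (the default), so that Gem::Version is available.

```ruby
# Start the keep-alive thread only on Rubies older than 2.3.0, where the
# last sub-thread exiting could interrupt an OCI call in the main thread.
NEEDS_GUARD = Gem::Version.new(RUBY_VERSION) < Gem::Version.new('2.3.0')

guard =
  if NEEDS_GUARD
    Thread.start do
      loop do
        sleep
      end
    end
  end

puts(guard ? 'guard thread started' : 'guard thread not needed on this Ruby')
```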
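There is a trade-off between the above issues and statement cancellation.
If statement cancellation is supported:
- Thread#run and Thread#wakeup make OCI methods raise OCIBreak, whereas other Ruby methods don't raise an exception.
- Receiving a signal makes an OCI method raise OCIBreak if it runs in the main thread.
- When the last sub-thread exits, an OCI method running in the main thread raises OCIBreak. (This was fixed in Ruby 2.3.0.)
If statement cancellation isn't supported:
- OCI8#break doesn't work.
- The timeout method in timeout.rb doesn't cancel SQL processing on timeout.
I prefer statement cancellation as long as there are workarounds. However, it depends on the application. Should I add a parameter that enables and disables statement cancellation?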
The previous comment was updated to add the third condition that triggers OCIBreak.
Good morning lads,
I have been getting this error for a while on a production server.
Could anyone provide some feedback or an idea of where the issue originates?
We are using Unicorn, Rails 4.2 and activerecord-oracle_enhanced-adapter 1.6.0.
Very much appreciated.