kubo / ruby-oci8

Ruby-oci8 - Oracle interface for ruby
Other
169 stars 75 forks source link

OCI8.new hangs intermittently #45

Closed andyioannou closed 8 years ago

andyioannou commented 11 years ago

Experiencing an intermittent issue when executing OCI8.new()

I can (intermittently) reproduce in the rails console bash-3.2$ rails c Loading development environment (Rails 3.0.9) 1.9.3-p448 :001 > OCI8.new()

or from the command line $ruby -rubygems -e "require 'oci8'; OCI8.new()

in either case the command hangs and simple does not return (I have to kill the process)

Here is some info on versions rails: 3.0.9 ruby: 1.9.3p448 ruby-oci8: 2.1.0 Oracle Instant Client: 12.1 64bit

If I kill the process and try OCI8.new() again it will often start working and return the connection, which then works perfectly (.exec works etc). After OCI8.new has worked once I can continue to create new connections without problem. Then ... at some point in the future .. usually after being idle for some time the same issue will be encountered (a new OCI8.new() fails to return)

I have tested the target machine hosting the db is accessible (ping / traceroute etc) during the times when OCI8 hangs. Also I can call OCI8.new with the wrong login details and it will come back with an authentication error, but a call with the correct login details straight after can hang.

So 1) are there any know incompatibilities between the oci8 version and other components listed 2) How would you suggest debugging this? I am trying to establish what is going on at the point of hanging. Is it in OCI8 or is this simply a network issue (seems unlikely to be purely network because I can always get the db to give me an authentication error - it never fails to return). 3) Is there some way of configuring OCI8 to 'timeout' if it is waiting for something (I don't know this is the problem - but it could be.

Thanks for your help

kubo commented 10 years ago

There are some timeout parameters in sqlnet.ora and tnsnames.ora. http://docs.oracle.com/cd/E11882_01/network.112/e10835/sqlnet.htm http://docs.oracle.com/cd/E11882_01/network.112/e10835/tnsnames.htm#NETRF1434

If you use a full client, sqlnet.ora and tnsnames.ora will be in $ORACLE_HOME/network/admin/. If you use an instant client, create sqlnet.ora and tnsnames.ora and put them to a directory and set the environment variable TNS_ADMIN to point to the directory.

MentalPower commented 10 years ago

I'm having a similar issue, but with fetch_hash instead. I'm querying a DB across a rather slow fractional T1 (don't ask) so everything is slower. However every once in a while fetch_hash() just fails to return at all.

The worst part about it is that even with Thread#kill or Timeout::timeout (or just plain SIGINT) the ruby process will not exit cleanly and has to be kill -9'd from the outside. I cannot use tnsnames.ora since I don't have access to modify it from my code, so I'm using a plain OCI8.new('user', 'pass', "host") construct.

Any ideas?

kubo commented 10 years ago

What OS do you use? If you use Linux, could you send the output of the following command?

$ gdb `which ruby` "process-id-of-hung-process"
(gdb) info threads
  Id   Target Id         Frame 
  3    Thread 0x7ffff5b6d700  ....
  2    Thread 0x7ffff7ff7700  ....
* 1    Thread 0x7ffff7fd5700 ....
(gdb) thread 1
(gdb) bt
...
(gdb) thread 2
(gdb) bt
...
(gdb) thread 3
(gdb) bt
...
(gdb) thread ...(for each ID)
(gdb) bt

I want to confirm that it hung on running read() system call.

MentalPower commented 10 years ago

Sorry this took so long, I was out of town this past week. I use both OSX (10.9.2) and CentOS 6.5, it happens on both. Below is the info you asked for, it looks to definitely hang on the read() call on thread 2.

$ ruby -v
ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-linux]
$ ps aux | grep ruby
xxxxx     8510  5.3  3.1 751964 122112 pts/2   Sl+  03:18   0:02 ruby /usr/local/rbenv/versions/2.1.1/bin/rake someRake:taskName
xxxxx     8621  0.0  0.0 103244   840 pts/3    S+   03:19   0:00 grep ruby
$ rbenv which ruby
/usr/local/rbenv/versions/2.1.1/bin/ruby
$ gdb `rbenv which ruby` "8510"
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/rbenv/versions/2.1.1/bin/ruby...done.
Attaching to program: /usr/local/rbenv/versions/2.1.1/bin/ruby, process 8510
Reading symbols from /lib64/libpthread.so.0...Reading symbols from /usr/lib/debug/lib64/libpthread-2.12.so.debug...done.
[New LWP 8551]
[New LWP 8548]
[Thread debugging using libthread_db enabled]
done.

#################### More stuff about loading symbols snipped off ####################

pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
183 62: movl    (%rsp), %edi
Missing separate debuginfos, use: debuginfo-install libaio-0.3.107-10.el6.x86_64 ncurses-libs-5.7-3.20090208.el6.x86_64
(gdb) info threads 
  3 Thread 0x7f91e8b78700 (LWP 8548)  0x00007f91e7bc2343 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
  2 Thread 0x7f91cd01d700 (LWP 8551)  0x00007f91e874c75d in read () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 0x7f91e8b69700 (LWP 8510)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f91e8b69700 (LWP 8510))]#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
183 62: movl    (%rsp), %edi
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
#1  0x00007f91e8ce7e79 in native_cond_wait (cond=<value optimized out>, mutex=<value optimized out>) at thread_pthread.c:334
#2  0x00007f91e8ceeb83 in native_sleep (th=0x7f91ea4705b0, timeout_tv=0x0) at thread_pthread.c:1059
#3  0x00007f91e8cef7ab in rb_thread_terminate_all () at thread.c:447
#4  0x00007f91e8ba2423 in ruby_cleanup (ex=6) at eval.c:188
#5  0x00007f91e8ba252d in ruby_run_node (n=<value optimized out>) at eval.c:310
#6  0x00007f91e8b9ecbb in main (argc=3, argv=0x7fff1e1e44b8) at main.c:36
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f91cd01d700 (LWP 8551))]#0  0x00007f91e874c75d in read () at ../sysdeps/unix/syscall-template.S:82
82  T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x00007f91e874c75d in read () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f91d9691ac0 in snttread () from /usr/lib/oracle/12.1/client64/lib/libclntsh.so.12.1
#2  0x00007f91d9690f35 in nttfprd () from /usr/lib/oracle/12.1/client64/lib/libclntsh.so.12.1
#3  0x00007f91d968d219 in nsbasic_brc () from /usr/lib/oracle/12.1/client64/lib/libclntsh.so.12.1
#4  0x00007f91d968d026 in nsbrecv () from /usr/lib/oracle/12.1/client64/lib/libclntsh.so.12.1
#5  0x00007f91d9682d48 in nioqrc () from /usr/lib/oracle/12.1/client64/lib/libclntsh.so.12.1
#6  0x00007f91d9697a39 in ttcdrv () from /usr/lib/oracle/12.1/client64/lib/libclntsh.so.12.1
#7  0x00007f91d9687a59 in nioqwa () from /usr/lib/oracle/12.1/client64/lib/libclntsh.so.12.1
#8  0x00007f91d96736cf in upirtrc () from /usr/lib/oracle/12.1/client64/lib/libclntsh.so.12.1
#9  0x00007f91d967e1a6 in kpurcsc () from /usr/lib/oracle/12.1/client64/lib/libclntsh.so.12.1
#10 0x00007f91d9676eb3 in kpuexec () from /usr/lib/oracle/12.1/client64/lib/libclntsh.so.12.1
#11 0x00007f91d9674049 in OCIStmtExecute () from /usr/lib/oracle/12.1/client64/lib/libclntsh.so.12.1
#12 0x00007f91da299633 in oci8_OCIStmtExecute_cb (user_data=0x7f91cd01c5a0) at apiwrap.c:3113
#13 0x00007f91e8cee092 in call_without_gvl (func=0x7f91da299600 <oci8_OCIStmtExecute_cb>, data1=0x7f91cd01c5a0, ubf=<value optimized out>, data2=<value optimized out>, fail_if_interrupted=0) at thread.c:1263
#14 0x00007f91da283047 in protected_call (data=140264186430784) at oci8lib.c:251
#15 0x00007f91e8ba00ad in rb_protect (proc=0x7f91da283010 <protected_call>, data=140264186430784, state=0x7f91cd01c55c) at eval.c:819
#16 0x00007f91da282fce in oci8_call_without_gvl (svcctx=0x7f91c4000980, func=0x7f91da299600 <oci8_OCIStmtExecute_cb>, data=0x7f91cd01c5a0) at oci8lib.c:309
#17 0x00007f91da298b4c in oci8_OCIStmtExecute_nb (svcctx=<value optimized out>, svchp=<value optimized out>, stmtp=<value optimized out>, errhp=<value optimized out>, iters=<value optimized out>, rowoff=<value optimized out>, snap_in=0x0, snap_out=0x0, mode=0, file=0x7f91da29b835 "stmt.c", line=202)
    at apiwrap.c:3136
#18 0x00007f91da288dd1 in oci8_call_stmt_execute (self=140264102770000, iteration_count=1) at stmt.c:202
#19 oci8_stmt_execute (self=140264102770000, iteration_count=1) at stmt.c:228
#20 0x00007f91e8cc9bfa in vm_call_cfunc_with_frame (th=0x7f91f020c270, reg_cfp=0x7f91cd2c6e80, ci=<value optimized out>) at vm_insnhelper.c:1470
#21 0x00007f91e8cdf55b in vm_call_cfunc (th=0x7f91f020c270, cfp=0x7f91cd2c6e80, ci=0x7f91eea9a5e0) at vm_insnhelper.c:1560
#22 vm_call_method (th=0x7f91f020c270, cfp=0x7f91cd2c6e80, ci=0x7f91eea9a5e0) at vm_insnhelper.c:1754
#23 0x00007f91e8cd5681 in vm_exec_core (th=0x7f91f020c270, initial=<value optimized out>) at insns.def:1028
#24 0x00007f91e8cdaadc in vm_exec (th=0x7f91f020c270) at vm.c:1304
#25 0x00007f91e8cde58d in invoke_block_from_c (th=0x7f91f020c270, proc=0x7f91f020c7f0, self=140264677532680, defined_class=140264677537680, argc=0, argv=0x7f91f02007c8, blockptr=0x0) at vm.c:732
#26 vm_invoke_proc (th=0x7f91f020c270, proc=0x7f91f020c7f0, self=140264677532680, defined_class=140264677537680, argc=0, argv=0x7f91f02007c8, blockptr=0x0) at vm.c:788
#27 0x00007f91e8cde6ca in rb_vm_invoke_proc (th=<value optimized out>, proc=<value optimized out>, argc=<value optimized out>, argv=<value optimized out>, blockptr=<value optimized out>) at vm.c:807
#28 0x00007f91e8ceff84 in thread_start_func_2 (th=0x7f91f020c270, stack_start=<value optimized out>) at thread.c:535
#29 0x00007f91e8cf0211 in thread_start_func_1 (th_ptr=0x7f91f020c270) at thread_pthread.c:803
#30 0x00007f91e87459d1 in start_thread (arg=0x7f91cd01d700) at pthread_create.c:301
#31 0x00007f91e7bcbb6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f91e8b78700 (LWP 8548))]#0  0x00007f91e7bc2343 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
87    int result = INLINE_SYSCALL (poll, 3, CHECK_N (fds, nfds), nfds, timeout);
(gdb) bt
#0  0x00007f91e7bc2343 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f91e8cec122 in timer_thread_sleep (p=0x7f91ea46ff28) at thread_pthread.c:1381
#2  thread_timer (p=0x7f91ea46ff28) at thread_pthread.c:1456
#3  0x00007f91e87459d1 in start_thread (arg=0x7f91e8b78700) at pthread_create.c:301
#4  0x00007f91e7bcbb6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) detach 
Detaching from program: /usr/local/rbenv/versions/2.1.1/bin/ruby, process 8510
(gdb) quit
$ 
kubo commented 10 years ago

Thank you! It is what I expected and I reproduced hang on read() by dropping packets with iptables on Linux.

I found that the Oracle client library reads not only $ORACLE_HOME/network/admin/sqlnet.ora but also $HOME/.sqlnet.ora on Linux.

Could you change your code as follows to temporarily change the environment variable HOME while requiring oci8 and put .sqlnet.ora in the same directory.

saved_home_env = ENV['HOME']
begin
   ENV['HOME'] = File.dirname(__FILE__) # temporarily change the environment variable HOME.
   require 'oci8'
ensure
   ENV['HOME'] = saved_home_env
end

.sqlnet.ora

SQLNET.OUTBOUND_CONNECT_TIMEOUT=20
SQLNET.RECV_TIMEOUT=20

Note that if a query needs time more than SQLNET.RECV_TIMEOUT, the connection is closed.

As for another issue "the ruby process will not exit cleanly and has to be kill -9'd from the outside" the process will exit after SQLNET.RECV_TIMEOUT.

kubo commented 8 years ago

The next ruby-oci8 release includes timeout parameters. See http://www.rubydoc.info/github/kubo/ruby-oci8/file/docs/timeout-parameters.md. Use these parameters for low-quality networks.