oracle / node-oracledb

Oracle Database driver for Node.js maintained by Oracle Corp.
http://oracle.github.io/node-oracledb/
Other
2.26k stars 1.08k forks source link

Pool closing memory corruption issue #1115

Closed hwesley closed 5 years ago

hwesley commented 5 years ago

Hi,

I am creating 2 connection pools, with different poolAlias & poolMax variables (one for edits, one for select queries), but otherwise they have the below configuration:

{
connectionString: `${process.env.DB_HOST}:${process.env.DB_PORT}/cdb1`,
  password: process.env.DB_PASSWORD,
  user: process.env.DB_USER,
  _enableStats: true,
 poolAlias: 'PoolName1',
  poolMin: 0,
  poolMax: 60,
  poolIncrement: 1,
  poolPingInterval: 60,
  poolTimeout: 30
}

I'm periodically pinging the database with a simple SELECT 1 FROM DUAL-query. When the ping fails, I want to recreate the 2 pools by calling close followed by createPool. The ping query times out after 15 seconds.

I am using version 3.1.2 of the node-oracledb library. I am also using the following client: instantclient-basic-linux.x64-18.3.0.0.0dbru.

When the database is available, this works as expected & the pings are successful.

However, when the database is not available (eg server is offline), my ping fails as expected. During the recreate, I get the following stacktrace after closing the pool. It's not possible to catch it with a try-catch, my methods complete & return as expected (I wait on the callbacks to check if it succeeded). It somehow prints the stacktrace from outside my code:

backend_1   | Pinging Oracle DB
backend_1   | Error connecting to db with PoolName1 pool: Timed out in 15000ms.
backend_1   | Pinging Oracle DB failed: Error: Pinging failed
backend_1   | Recreating pool: PoolName1
backend_1   | Destroy pool: PoolName1: 2019-06-26T12:48:15.066Z
backend_1   | Finished destroying pool: PoolName1: 2019-06-26T12:48:15.068Z
backend_1   | Finished creating pool: PoolName1
backend_1   | Finished recreating pool: PoolName1
backend_1   | Finished pinging

backend_1   | *** Error in `/usr/local/bin/node': malloc(): memory corruption: 0x00007fed000a6810 ***
backend_1   | ======= Backtrace: =========
backend_1   | /lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7fed306d2bfb]
backend_1   | /lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7fed306d8fc6]
backend_1   | /lib/x86_64-linux-gnu/libc.so.6(+0x79089)[0x7fed306db089]
backend_1   | /lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7fed306dcf64]
backend_1   | /lib64/ld-linux-x86-64.so.2(+0xf4c1)[0x7fed318d64c1]
backend_1   | /lib64/ld-linux-x86-64.so.2(+0xf683)[0x7fed318d6683]
backend_1   | /lib64/ld-linux-x86-64.so.2(+0xa39e)[0x7fed318d139e]
backend_1   | /lib/x86_64-linux-gnu/libc.so.6(+0x11f2d0)[0x7fed307812d0]
backend_1   | /lib64/ld-linux-x86-64.so.2(+0xf704)[0x7fed318d6704]
backend_1   | /lib/x86_64-linux-gnu/libc.so.6(+0x11f66d)[0x7fed3078166d]
backend_1   | /lib/x86_64-linux-gnu/libdl.so.2(+0x1014)[0x7fed316c4014]
backend_1   | /lib64/ld-linux-x86-64.so.2(+0xf704)[0x7fed318d6704]
backend_1   | /lib/x86_64-linux-gnu/libdl.so.2(+0x1531)[0x7fed316c4531]
backend_1   | /lib/x86_64-linux-gnu/libdl.so.2(dlsym+0x48)[0x7fed316c4068]
backend_1   | /opt/oracle/instantclient_18_3/libclntsh.so(sskgds_save_text_start_end+0x9f)[0x7fed0f245daf]
backend_1   | /opt/oracle/instantclient_18_3/libclntsh.so(+0x76e46b)[0x7fed0c7fd46b]
backend_1   | /opt/oracle/instantclient_18_3/libclntsh.so(kgdsdsts+0x4e)[0x7fed0c7fd3ae]
backend_1   | /opt/oracle/instantclient_18_3/libclntsh.so(dbgemdFillIncCtx+0x17c)[0x7fed0c9cb6ec]
backend_1   | /opt/oracle/instantclient_18_3/libclntsh.so(dbgexPopulateIncCtx+0x79)[0x7fed0c9d2fa9]
backend_1   | /opt/oracle/instantclient_18_3/libclntsh.so(dbgexProcessError+0xd9)[0x7fed0c9d16a9]
backend_1   | /opt/oracle/instantclient_18_3/libclntsh.so(dbgePostErrorDirectVaList_int+0x872)[0x7fed0ec967e2]
backend_1   | /opt/oracle/instantclient_18_3/libclntsh.so(dbgePostErrorDirect+0x326)[0x7fed0ec958e6]
backend_1   | /opt/oracle/instantclient_18_3/libclntsh.so(+0x2d9667f)[0x7fed0ee2567f]
backend_1   | /opt/oracle/instantclient_18_3/libclntsh.so(+0x26fc280)[0x7fed0e78b280]
backend_1   | /lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7fed30a120c0]
backend_1   | /lib/x86_64-linux-gnu/libpthread.so.0(+0xb193)[0x7fed30a0c193]
backend_1   | /opt/oracle/instantclient_18_3/libclntsh.so(kpuspgetpooledsession+0x50d)[0x7fed0d41dcdd]
backend_1   | /opt/oracle/instantclient_18_3/libclntsh.so(kpuspsessionget+0x9b1)[0x7fed0d4198b1]
backend_1   | /opt/oracle/instantclient_18_3/libclntsh.so(OCISessionGet+0x10c)[0x7fed0d17ce3c]
backend_1   | /app/node_modules/oracledb/build/Release/oracledb-abi67-linux-x64.node(dpiOci__sessionGet+0x6a)[0x7fed2dc2bd3a]
backend_1   | /app/node_modules/oracledb/build/Release/oracledb-abi67-linux-x64.node(dpiConn__create+0x418)[0x7fed2dc18ca8]
backend_1   | /app/node_modules/oracledb/build/Release/oracledb-abi67-linux-x64.node(dpiPool__acquireConnection+0x6d)[0x7fed2dc2f46d]
backend_1   | /app/node_modules/oracledb/build/Release/oracledb-abi67-linux-x64.node(dpiPool_acquireConnection+0x12d)[0x7fed2dc2f63d]
backend_1   | /app/node_modules/oracledb/build/Release/oracledb-abi67-linux-x64.node(_ZN7njsPool19Async_GetConnectionEP8njsBaton+0xaf)[0x7fed2dbfa11f]
backend_1   | /usr/local/bin/node[0xa79e4e]
backend_1   | /lib/x86_64-linux-gnu/libpthread.so.0(+0x7494)[0x7fed30a08494]
backend_1   | /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fed3074aacf]
backend_1   | ======= Memory map: ========
backend_1   | 00400000-02349000 r-xp 00000000 08:01 530123                             /usr/local/bin/node
backend_1   | 02549000-0254b000 r--p 01f49000 08:01 530123                             /usr/local/bin/node
backend_1   | 0254b000-02562000 rw-p 01f4b000 08:01 530123                             /usr/local/bin/node
backend_1   | 02562000-0257d000 rw-p 00000000 00:00 0 
backend_1   | 04570000-04ae4000 rw-p 00000000 00:00 0                                  [heap]
backend_1   | 43dc800000-43dc880000 rw-p 00000000 00:00 0 
backend_1   | 7bfd980000-7bfda00000 rw-p 00000000 00:00 0 
backend_1   | da5d680000-da5d700000 rw-p 00000000 00:00 0 
backend_1   | 18bcba00000-18bcba80000 rw-p 00000000 00:00 0 
backend_1   | 1c9c4d80000-1c9c4e00000 rw-p 00000000 00:00 0 
backend_1   | 1cf58e80000-1cf58f00000 ---p 00000000 00:00 0 
backend_1   | 1e9d1080000-1e9d1100000 rw-p 00000000 00:00 0 
backend_1   | 2e2ad200000-2e2ad280000 rw-p 00000000 00:00 0 
backend_1   | 3d106800000-3d106880000 ---p 00000000 00:00 0 
backend_1   | 40931b80000-40931c00000 rw-p 00000000 00:00 0 
backend_1   | 43405980000-43405a00000 rw-p 00000000 00:00 0 
backend_1   | 4c875d80000-4c875e00000 rw-p 00000000 00:00 0 
backend_1   | 53cc9200000-53cc9303000 rw-p 00000000 00:00 0 
backend_1   | 5b9fac80000-5b9fad00000 rw-p 00000000 00:00 0 
backend_1   | 5f7d2a00000-5f7d2a80000 rw-p 00000000 00:00 0 
backend_1   | 60b04000000-60b04080000 rw-p 00000000 00:00 0 
backend_1   | 6121ee80000-6121ef00000 ---p 00000000 00:00 0 
backend_1   | 66508500000-66508580000 rw-p 00000000 00:00 0 
backend_1   | 67571a80000-67571b00000 ---p 00000000 00:00 0 
backend_1   | 67ab4f80000-67ab5000000 ---p 00000000 00:00 0 
backend_1   | 6a5a8300000-6a5a8380000 rw-p 00000000 00:00 0 
backend_1   | 6b6ac080000-6b6ac100000 rw-p 00000000 00:00 0 
backend_1   | 6c8ef480000-6c8ef500000 rw-p 00000000 00:00 0 
backend_1   | 6e3ff580000-6e3ff600000 rw-p 00000000 00:00 0 
backend_1   | 741e4400000-741e4480000 rw-p 00000000 00:00 0 
backend_1   | 75a9be00000-75a9be80000 ---p 00000000 00:00 0 
backend_1   | 7a726e00000-7a726e80000 rw-p 00000000 00:00 0 
backend_1   | 7e9e5d00000-7e9e5d80000 rw-p 00000000 00:00 0 
backend_1   | 93177a00000-93177a80000 rw-p 00000000 00:00 0 
backend_1   | a17fe300000-a17fe380000 ---p 00000000 00:00 0 
backend_1   | a7b9e680000-a7b9e700000 rw-p 00000000 00:00 0 
backend_1   | b2447400000-b2447480000 rw-p 00000000 00:00 0 
backend_1   | b3461200000-b3461280000 ---p 00000000 00:00 0 
backend_1   | b5a31d80000-b5a31e00000 ---p 00000000 00:00 0 
backend_1   | bc2f1580000-bc2f1600000 rw-p 00000000 00:00 0 
backend_1   | c2decd00000-c2decd80000 ---p 00000000 00:00 0 
backend_1   | c8644980000-c8644a00000 rw-p 00000000 00:00 0 
backend_1   | e3bc7c80000-e3bc7d00000 rw-p 00000000 00:00 0 
backend_1   | e875dc00000-e875dc80000 ---p 00000000 00:00 0 
backend_1   | ecf1f200000-ecf1f280000 ---p 00000000 00:00 0 
backend_1   | f2c27f80000-f2c28000000 rw-p 00000000 00:00 0 
backend_1   | 1098d6380000-1098d6400000 ---p 00000000 00:00 0 
backend_1   | 10abb9580000-10abb9600000 rw-p 00000000 00:00 0 
backend_1   | 10d075f80000-10d076000000 ---p 00000000 00:00 0 
backend_1   | 116fc1100000-116fc1180000 ---p 00000000 00:00 0 
backend_1   | 11b3e2200000-11b3e2280000 rw-p 00000000 00:00 0 
backend_1   | 123fafd80000-123fafe00000 ---p 00000000 00:00 0 
backend_1   | 1253c5e00000-1253c5e80000 ---p 00000000 00:00 0 
backend_1   | 1376a9180000-1376a9200000 rw-p 00000000 00:00 0 
backend_1   | 13806bc00000-13806bc80000 ---p 00000000 00:00 0 
backend_1   | 13980f280000-13980f300000 ---p 00000000 00:00 0 
backend_1   | 15b246900000-15b246980000 ---p 00000000 00:00 0 
backend_1   | 173d62400000-173d62480000 ---p 00000000 00:00 0 
backend_1   | 17ad83680000-17ad83700000 rw-p 00000000 00:00 0 
backend_1   | 181dcb900000-181dcb980000 ---p 00000000 00:00 0 
backend_1   | 184a06300000-184a06380000 rw-p 00000000 00:00 0 
backend_1   | 185f87e00000-185f87f0a000 rw-p 00000000 00:00 0 
backend_1   | 18e358200000-18e358280000 ---p 00000000 00:00 0 
backend_1   | 194caab00000-194caab80000 rw-p 00000000 00:00 0 
backend_1   | 19a9a6180000-19a9a6200000 ---p 00000000 00:00 0 
backend_1   | 1aadb1680000-1aadb1700000 rw-p 00000000 00:00 0 
backend_1   | 1bdf9d800000-1bdf9d880000 rw-p 00000000 00:00 0 
backend_1   | 1bee4d200000-1bee4d280000 rw-p 00000000 00:00 0 
backend_1   | 1c4f0d080000-1c4f0d100000 rw-p 00000000 00:00 0 
backend_1   | 1cc711100000-1cc711180000 ---p 00000000 00:00 0 
backend_1   | 1e0727f80000-1e0728000000 ---p 00000000 00:00 0 
backend_1   | 1fcabc080000-1fcabc100000 ---p 00000000 00:00 0 
backend_1   | 204a77f00000-204a77f80000 rw-p 00000000 00:00 0 
backend_1   | 207285a00000-207285a80000 rw-p 00000000 00:00 0 
backend_1   | 208690e80000-208690f00000 ---p 00000000 00:00 0 
backend_1   | 218155600000-218155680000 rw-p 00000000 00:00 0 
backend_1   | 21a429180000-21a429200000 ---p 00000000 00:00 0 
backend_1   | 21ef9a500000-21ef9a580000 ---p 00000000 00:00 0 
backend_1   | 22148cb80000-22148cc00000 rw-p 00000000 00:00 0 
backend_1   | 22777bc80000-22777bd00000 rw-p 00000000 00:00 0 
backend_1   | 2338a5a00000-2338a5a80000 ---p 00000000 00:00 0 
backend_1   | 23cfd7280000-23cfd7300000 rw-p 00000000 00:00 0 
backend_1   | 24777b600000-24777b680000 rw-p 00000000 00:00 0 
backend_1   | 24ea95580000-24ea95600000 rw-p 00000000 00:00 0 
backend_1   | 260967b80000-260967c00000 rw-p 00000000 00:00 0 
backend_1   | 2652c0900000-2652c0980000 ---p 00000000 00:00 0 
backend_1   | 27722a700000-27722a780000 rw-p 00000000 00:00 0 
backend_1   | 27cf9d200000-27cf9d280000 rw-p 00000000 00:00 0 
backend_1   | 27f23fc80000-27f23fd00000 ---p 00000000 00:00 0 
backend_1   | 28cc19300000-28cc19380000 rw-p 00000000 00:00 0 
backend_1   | 292155e00000-292155e80000 ---p 00000000 00:00 0 
backend_1   | 2a4be09c9000-2a4be0a00000 ---p 00000000 00:00 0 
backend_1   | 2a4be0a00000-2a4be0a03000 rw-p 00000000 00:00 0 
backend_1   | 2a4be0a03000-2a4be0a04000 ---p 00000000 00:00 0 
backend_1   | 2a4be0a04000-2a4be0a7f000 r-xp 00000000 00:00 0 
backend_1   | 2a4be0a7f000-2a4be0a80000 ---p 00000000 00:00 0 
backend_1   | 2a4be0a80000-2a4be0a83000 rw-p 00000000 00:00 0 
backend_1   | 2a4be0a83000-2a4be0a84000 ---p 00000000 00:00 0 
backend_1   | 2a4be0a84000-2a4be0aff000 r-xp 00000000 00:00 0 
backend_1   | 2a4be0aff000-2a4be0b00000 ---p 00000000 00:00 0 
backend_1   | 2a4be0b00000-2a4be0b03000 rw-p 00000000 00:00 0 
backend_1   | 2a4be0b03000-2a4be0b04000 ---p 00000000 00:00 0 
backend_1   | 2a4be0b04000-2a4be0b19000 r-xp 00000000 00:00 0 
backend_1   | 2a4be0b19000-2a4be0b80000 ---p 00000000 00:00 0 
backend_1   | 2a4be0b80000-2a4be0b83000 rw-p 00000000 00:00 0 
backend_1   | 2a4be0b83000-2a4be0b84000 ---p 00000000 00:00 0 
backend_1   | 2a4be0b84000-2a4be0b99000 r-xp 00000000 00:00 0 
backend_1   | 2a4be0b99000-2a4be0c00000 ---p 00000000 00:00 0 
backend_1   | 2a4be0c00000-2a4be0c03000 rw-p 00000000 00:00 0 
backend_1   | 2a4be0c03000-2a4be0c04000 ---p 00000000 00:00 0 
backend_1   | 2a4be0c04000-2a4be0c19000 r-xp 00000000 00:00 0 
backend_1   | 2a4be0c19000-2a4be0c80000 ---p 00000000 00:00 0 
backend_1   | 2a4be0c80000-2a4be0c83000 rw-p 00000000 00:00 0 
backend_1   | 2a4be0c83000-2a4be0c84000 ---p 00000000 00:00 0 
backend_1   | 2a4be0c84000-2a4be0cff000 r-xp 00000000 00:00 0 
backend_1   | 2a4be0cff000-2a4be89c9000 ---p 00000000 00:00 0 
backend_1   | 2a718d000000-2a718d080000 rw-p 00000000 00:00 0 
backend_1   | 2b699c980000-2b699ca00000 ---p 00000000 00:00 0 
backend_1   | 2c1d46a00000-2c1d46a80000 rw-p 00000000 00:00 0 
backend_1   | 2d892d024000-2d892d02c000 rw-p 00000000 00:00 0 
backend_1   | 2dc7b6e00000-2dc7b6e80000 ---p 00000000 00:00 0 
backend_1   | 2e8844800000-2e8844880000 rw-p 00000000 00:00 0 
backend_1   | 2ee068200000-2ee068280000 rw-p 00000000 00:00 0 
backend_1   | 2f0733900000-2f0733980000 rw-p 00000000 00:00 0 
backend_1   | 2f8117f00000-2f8117f80000 ---p 00000000 00:00 0 
backend_1   | 2fdabfa00000-2fdabfa80000 ---p 00000000 00:00 0 
backend_1   | 2ff1d7e80000-2ff1d7e83000 rw-p 00000000 00:00 0 
backend_1   | 309314280000-309314300000 rw-p 00000000 00:00 0 
backend_1   | 310c39e00000-310c39e80000 ---p 00000000 00:00 0 
backend_1   | 31a5d3900000-31a5d3980000 ---p 00000000 00:00 0 
backend_1   | 324325180000-324325200000 ---p 00000000 00:00 0 
backend_1   | 325d4d680000-325d4d700000 rw-p 00000000 00:00 0 
backend_1   | 32970de80000-32970df00000 ---p 00000000 00:00 0 
backend_1   | 32c0b1a00000-32c0b1a80000 ---p 00000000 00:00 0 
backend_1   | 33120f180000-33120f200000 rw-p 00000000 00:00 0 
backend_1   | 336de5a00000-336de5a80000 ---p 00000000 00:00 0 
backend_1   | 338466f00000-338466f80000 ---p 00000000 00:00 0 
backend_1   | 33a5c9600000-33a5c9680000 rw-p 00000000 00:00 0 
backend_1   | 33aadd900000-33aadd980000 rw-p 00000000 00:00 0 
backend_1   | 33ba32280000-33ba32300000 ---p 00000000 00:00 0 
backend_1   | 33d608780000-33d6087b1000 rw-p 00000000 00:00 0 
backend_1   | 341f4ee80000-341f4ef00000 rw-p 00000000 00:00 0 
backend_1   | 348ed6780000-348ed6800000 rw-p 00000000 00:00 0 
backend_1   | 34a5dfc80000-34a5dfd00000 ---p 00000000 00:00 0 
backend_1   | 35424fd00000-35424fd80000 rw-p 00000000 00:00 0 
backend_1   | 355747f80000-355748000000 ---p 00000000 00:00 0 
backend_1   | 35895bf00000-35895bf80000 ---p 00000000 00:00 0 
backend_1   | 358aa6600000-358aa6680000 ---p 00000000 00:00 0 
backend_1   | 35bc95200000-35bc95280000 ---p 00000000 00:00 0 
backend_1   | 361eee400000-361eee480000 rw-p 00000000 00:00 0 
backend_1   | 364dfaf80000-364dfb000000 ---p 00000000 00:00 0 
backend_1   | 36d9aa900000-36d9aa903000 rw-p 00000000 00:00 0 
backend_1   | 36d9aa903000-36d9aa980000 r--p 00000000 00:00 0 
backend_1   | 371bf2480000-371bf2500000 ---p 00000000 00:00 0 
backend_1   | 379188780000-379188800000 ---p 00000000 00:00 0 
backend_1   | 3827d4000000-3827d4080000 ---p 00000000 00:00 0 
backend_1   | 391b98b00000-391b98b80000 ---p 00000000 00:00 0 
backend_1   | 3923a9180000-3923a9200000 rw-p 00000000 00:00 0 
backend_1   | 39599d280000-39599d300000 ---p 00000000 00:00 0 
backend_1   | 397224c80000-397224d00000 rw-p 00000000 00:00 0 
backend_1   | 39b39c280000-39b39c300000 ---p 00000000 00:00 0 
backend_1   | 3ad051800000-3ad051880000 ---p 00000000 00:00 0 
backend_1   | 3b8a13980000-3b8a13a00000 rw-p 00000000 00:00 0 
backend_1   | 3bf9fb780000-3bf9fb800000 ---p 00000000 00:00 0 
backend_1   | 3c5448d00000-3c5448d80000 ---p 00000000 00:00 0 
backend_1   | 3cf9d9f80000-3cf9da000000 rw-p 00000000 00:00 0 
backend_1   | 3e6815780000-3e6815800000 rw-p 00000000 00:00 0 
backend_1   | 3f096a380000-3f096a400000 rw-p 00000000 00:00 0 
backend_1   | 3fb07d700000-3fb07d780000 ---p 00000000 00:00 0 
backend_1   | 7fecf0000000-7fecf00af000 rw-p 00000000 00:00 0 
backend_1   | 7fecf00af000-7fecf4000000 ---p 00000000 00:00 0 
backend_1   | 7fecf8000000-7fecf8021000 rw-p 00000000 00:00 0 
backend_1   | 7fecf8021000-7fecfc000000 ---p 00000000 00:00 0 
backend_1   | 7fecfc000000-7fecfc0a1000 rw-p 00000000 00:00 0 
backend_1   | 7fecfc0a1000-7fed00000000 ---p 00000000 00:00 0 
backend_1   | 7fed00000000-7fed001b3000 rw-p 00000000 00:00 0 
backend_1   | 7fed001b3000-7fed04000000 ---p 00000000 00:00 0 
backend_1   | 7fed0457f000-7fed0be8f000 r-xp 00000000 08:01 1837854                    /opt/oracle/instantclient_18_3/libociei.so
backend_1   | 7fed0be8f000-7fed0c08e000 ---p 07910000 08:01 1837854                    /opt/oracle/instantclient_18_3/libociei.so
backend_1   | 7fed0c08e000-7fed0c08f000 rw-p 0790f000 08:01 1837854                    /opt/oracle/instantclient_18_3/libociei.so
backend_1   | 7fed0c08f000-7fed0fb77000 r-xp 00000000 08:01 1837847                    /opt/oracle/instantclient_18_3/libclntsh.so.18.1
backend_1   | 7fed0fb77000-7fed0fd77000 ---p 03ae8000 08:01 1837847                    /opt/oracle/instantclient_18_3/libclntsh.so.18.1
backend_1   | 7fed0fd77000-7fed0ffdf000 rw-p 03ae8000 08:01 1837847                    /opt/oracle/instantclient_18_3/libclntsh.so.18.1
backend_1   | 7fed0ffdf000-7fed10000000 rw-p 00000000 00:00 0 
backend_1   | 7fed10000000-7fed100a2000 rw-p 00000000 00:00 0 
backend_1   | 7fed100a2000-7fed14000000 ---p 00000000 00:00 0 
backend_1   | 7fed1555e000-7fed15569000 r-xp 00000000 08:01 1213229                    /lib/x86_64-linux-gnu/libnss_nis-2.24.so
backend_1   | 7fed15569000-7fed15768000 ---p 0000b000 08:01 1213229                    /lib/x86_64-linux-gnu/libnss_nis-2.24.so
backend_1   | 7fed15768000-7fed15769000 r--p 0000a000 08:01 1213229                    /lib/x86_64-linux-gnu/libnss_nis-2.24.so
backend_1   | 7fed15769000-7fed1576a000 rw-p 0000b000 08:01 1213229                    /lib/x86_64-linux-gnu/libnss_nis-2.24.so
backend_1   | 7fed1576a000-7fed15771000 r-xp 00000000 08:01 1213221                    /lib/x86_64-linux-gnu/libnss_compat-2.24.so
backend_1   | 7fed15771000-7fed15970000 ---p 00007000 08:01 1213221                    /lib/x86_64-linux-gnu/libnss_compat-2.24.so
backend_1   | 7fed15970000-7fed15971000 r--p 00006000 08:01 1213221                    /lib/x86_64-linux-gnu/libnss_compat-2.24.so
backend_1   | 7fed15971000-7fed15972000 rw-p 00007000 08:01 1213221                    /lib/x86_64-linux-gnu/libnss_compat-2.24.so
backend_1   | 7fed15972000-7fed15d2f000 r-xp 00000000 08:01 1837848                    /opt/oracle/instantclient_18_3/libclntshcore.so.18.1
backend_1   | 7fed15d2f000-7fed15f2e000 ---p 003bd000 08:01 1837848                    /opt/oracle/instantclient_18_3/libclntshcore.so.18.1
backend_1   | 7fed15f2e000-7fed15f44000 rw-p 003bc000 08:01 1837848                    /opt/oracle/instantclient_18_3/libclntshcore.so.18.1
backend_1   | 7fed15f44000-7fed15f54000 rw-p 00000000 00:00 0 
backend_1   | 7fed15f54000-7fed15f55000 r-xp 00000000 08:01 1838079                    /lib/x86_64-linux-gnu/libaio.so.1.0.1
backend_1   | 7fed15f55000-7fed16154000 ---p 00001000 08:01 1838079                    /lib/x86_64-linux-gnu/libaio.so.1.0.1
backend_1   | 7fed16154000-7fed16155000 r--p 00000000 08:01 1838079                    /lib/x86_64-linux-gnu/libaio.so.1.0.1
backend_1   | 7fed16155000-7fed16156000 rw-p 00000000 00:00 0 
backend_1   | 7fed16156000-7fed1616a000 r-xp 00000000 08:01 1213219                    /lib/x86_64-linux-gnu/libnsl-2.24.so
backend_1   | 7fed1616a000-7fed1636a000 ---p 00014000 08:01 1213219                    /lib/x86_64-linux-gnu/libnsl-2.24.so
backend_1   | 7fed1636a000-7fed1636b000 r--p 00014000 08:01 1213219                    /lib/x86_64-linux-gnu/libnsl-2.24.so
backend_1   | 7fed1636b000-7fed1636c000 rw-p 00015000 08:01 1213219                    /lib/x86_64-linux-gnu/libnsl-2.24.so
backend_1   | 7fed1636c000-7fed1636e000 rw-p 00000000 00:00 0 
backend_1   | 7fed1636e000-7fed163bd000 r-xp 00000000 08:01 1837857                    /opt/oracle/instantclient_18_3/libons.so
backend_1   | 7fed163bd000-7fed165bc000 ---p 0004f000 08:01 1837857                    /opt/oracle/instantclient_18_3/libons.so
backend_1   | 7fed165bc000-7fed165be000 rw-p 0004e000 08:01 1837857                    /opt/oracle/instantclient_18_3/libons.so
backend_1   | 7fed165be000-7fed16aa1000 r-xp 00000000 08:01 1837851                    /opt/oracle/instantclient_18_3/libnnz18.so
backend_1   | 7fed16aa1000-7fed16ca0000 ---p 004e3000 08:01 1837851                    /opt/oracle/instantclient_18_3/libnnz18.so
backend_1   | 7fed16ca0000-7fed16d11000 rw-p 004e2000 08:01 1837851                    /opt/oracle/instantclient_18_3/libnnz18.so
backend_1   | 7fed16d11000-7fed16d13000 rw-p 00000000 00:00 0 
backend_1   | 7fed16d13000-7fed16f7a000 r-xp 00000000 08:01 1837849                    /opt/oracle/instantclient_18_3/libipc1.so
backend_1   | 7fed16f7a000-7fed1717a000 ---p 00267000 08:01 1837849                    /opt/oracle/instantclient_18_3/libipc1.so
backend_1   | 7fed1717a000-7fed1717f000 rw-p 00267000 08:01 1837849                    /opt/oracle/instantclient_18_3/libipc1.so
backend_1   | 7fed1717f000-7fed17182000 rw-p 00000000 00:00 0 
backend_1   | 7fed17182000-7fed171de000 r-xp 00000000 08:01 1837850                    /opt/oracle/instantclient_18_3/libmql1.so
backend_1   | 7fed171de000-7fed173dd000 ---p 0005c000 08:01 1837850                    /opt/oracle/instantclient_18_3/libmql1.so
backend_1   | 7fed173dd000-7fed173e0000 rw-p 0005b000 08:01 1837850                    /opt/oracle/instantclient_18_3/libmql1.so
backend_1   | 7fed173e0000-7fed173e2000 rw-p 00000000 00:00 0 
backend_1   | 7fed173e2000-7fed173f6000 r-xp 00000000 08:01 1213244                    /lib/x86_64-linux-gnu/libresolv-2.24.so
backend_1   | 7fed173f6000-7fed175f5000 ---p 00014000 08:01 1213244                    /lib/x86_64-linux-gnu/libresolv-2.24.so
backend_1   | 7fed175f5000-7fed175f6000 r--p 00013000 08:01 1213244                    /lib/x86_64-linux-gnu/libresolv-2.24.so
backend_1   | 7fed175f6000-7fed175f7000 rw-p 00014000 08:01 1213244                    /lib/x86_64-linux-gnu/libresolv-2.24.so
backend_1   | 7fed175f7000-7fed175f9000 rw-p 00000000 00:00 0 
backend_1   | 7fed175f9000-7fed175fe000 r-xp 00000000 08:01 1213223                    /lib/x86_64-linux-gnu/libnss_dns-2.24.so
backend_1   | 7fed175fe000-7fed177fd000 ---p 00005000 08:01 1213223                    /lib/x86_64-linux-gnu/libnss_dns-2.24.so
backend_1   | 7fed177fd000-7fed177fe000 r--p 00004000 08:01 1213223                    /lib/x86_64-linux-gnu/libnss_dns-2.24.so
backend_1   | 7fed177fe000-7fed177ff000 rw-p 00005000 08:01 1213223                    /lib/x86_64-linux-gnu/libnss_dns-2.24.so
backend_1   | 7fed177ff000-7fed17800000 ---p 00000000 00:00 0 
backend_1   | 7fed17800000-7fed18000000 rw-p 00000000 00:00 0 
backend_1   | 7fed18000000-7fed1826f000 rw-p 00000000 00:00 0 
backend_1   | 7fed1826f000-7fed1c000000 ---p 00000000 00:00 0 
backend_1   | 7fed1c000000-7fed1c2a6000 rw-p 00000000 00:00 0 
backend_1   | 7fed1c2a6000-7fed20000000 ---p 00000000 00:00 0 
backend_1   | 7fed20000000-7fed2014f000 rw-p 00000000 00:00 0 
backend_1   | 7fed2014f000-7fed24000000 ---p 00000000 00:00 0 
backend_1   | 7fed24000000-7fed2426f000 rw-p 00000000 00:00 0 
backend_1   | 7fed2426f000-7fed28000000 ---p 00000000 00:00 0 
backend_1   | 7fed28000000-7fed28021000 rw-p 00000000 00:00 0 
backend_1   | 7fed28021000-7fed2c000000 ---p 00000000 00:00 0 
backend_1   | 7fed2c1bb000-7fed2c1c5000 r-xp 00000000 08:01 1213225                    /lib/x86_64-linux-gnu/libnss_files-2.24.so
backend_1   | 7fed2c1c5000-7fed2c3c5000 ---p 0000a000 08:01 1213225                    /lib/x86_64-linux-gnu/libnss_files-2.24.so
backend_1   | 7fed2c3c5000-7fed2c3c6000 r--p 0000a000 08:01 1213225                    /lib/x86_64-linux-gnu/libnss_files-2.24.so
backend_1   | 7fed2c3c6000-7fed2c3c7000 rw-p 0000b000 08:01 1213225                    /lib/x86_64-linux-gnu/libnss_files-2.24.so
backend_1   | 7fed2c3c7000-7fed2c3cd000 rw-p 00000000 00:00 0 
backend_1   | 7fed2c3cd000-7fed2c3ce000 ---p 00000000 00:00 0 
backend_1   | 7fed2c3ce000-7fed2cbce000 rw-p 00000000 00:00 0 
backend_1   | 7fed2cbce000-7fed2cbcf000 ---p 00000000 00:00 0 
backend_1   | 7fed2cbcf000-7fed2d3cf000 rw-p 00000000 00:00 0 
backend_1   | 7fed2d3cf000-7fed2d3d0000 ---p 00000000 00:00 0 
backend_1   | 7fed2d3d0000-7fed2dbd0000 rw-p 00000000 00:00 0 
backend_1   | 7fed2dbd0000-7fed2dc59000 r-xp 00000000 08:01 3026847                    /app/node_modules/oracledb/build/Release/oracledb-abi67-linux-x64.node
backend_1   | 7fed2dc59000-7fed2de59000 ---p 00089000 08:01 3026847                    /app/node_modules/oracledb/build/Release/oracledb-abi67-linux-x64.node
backend_1   | 7fed2de59000-7fed2de5c000 rw-p 00089000 08:01 3026847                    /app/node_modules/oracledb/build/Release/oracledb-abi67-linux-x64.node
backend_1   | 7fed2de5c000-7fed2de5d000 rw-p 00000000 00:00 0 
backend_1   | 7fed2de5d000-7fed2de5e000 ---p 00000000 00:00 0 
backend_1   | 7fed2de5e000-7fed2e65e000 rw-p 00000000 00:00 0 
backend_1   | 7fed2e65e000-7fed2e65f000 ---p 00000000 00:00 0 
backend_1   | 7fed2e65f000-7fed2ee5f000 rw-p 00000000 00:00 0 
backend_1   | 7fed2ee5f000-7fed2ee60000 ---p 00000000 00:00 0 
backend_1   | 7fed2ee60000-7fed2f660000 rw-p 00000000 00:00 0 
backend_1   | 7fed2f660000-7fed2f661000 ---p 00000000 00:00 0 
backend_1   | 7fed2f661000-7fed2fe61000 rw-p 00000000 00:00 0 
backend_1   | 7fed2fe61000-7fed2fe62000 ---p 00000000 00:00 0 
backend_1   | 7fed2fe62000-7fed30662000 rw-p 00000000 00:00 0 
backend_1   | 7fed30662000-7fed307f7000 r-xp 00000000 08:01 1213179                    /lib/x86_64-linux-gnu/libc-2.24.so
backend_1   | 7fed307f7000-7fed309f7000 ---p 00195000 08:01 1213179                    /lib/x86_64-linux-gnu/libc-2.24.so
backend_1   | 7fed309f7000-7fed309fb000 r--p 00195000 08:01 1213179                    /lib/x86_64-linux-gnu/libc-2.24.so
backend_1   | 7fed309fb000-7fed309fd000 rw-p 00199000 08:01 1213179                    /lib/x86_64-linux-gnu/libc-2.24.so
backend_1   | 7fed309fd000-7fed30a01000 rw-p 00000000 00:00 0 
backend_1   | 7fed30a01000-7fed30a19000 r-xp 00000000 08:01 1213242                    /lib/x86_64-linux-gnu/libpthread-2.24.so
backend_1   | 7fed30a19000-7fed30c18000 ---p 00018000 08:01 1213242                    /lib/x86_64-linux-gnu/libpthread-2.24.so
backend_1   | 7fed30c18000-7fed30c19000 r--p 00017000 08:01 1213242                    /lib/x86_64-linux-gnu/libpthread-2.24.so
backend_1   | 7fed30c19000-7fed30c1a000 rw-p 00018000 08:01 1213242                    /lib/x86_64-linux-gnu/libpthread-2.24.so
backend_1   | 7fed30c1a000-7fed30c1e000 rw-p 00000000 00:00 0 
backend_1   | 7fed30c1e000-7fed30c34000 r-xp 00000000 08:01 1213199                    /lib/x86_64-linux-gnu/libgcc_s.so.1
backend_1   | 7fed30c34000-7fed30e33000 ---p 00016000 08:01 1213199                    /lib/x86_64-linux-gnu/libgcc_s.so.1
backend_1   | 7fed30e33000-7fed30e34000 r--p 00015000 08:01 1213199                    /lib/x86_64-linux-gnu/libgcc_s.so.1
backend_1   | 7fed30e34000-7fed30e35000 rw-p 00016000 08:01 1213199                    /lib/x86_64-linux-gnu/libgcc_s.so.1
backend_1   | 7fed30e35000-7fed30f38000 r-xp 00000000 08:01 1213208                    /lib/x86_64-linux-gnu/libm-2.24.so
backend_1   | 7fed30f38000-7fed31137000 ---p 00103000 08:01 1213208                    /lib/x86_64-linux-gnu/libm-2.24.so
backend_1   | 7fed31137000-7fed31138000 r--p 00102000 08:01 1213208                    /lib/x86_64-linux-gnu/libm-2.24.so
backend_1   | 7fed31138000-7fed31139000 rw-p 00103000 08:01 1213208                    /lib/x86_64-linux-gnu/libm-2.24.so
backend_1   | 7fed31139000-7fed312ab000 r-xp 00000000 08:01 1213965                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
backend_1   | 7fed312ab000-7fed314ab000 ---p 00172000 08:01 1213965                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
backend_1   | 7fed314ab000-7fed314b5000 r--p 00172000 08:01 1213965                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
backend_1   | 7fed314b5000-7fed314b7000 rw-p 0017c000 08:01 1213965                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
backend_1   | 7fed314b7000-7fed314bb000 rw-p 00000000 00:00 0 
backend_1   | 7fed314bb000-7fed314c2000 r-xp 00000000 08:01 1213246                    /lib/x86_64-linux-gnu/librt-2.24.so
backend_1   | 7fed314c2000-7fed316c1000 ---p 00007000 08:01 1213246                    /lib/x86_64-linux-gnu/librt-2.24.so
backend_1   | 7fed316c1000-7fed316c2000 r--p 00006000 08:01 1213246                    /lib/x86_64-linux-gnu/librt-2.24.so
backend_1   | 7fed316c2000-7fed316c3000 rw-p 00007000 08:01 1213246                    /lib/x86_64-linux-gnu/librt-2.24.so
backend_1   | 7fed316c3000-7fed316c6000 r-xp 00000000 08:01 1213191                    /lib/x86_64-linux-gnu/libdl-2.24.so
backend_1   | 7fed316c6000-7fed318c5000 ---p 00003000 08:01 1213191                    /lib/x86_64-linux-gnu/libdl-2.24.so
backend_1   | 7fed318c5000-7fed318c6000 r--p 00002000 08:01 1213191                    /lib/x86_64-linux-gnu/libdl-2.24.so
backend_1   | 7fed318c6000-7fed318c7000 rw-p 00003000 08:01 1213191                    /lib/x86_64-linux-gnu/libdl-2.24.so
backend_1   | 7fed318c7000-7fed318ea000 r-xp 00000000 08:01 1213161                    /lib/x86_64-linux-gnu/ld-2.24.so
backend_1   | 7fed31ab8000-7fed31ab9000 ---p 00000000 00:00 0 
backend_1   | 7fed31ab9000-7fed31ade000 rw-p 00000000 00:00 0 
backend_1   | 7fed31ae2000-7fed31ae3000 rw-p 00000000 00:00 0 
backend_1   | 7fed31ae3000-7fed31ae4000 ---p 00000000 00:00 0 
backend_1   | 7fed31ae4000-7fed31aea000 rw-p 00000000 00:00 0 
backend_1   | 7fed31aea000-7fed31aeb000 r--p 00023000 08:01 1213161                    /lib/x86_64-linux-gnu/ld-2.24.so
backend_1   | 7fed31aeb000-7fed31aec000 rw-p 00024000 08:01 1213161                    /lib/x86_64-linux-gnu/ld-2.24.so
backend_1   | 7fed31aec000-7fed31aed000 rw-p 00000000 00:00 0 
backend_1   | 7ffd7d24e000-7ffd7d270000 rw-p 00000000 00:00 0                          [stack]
backend_1   | 7ffd7d2bf000-7ffd7d2c1000 r--p 00000000 00:00 0                          [vvar]
backend_1   | 7ffd7d2c1000-7ffd7d2c3000 r-xp 00000000 00:00 0                          [vdso]
backend_1   | ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

It looks like an issue in the client, but I don't know how to avoid it. It also seems to block further execution of my periodic pings.

dmcghan commented 5 years ago

It would be helpful if you could show us the code.

cjbj commented 5 years ago

@hwesley what are you trying to protect against and trying to do? Why did you determine you need to close the pool instead of just letting the pool establish new connections? Do you have a standby database? Do you have RAC? Even if not, can you create a FAN service (for the DB / Oracle Net) and enable events in node-oracledb? What sqlnet.ora parameters do you have in your Node servers - see Connections and High Availability ? Do you have SQLNET.EXPIRE_TIME in the DB sqlnet.ora?

Are you terminating the pool with a specified drainTime?

If you're using the await/async style of programming, triple-check you have used 'await' appropriately - not doing this can commonly lead to crashes.

hwesley commented 5 years ago

The code can be started by calling:

process.env.UV_THREADPOOL_SIZE = 64;

await dbConnection.createPool(POOL);
await dbConnection.createPool(POOL_EDIT);

Here are the necessary files: DBConnector.js

import oracledb from 'oracledb';

import { Pinger } from '../../utils/Pinger';
import { timeoutPromise, sleep } from '../../utils/PromiseUtils';

oracledb.fetchAsString = [oracledb.CLOB];

const POOL = 'pool';
const POOL_EDIT = 'poolEdit';

const DEFAULT_TIMEOUT = 15000;

const POOL_CONFIG = {
  connectionString: `${process.env.DB_HOST}:${process.env.DB_PORT}/cdb1`,
  password: process.env.DB_PASSWORD,
  user: process.env.DB_USER,
  _enableStats: true,
  poolMin: 0,
  poolIncrement: 1,
  poolPingInterval: 60,
  poolTimeout: 30,
};

class DBConnector {
  constructor() {
    this.createPool = this.createPool.bind(this);
    this.closePool = this.closePool.bind(this);
    this.recreatePools = this.recreatePools.bind(this);
    this.recreatePool = this.recreatePool.bind(this);
    this.verifyPoolSanity = this.verifyPoolSanity.bind(this);
    this.checkStatus = this.checkStatus.bind(this);

    this.ping = this.ping.bind(this);

    this.executeQuery = this.executeQuery.bind(this);

    this.closeConnection = this.closeConnection.bind(this);

    this.logError = this.logError.bind(this);

    const pinger = new Pinger('Oracle DB', this.ping, '*/2 * * * *');
    pinger.onPingFailed = async () => {
      await this.recreatePools();
    };
  }

  /**
   * Initializes a connection pool to an Oracle db
   * @param {String} poolAlias Name of the pool to init
   */
  async createPool(poolAlias) {
    return new Promise(async (resolve, reject) => {
      let config;
      switch (poolAlias) {
        case POOL:
          config = { ...POOL_CONFIG, poolAlias, poolMax: 60 };
          break;
        case POOL_EDIT:
          config = { ...POOL_CONFIG, poolAlias, poolMax: 4 };
          break;
      }

      if (!config) {
        throw new Error(`No config available for poolAlias: ${poolAlias}`);
      }

      await oracledb.createPool(config, (error, pool) => {
        if (error) {
          console.log(`Create pool error: ${poolAlias}`, error);
          reject();
        }

        console.log(`Finished creating pool: ${poolAlias}`);
        resolve();
      });
    });
  }

  /**
   * Closes the connection pool
   * @param {String} poolAlias The pool to close
   */
  async closePool(poolAlias) {
    return new Promise(async (resolve, reject) => {
      console.log(`Destroy pool: ${poolAlias}`);
      const pool = await oracledb.getPool(poolAlias);
      //await pool.close(error => {
      await pool.terminate(error => {
        if (error) {
          console.log(`Error in close pool: ${poolAlias}`, error);
          reject(error);
        }

        console.log(`Finished destroying pool: ${poolAlias}`);
        resolve();
      });
    });
  }

  /**
   * Closes the pools and creates them again
   */
  async recreatePools() {
    await this.recreatePool(POOL);
    await this.recreatePool(POOL_EDIT);
  }

  async recreatePool(poolAlias) {
    // recreate pool
    console.log(`Recreating pool: ${poolAlias}`);

    // destroy pool
    try {
      await this.closePool(poolAlias);
    } catch (err) {
      console.log(`Error destroying pool: ${poolAlias}`, err);
    }

    try {
      await this.createPool(poolAlias);
    } catch (err) {
      console.log(`error creating pool: ${poolAlias}`, err);
    }

    console.log(`Finished recreating pool: ${poolAlias}`);
  }

  async verifyPoolSanity() {
    const poolStatus = await this.checkStatus();

    if (poolStatus.selectPool.connectionsInUse >= 60) {
      const statusString = `Select pool is full: connectionsInUse: ${
        poolStatus.selectPool.connectionsInUse
      }; connectionsOpen: ${poolStatus.selectPool.connectionsOpen}`;

      logException(new Error(statusString));
      console.log(statusString);
    }

    if (poolStatus.editPool.connectionsInUse >= 60) {
      const statusString = `Select pool is full: connectionsInUse: ${
        poolStatus.editPool.connectionsInUse
      }; connectionsOpen: ${poolStatus.editPool.connectionsOpen}`;

      logException(new Error(statusString));
      console.log(statusString);
    }
  }

  async ping() {
    const response = await this.executeQuery(`SELECT 1 FROM DUAL`);
    if (!response.result || response.result.length == 0) {
      throw new Error('Pinging failed');
    }

    await this.verifyPoolSanity();
  }

  /**
   * Checks status of database
   */
  async checkStatus() {
    const result = {};

    try {
      const pool = await getPoolStatus(POOL);
      result.selectPool = pool;
    } catch (err) {
      result.selectPool = {
        isConnected: false,
      };
    }

    try {
      const pool = await getPoolStatus(POOL_EDIT);
      result.editPool = pool;
    } catch (err) {
      result.editPool = {
        isConnected: false,
      };
    }

    return result;
  }

  async getPoolStatus(poolAlias) {
    // select pool
    const connection = await timeoutPromise(DEFAULT_TIMEOUT, oracledb.getConnection({ poolAlias }));
    const connectionStatus = connection ? true : false;
    if (connection) {
      this.closeConnection(connection);
    }

    const pool = oracledb.getPool(poolAlias);

    return {
      isConnected: connectionStatus,
      connectionsInUse: pool ? pool.connectionsInUse : 0,
      connectionsOpen: pool ? pool.connectionsOpen : 0,
    };
  }

  /**
   * Executes a query on a connection from the db pool
   * @param {String} query The query to execute
   * @param {Array} vars An array of vars to fill the query
   */
  async executeQuery(query, vars) {
    let connection;
    try {
      connection = await timeoutPromise(DEFAULT_TIMEOUT, oracledb.getConnection({ POOL }));
      if (connection) {
        const result = await timeoutPromise(
          DEFAULT_TIMEOUT,
          connection.execute(query, vars || [], {
            outFormat: oracledb.OBJECT,
            maxRows: 1000,
          })
        );
        if (result && result.rows)
          return {
            success: true,
            result: result.rows || result,
          };
        return {
          success: true,
          result: null,
        };
      } else {
        console.log('No valid connection, retry method');
        return this.executeQuery(query, vars);
      }
    } catch (error) {
      this.logError(error, poolAlias);
      return {
        success: false,
        error: error,
      };
    } finally {
      await this.closeConnection(connection);
    }
  }

  /**
   * Closes the given db connection an releases it to the connection pool
   * @param {Connection} connection The Db connection
   */
  async closeConnection(connection) {
    try {
      if (connection) {
        await connection.close();
      }
    } catch (err) {
      logException(err);

      console.log('Error closing connection', err);
    }
  }

  /**
   * Logs an error and important metadata about the pool
   * @param {Object} error Error caught by system
   * @param {String} poolAlias The pool throwing the error
   */
  logError(error, poolAlias) {
    console.log(`Error connecting to db with ${poolAlias} pool:`, error);
  }
}

exports.dbConnection = new DBConnector();
exports.POOL = POOL;
exports.POOL_EDIT = POOL_EDIT;

Pinger.js

import { CronJob } from 'cron';

export class Pinger {
  constructor(name, pingFunction, cronExpression = '*/2 * * * *') {
    this.name = name;
    this.pingFunction = pingFunction;

    // Callbacks
    this.onPingFailed = null;

    // Public methods
    this.ping = this.ping.bind(this);

    const pingCronJob = new CronJob(cronExpression, this.ping, () => {});
    pingCronJob.start();

    console.log(`Pinger created: ${name}`);
  }

  async ping() {
    console.log(`Pinging ${this.name}`);
    try {
      await this.pingFunction();

      console.log(`Pinging ${this.name} succeeded`);
    } catch (err) {
      console.log(`Pinging ${this.name} failed: ${err}`);
      try {
        if (this.onPingFailed) {
          await this.onPingFailed(err);
        }
      } catch (e) {
        console.log('error in onpingfailed', e);
      }

      console.log('Finished pinging');
    }
  }
}

PromiseUtils.js

export const timeoutPromise = (ms, promise) => {
  // Create a promise that rejects in <ms> milliseconds
  let timeout = new Promise((resolve, reject) => {
    setTimeout(() => {
      reject('Timed out in ' + ms + 'ms.');
    }, ms);
  });

  // Returns a race between our timeout and the passed in promise
  return Promise.race([promise, timeout]);
};
hwesley commented 5 years ago

@hwesley what are you trying to protect against and trying to do? Why did you determine you need to close the pool instead of just letting the pool establish new connections? Do you have a standby database? Do you have RAC? Even if not, can you create a FAN service (for the DB / Oracle Net) and enable events in node-oracledb? What sqlnet.ora parameters do you have in your Node servers - see Connections and High Availability ? Do you have SQLNET.EXPIRE_TIME in the DB sqlnet.ora?

Are you terminating the pool with a specified drainTime?

If you're using the await/async style of programming, triple-check you have used 'await' appropriately - not doing this can commonly lead to crashes.

Hi @cjbj , We are running a backend on Azure, which connects to an Oracle DB which we do not necessarily control, so I can't change settings on that environment.

We had an issue were Azure drops idle connections (after 4 minutes) because of their NAS infrastructure. We noticed that when this happens, the pool doesn't notice that the connection is no longer valid & it hangs on pool.getConnection(). To avoid this, we started sending pings to keep the connection alive, which solved our issue when the database is available (the poolPingInterval didn't seem to do anything).

Now we want to fix another issue. It can happen dat the Oracle DB goes down & becomes unavailable. When this happens, the pool becomes idle again, and we thus notice the same issue that the pool doesn't notice this. When we startup the Oracle DB again after 4 minutes, the ping keeps failing while the Oracle DB is available. As a last resort we thought we should destroy the pool & create it again to fix this state (see onPingFailed callback).

I've double checked the await logic. When I don't call the recreatePools() method, and throw an exception in onPingFailed callback or ping function, the exception gets handled like it should & the pinging (every 2 minutes) keeps happening.

I've temporarily setup an Oracle DB on Azure following this tutorial to debug this issue. It's currently an empty database because the only thing I need to get working is SELECT 1 FROM DUAL & have durable connections.

To reproduce the memory corruption bug, the Oracle DB doesn't even need to be available. It doesn't always show the memory corruption stacktrace. Sometimes it just keeps hanging after the creating the pool and the ping doesn't get triggered anymore.

I'm currently not using drainTime. I tried it with 0 but the result was the same. Because the Oracle DB is not reachable, there are no active connections & this parameter should not matter.

cjbj commented 5 years ago

@hwesley details matter! Error numbers and the like will help.

I would like to wind back to the "the pool doesn't notice that the connection is no longer valid & it hangs on pool.getConnection()." and dig more. Do you have a pstack? Can you enable Oracle client & net tracing?

What sqlnet.ora configuration do you have?

I'd be happy to take this offline (send me email) if that helps.

cjbj commented 5 years ago

@hwesley My favorite setting for resolving net un-responsiveness is to add DISABLE_OOB=ON to the sqlnet.ora (on the Node.js computer). You may also want to look at tuning TCP/IP timeouts.

cjbj commented 5 years ago

Breaking news (at least to me) is that EXPIRE_TIME is usable in "client" (aka Node.js computer) sqlnet.ora files as of at least 18c client for the C library stack, as used by node-oracledb. This is for 'dead server' detection. When used in a DB server sqlnet.ora it performs 'dead client' detection. I will get the Oracle Net doc updated.

Anyway, along with this, and other options mentioned in the previous node-oracledb doc link, you have some options to experiment with. They may or may not help your current situation but should give you a better high availability experience overall.

hwesley commented 5 years ago

Hi @cjbj, Thanks for your input! I currently don't have a sqlnet.ora configuration file, so it looks like there are some settings I didn't try yet, so i'll experiment and try to avoid recreating the pool!

I experimented with the TCP/IP keep alive flags a few months ago:

 net.ipv4.tcp_keepalive_intvl: '20'
 net.ipv4.tcp_keepalive_probes: '5'
 net.ipv4.tcp_keepalive_time: '60'

But I remember it didn't work, they were not being sent. I suppose I need to have a sqlnet.ora file with EXPIRE_TIME to have this working? If I understand correctly the EXPIRE_TIME will override the tcp keep alive settings?

cjbj commented 5 years ago

I understand that EXPIRE_TIME does supersede using ENABLE=BROKEN in the connect string and fiddling the machine TCP timeouts (https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-7A18022A-E40D-4880-B3CE-7EE9864756CA)

dmcghan commented 5 years ago

@hwesley The code could use some work...

Let's start with the constructor function for the DBConnector class. What is the point of lines like this?

this.createPool = this.createPool.bind(this);

I get the impression that you're trying to expose the methods as public or set the context of this, neither of which is necessary. Here's an example:

function pause(ms) {
  return new Promise(function(resolve) {
    setTimeout(() => {
      resolve();
    }, ms);
  });
}

class DBConnector {
  constructor() {
    this.prop = 'val';
  }

  async createPool(poolAlias) {
    console.log('before');
    await pause(3000);
    // Methods have access to the instance
    console.log('after', this.prop);
  }
}

async function runTest() {
  const myConnector = new DBConnector();

  // createPool is public
  await myConnector.createPool();

  console.log('after after');
}

runTest();

Next, looking at createPool, why are you returning a promise if it's not needed?

You're also mixing async APIs. For example, here you're awaiting a call to createPool, but you pass in a callback which means it doesn't return a promise so there's nothing to be awaited.

await oracledb.createPool(config, (error, pool) => {...`

createPoolshould look more like this:

  async createPool(poolAlias) {
    let config;

    switch (poolAlias) {
      case POOL:
        config = { ...POOL_CONFIG, poolAlias, poolMax: 60 };
        break;
      case POOL_EDIT:
        config = { ...POOL_CONFIG, poolAlias, poolMax: 4 };
        break;
    }

    if (!config) {
      throw new Error(`No config available for poolAlias: ${poolAlias}`);
    }

    try {
      await oracledb.createPool(config);

      console.log(`Finished creating pool: ${poolAlias}`);
    } catch (err) {
      console.log(`Create pool error: ${poolAlias}`, error);
      throw err;
    }
  }

closePool has issues similar to createPool - unnecessary promise and mixing async APIs.

Favor pool.close over pool.terminate, as the latter is just an alias from earlier versions of the driver.

Next, add a drainTime to the pool close call.

recreatePools looks good.

recreatePool tries to close a pool and if it fails, it tries to recreate the pool anyway. That will fail because of duplicate pool aliases in the pool cache. Perhaps it's missing a return at the end of the catch?

One of the bigger potential issues I see is the use of timeoutPromise in several functions. This is a rather generic function, meaning it doesn't know about how the driver works. When you use this with getConnection, it's possible that the timeoutPromise finishes first, however, there's still a request for a connection in the queue. When that completes, the connection will be removed from the pool but not returned.

Did you know there's a queueTimeout built in? https://github.com/oracle/node-oracledb/blob/dev-4.0/doc/api.md#propdbqueuetimeout

Using timeoutPromise with execute has a similar problem. A connection can only do one thing at a time. So if the timeoutPromise happens first, you'll call closeConnection but that call will have to wait until the execute finishes. At that point, there's a connection doing work for nothing when it could be doing something more useful.

Have a look at break for a better solution: https://github.com/oracle/node-oracledb/blob/dev-4.0/doc/api.md#break

I recommend stripping out timeoutPromise.

In ping, did you consider using the built-in ping method? https://github.com/oracle/node-oracledb/blob/dev-4.0/doc/api.md#connectionping If it works for you, it should be more performant.

What happens if a call to executeQuery is issued while you're recreating a pool? You'd have to build in a queue to prevent those calls from getting errors. Just something to be aware of...

Finally, I'd recommend trying to rewrite the code in a way that objects with success properties are not needed. For example, excuteQuery should re-throw the error and its caller should have a try/catch block when using it.

hwesley commented 5 years ago

Hi Dan,

Thanks for your input! You're right, some parts of the code are not implemented very well. I've followed your remarks and updated the code accordingly.

I would also like to get rid of timeoutPromise. The reason we have this, is because on an Azure VM, connections are being closed when they are idle (idle meaning no traffic) for 4 minutes. This is a parameter that can't be changed.

When this occurs, the connection from the pool doesn't seem to know it got closed. The result is that it hangs on getConnection. We never receive a result, not even after 10 minutes. That's why we added the timeoutPromise. Ideally, this connection should stay alive. That's why we implemented a manual ping method in the first place, but this only runs on the connection given by getConnection and not on every connection in the pool, so this approach still has flaws.

I've also set the queueTimeout on 15000ms. This doesn't seem to have any effect.

The break method would not help because I can't get a 'connection' object. Also it looks like this is new in version 4.*? We are still using the latest stable version: 3.1.2.

When pool connections fail when the database is not available, and stay valid when the database is available, there would be no need to recreate the pools.

To fix this, I believe we should be able to send TCP keep alive packets over the active connections. I've understood I need to have a sqlnet.ora configuration file with settings:

DISABLE_OOB=ON
SQLNET.EXPIRE_TIME=1

So far this didn't solve the issue. I've got some more suggestions from @cjbj, which I will follow up tomorrow.

dmcghan commented 5 years ago

The result is that it hangs on getConnection.

This sounds suspect. When connections are killed, the typical issue is that getConnection returns bad connections, not that it hangs.

Hanging on getConnection should only happen if there are no connections to get (they are all checked out). Have you checked your pool stats to verify this isn't the case? Perhaps you're leaking connections?

When you think about it if getConnection hanged because of a bad connection, what good would timing out and repeating the process do?

I've also set the queueTimeout on 15000ms. This doesn't seem to have any effect.

Try adding a call to pool._logStats() before you call getConnection (you'll need to enable pool stats for this to work). Let us know what you get there when there's a hang.

The break method would not help because I can't get a 'connection' object.

I'm suggesting the use of break where you've wrapped conn.execute, not pool.getConnection.

Also it looks like this is new in version 4.*? We are still using the latest stable version: 3.1.2.

No, break has been there since version 1.

cjbj commented 5 years ago

I would also like to get rid of timeoutPromise. The reason we have this, is because on an Azure VM, connections are being closed when they are idle (idle meaning no traffic) for 4 minutes.

I still think you need to go back to a basics and start with a simple program. Then let us help you solve problems as you encounter them

dmcghan commented 5 years ago

I still think you need to go back to a basics and start with a simple program. Then let us help you solve problems as you encounter them

Yeah, this sounds like the best way forward...

hwesley commented 5 years ago

I tested again on Azure VM with _logStats(). Here are the logs:

backend_1_bcac55c836e4 | 2019-07-02T08:25:00.004608519Z Pinging Oracle DB
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.005739616Z 
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.005768416Z Pool statistics:
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006595313Z ...total up time (milliseconds): 270348
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006619213Z ...total connection requests: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006623713Z ...total requests enqueued: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006626913Z ...total requests dequeued: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006629913Z ...total requests failed: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006633113Z ...total request timeouts: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006636113Z ...max queue length: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006639213Z ...sum of time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006642313Z ...min time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006645513Z ...max time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006648513Z ...avg time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006651613Z ...pool connections in use: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006654613Z ...pool connections open: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006657713Z Related pool attributes:
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006660813Z ...poolAlias: digitalTwinPool
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006663813Z ...queueTimeout (milliseconds): 15000
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006666913Z ...poolMin: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006670113Z ...poolMax: 60
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006673013Z ...poolIncrement: 1
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006676113Z ...poolTimeout (seconds): 30
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006679113Z ...poolPingInterval: 60
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006682213Z ...sessionCallback: undefined
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006685213Z ...stmtCacheSize: 30
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006689713Z Pool status:
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006693913Z ...status: 6000
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006697013Z Related environment variables:
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006700113Z ...process.env.UV_THREADPOOL_SIZE: 64
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.048230778Z Connection received, executing query...
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.053066463Z Closed connection
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.053199162Z Pinging Oracle DB succeeded

backend_1_bcac55c836e4 | 2019-07-02T08:30:00.001435650Z Pinging Oracle DB
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.001534350Z 
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.001541749Z Pool statistics:
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003132545Z ...total up time (milliseconds): 570344
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003148044Z ...total connection requests: 1
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003152144Z ...total requests enqueued: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003155244Z ...total requests dequeued: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003158244Z ...total requests failed: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003161444Z ...total request timeouts: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003164444Z ...max queue length: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003169544Z ...sum of time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003173844Z ...min time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003176944Z ...max time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003180044Z ...avg time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003183044Z ...pool connections in use: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003186144Z ...pool connections open: 1
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003189244Z Related pool attributes:
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003192744Z ...poolAlias: digitalTwinPool
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003195844Z ...queueTimeout (milliseconds): 15000
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003198944Z ...poolMin: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003202044Z ...poolMax: 60
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003205044Z ...poolIncrement: 1
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003208044Z ...poolTimeout (seconds): 30
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003211144Z ...poolPingInterval: 60
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003214244Z ...sessionCallback: undefined
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003217244Z ...stmtCacheSize: 30
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003220244Z Pool status:
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003223344Z ...status: 6000
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003226344Z Related environment variables:
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003229444Z ...process.env.UV_THREADPOOL_SIZE: 64

I've also noticed it doesn't hang forever. After 15 minutes the getConnection() seems to return a connection and the query succeeds. Obviously this 15 minutes is far too long.

I'm trying to get the webappawait.js example running. I've only had a few hours today so I didn't get too far. I'm mostly struggling with getting my empty OracleDB populated with the HR data. Will continue on this tomorrow. Is there any sample DB available so I don't need to set it up myself?

dmcghan commented 5 years ago

It looks like the output from those _logStats calls is from pinging the DB. Did you experience a wait? The goal is to capture the stats when you have a wait.

I've also noticed it doesn't hang forever. After 15 minutes the getConnection() seems to return a connection and the query succeeds. Obviously this 15 minutes is far too long.

Are you sure the time is spent in getConnection and not execute?

cjbj commented 5 years ago

[edited: ignore this deleted bit!]

Oracle's sample schema files are at https://github.com/oracle/db-sample-schemas

hwesley commented 5 years ago

It looks like the output from those _logStats calls is from pinging the DB. Did you experience a wait? The goal is to capture the stats when you have a wait.

I've also noticed it doesn't hang forever. After 15 minutes the getConnection() seems to return a connection and the query succeeds. Obviously this 15 minutes is far too long.

Are you sure the time is spent in getConnection and not execute?

Yes, it's a manual ping every 5 minutes (SELECT 1 FROM DUAL). The first time it succeeds (when connection gets created), the second time it hangs on getConnection for 15 minutes. The log Connection received, executing query... is printed when the getConnection method returns a connection

hwesley commented 5 years ago

Alright, I tested with the webappawait.jsexample. It's running on an Azure VM and connecting to an Oracle DB I've setup, also on an Azure VM.

The issue is exactly the same. The first time I open it in browser, the query succeeds. When I wait 5 minutes, the browser seems to timeout, i'm not sure if the getConnection method actually ever times out.

I've tried changing both poolPingInterval and poolTimeout to 5 seconds, but this doesn't change anything. I've also noticed that when I stop the application after this happens, it hangs on closing the pool. After 8-9 minutes it printed this:

Pool closed
Segmentation fault (core dumped)
cjbj commented 5 years ago

Start by adding some debugging statements to webappawait.js to trace what is going on at the app level.

What is your node-oracledb side sqlnet.ora like? And where is it - e.g. how do you know it is being used?

How do you stop it?

hwesley commented 5 years ago

Start by adding some debugging statements to webappawait.js to trace what is going on at the app level.

I see the same problems like in my own project. I've added some logs of webappawait.js below

user@localhost:~/oracle-test$ node webappawait.js 
Wed Jul 03 2019 09:19:37 GMT+0000 (UTC): Creating pool
Wed Jul 03 2019 09:19:37 GMT+0000 (UTC): Pool created
Server running at http://localhost:7000
Wed Jul 03 2019 09:19:52 GMT+0000 (UTC): Request received, getting connection
Wed Jul 03 2019 09:19:52 GMT+0000 (UTC): connection received, executing query
Wed Jul 03 2019 09:19:52 GMT+0000 (UTC): query executed, showing results
Wed Jul 03 2019 09:19:52 GMT+0000 (UTC): Closing connection
Wed Jul 03 2019 09:19:52 GMT+0000 (UTC): Connection closed
Wed Jul 03 2019 09:20:56 GMT+0000 (UTC): Request received, getting connection
Wed Jul 03 2019 09:20:56 GMT+0000 (UTC): connection received, executing query
Wed Jul 03 2019 09:20:56 GMT+0000 (UTC): query executed, showing results
Wed Jul 03 2019 09:20:56 GMT+0000 (UTC): Closing connection
Wed Jul 03 2019 09:20:56 GMT+0000 (UTC): Connection closed
Wed Jul 03 2019 09:26:10 GMT+0000 (UTC): Request received, getting connection
---- AFTER 15 minutes I decided to stop the application with CTRL + C
^C
Terminating
Pool closed
Segmentation fault (core dumped)
user@localhost:~/oracle-test$ 

What is your node-oracledb side sqlnet.ora like? And where is it?

I tested both with & without this file. The client is located at /opt/oracle/instantclient_18_3. The sqlnet.ora file is in /opt/oracle/instantclient_18_3/network/admin. It currently contains:

DISABLE_OOB=ON
SQLNET.EXPIRE_TIME=1

I also have the environment variable: LD_LIBRARY_PATH=/opt/oracle/instantclient_18_3/:

how do you know it is being used?

I have no idea, I know the client is being used because when I put it in a wrong place, it gives errors. Can you tell me how to detect this?

anthony-tuininga commented 5 years ago

Can you also set the environment variable DPI_DEBUG_LEVEL to the value 12 and rerun your tests? That will tell us the underlying ODPI-C calls that are being made and any errors that are reported. Thanks!

hwesley commented 5 years ago

Here are the logs with DPI_DEBUG_LEVEL=12. You can see it hangs at 07:12:56.673. Also the pool closing when I stop the application takes a very long time

user@dig-ocb-prod:~/oracle-test$ node webappawait.js 
ODPI [02958] 2019-07-04 07:06:55.156: ODPI-C 3.1.2
ODPI [02958] 2019-07-04 07:06:55.157: debugging messages initialized at level 12
Thu Jul 04 2019 07:06:55 GMT+0000 (UTC): Creating pool
ODPI [02958] 2019-07-04 07:06:55.162: fn start dpiContext_create
ODPI [02958] 2019-07-04 07:06:55.176: fn end dpiContext_create -> 0
ODPI [02964] 2019-07-04 07:06:55.176: fn start dpiContext_initPoolCreateParams(0x55d24731f7d0)
ODPI [02964] 2019-07-04 07:06:55.176: fn end dpiContext_initPoolCreateParams(0x55d24731f7d0) -> 0
ODPI [02964] 2019-07-04 07:06:55.176: fn start dpiContext_initCommonCreateParams(0x55d24731f7d0)
ODPI [02964] 2019-07-04 07:06:55.176: fn end dpiContext_initCommonCreateParams(0x55d24731f7d0) -> 0
ODPI [02964] 2019-07-04 07:06:55.177: fn start dpiPool_create(0x55d24731f7d0)
ODPI [02964] 2019-07-04 07:06:55.183: fn end dpiPool_create(0x55d24731f7d0) -> 0
ODPI [02964] 2019-07-04 07:06:55.183: fn start dpiPool_setTimeout(0x7f16a400c920)
ODPI [02964] 2019-07-04 07:06:55.183: fn end dpiPool_setTimeout(0x7f16a400c920) -> 0
ODPI [02964] 2019-07-04 07:06:55.183: fn start dpiPool_setStmtCacheSize(0x7f16a400c920)
ODPI [02964] 2019-07-04 07:06:55.183: fn end dpiPool_setStmtCacheSize(0x7f16a400c920) -> 0
Thu Jul 04 2019 07:06:55 GMT+0000 (UTC): Pool created
Server running at http://localhost:7000
Thu Jul 04 2019 07:07:11 GMT+0000 (UTC): Request received, getting connection
ODPI [02958] 2019-07-04 07:07:11.669: fn start dpiPool_addRef(0x7f16a400c920)
ODPI [02958] 2019-07-04 07:07:11.669: fn end dpiPool_addRef(0x7f16a400c920) -> 0
ODPI [02965] 2019-07-04 07:07:11.669: fn start dpiContext_initConnCreateParams(0x55d24731f7d0)
ODPI [02965] 2019-07-04 07:07:11.669: fn end dpiContext_initConnCreateParams(0x55d24731f7d0) -> 0
ODPI [02965] 2019-07-04 07:07:11.669: fn start dpiPool_acquireConnection(0x7f16a400c920)
ODPI [02965] 2019-07-04 07:07:12.690: fn end dpiPool_acquireConnection(0x7f16a400c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.690: fn start dpiPool_release(0x7f16a400c920)
ODPI [02958] 2019-07-04 07:07:12.690: fn end dpiPool_release(0x7f16a400c920) -> 0
Thu Jul 04 2019 07:07:12 GMT+0000 (UTC): connection received, executing query
ODPI [02958] 2019-07-04 07:07:12.691: fn start dpiConn_addRef(0x7f169c00c920)
ODPI [02958] 2019-07-04 07:07:12.691: fn end dpiConn_addRef(0x7f169c00c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.691: fn start dpiConn_newVar(0x7f169c00c920)
ODPI [02958] 2019-07-04 07:07:12.691: fn end dpiConn_newVar(0x7f169c00c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.691: fn start dpiVar_setFromBytes(0x55d2473563d0)
ODPI [02958] 2019-07-04 07:07:12.691: fn end dpiVar_setFromBytes(0x55d2473563d0) -> 0
ODPI [02966] 2019-07-04 07:07:12.691: fn start dpiConn_prepareStmt(0x7f169c00c920)
ODPI [02966] 2019-07-04 07:07:12.692: fn end dpiConn_prepareStmt(0x7f169c00c920) -> 0
ODPI [02966] 2019-07-04 07:07:12.692: fn start dpiStmt_getInfo(0x7f16a000c920)
ODPI [02966] 2019-07-04 07:07:12.692: fn end dpiStmt_getInfo(0x7f16a000c920) -> 0
ODPI [02966] 2019-07-04 07:07:12.692: fn start dpiStmt_bindByPos(0x7f16a000c920)
ODPI [02966] 2019-07-04 07:07:12.692: fn end dpiStmt_bindByPos(0x7f16a000c920) -> 0
ODPI [02966] 2019-07-04 07:07:12.692: fn start dpiStmt_execute(0x7f16a000c920)
ODPI [02966] 2019-07-04 07:07:12.815: fn end dpiStmt_execute(0x7f16a000c920) -> 0
ODPI [02966] 2019-07-04 07:07:12.815: fn start dpiStmt_getQueryInfo(0x7f16a000c920)
ODPI [02966] 2019-07-04 07:07:12.815: fn end dpiStmt_getQueryInfo(0x7f16a000c920) -> 0
ODPI [02966] 2019-07-04 07:07:12.815: fn start dpiStmt_getQueryInfo(0x7f16a000c920)
ODPI [02966] 2019-07-04 07:07:12.815: fn end dpiStmt_getQueryInfo(0x7f16a000c920) -> 0
ODPI [02966] 2019-07-04 07:07:12.815: fn start dpiStmt_getQueryInfo(0x7f16a000c920)
ODPI [02966] 2019-07-04 07:07:12.815: fn end dpiStmt_getQueryInfo(0x7f16a000c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.816: fn start dpiConn_release(0x7f169c00c920)
ODPI [02958] 2019-07-04 07:07:12.816: fn end dpiConn_release(0x7f169c00c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.816: fn start dpiVar_release(0x55d2473563d0)
ODPI [02958] 2019-07-04 07:07:12.816: fn end dpiVar_release(0x55d2473563d0) -> 0
ODPI [02967] 2019-07-04 07:07:12.816: fn start dpiConn_newVar(0x7f169c00c920)
ODPI [02967] 2019-07-04 07:07:12.816: fn end dpiConn_newVar(0x7f169c00c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.816: fn start dpiStmt_define(0x7f16a000c920)
ODPI [02967] 2019-07-04 07:07:12.816: fn end dpiStmt_define(0x7f16a000c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.816: fn start dpiConn_newVar(0x7f169c00c920)
ODPI [02967] 2019-07-04 07:07:12.816: fn end dpiConn_newVar(0x7f169c00c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.816: fn start dpiStmt_define(0x7f16a000c920)
ODPI [02967] 2019-07-04 07:07:12.816: fn end dpiStmt_define(0x7f16a000c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.816: fn start dpiConn_newVar(0x7f169c00c920)
ODPI [02967] 2019-07-04 07:07:12.816: fn end dpiConn_newVar(0x7f169c00c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.816: fn start dpiStmt_define(0x7f16a000c920)
ODPI [02967] 2019-07-04 07:07:12.817: fn end dpiStmt_define(0x7f16a000c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.817: fn start dpiStmt_setFetchArraySize(0x7f16a000c920)
ODPI [02967] 2019-07-04 07:07:12.817: fn end dpiStmt_setFetchArraySize(0x7f16a000c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.817: fn start dpiStmt_fetchRows(0x7f16a000c920)
ODPI [02967] 2019-07-04 07:07:12.940: fn end dpiStmt_fetchRows(0x7f16a000c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.940: fn start dpiStmt_release(0x7f16a000c920)
ODPI [02967] 2019-07-04 07:07:12.940: fn end dpiStmt_release(0x7f16a000c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.940: fn start dpiVar_release(0x7f16940435e0)
ODPI [02958] 2019-07-04 07:07:12.940: fn end dpiVar_release(0x7f16940435e0) -> 0
ODPI [02958] 2019-07-04 07:07:12.940: fn start dpiVar_release(0x7f1694041ce0)
ODPI [02958] 2019-07-04 07:07:12.940: fn end dpiVar_release(0x7f1694041ce0) -> 0
ODPI [02958] 2019-07-04 07:07:12.940: fn start dpiVar_release(0x7f169400c920)
ODPI [02958] 2019-07-04 07:07:12.940: fn end dpiVar_release(0x7f169400c920) -> 0
Thu Jul 04 2019 07:07:12 GMT+0000 (UTC): query executed, showing results
Thu Jul 04 2019 07:07:12 GMT+0000 (UTC): Closing connection
ODPI [02964] 2019-07-04 07:07:12.942: fn start dpiConn_close(0x7f169c00c920)
ODPI [02964] 2019-07-04 07:07:12.942: fn end dpiConn_close(0x7f169c00c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.942: fn start dpiConn_release(0x7f169c00c920)
ODPI [02958] 2019-07-04 07:07:12.942: fn end dpiConn_release(0x7f169c00c920) -> 0
Thu Jul 04 2019 07:07:12 GMT+0000 (UTC): Connection closed
Thu Jul 04 2019 07:12:56 GMT+0000 (UTC): Request received, getting connection
ODPI [02958] 2019-07-04 07:12:56.673: fn start dpiPool_addRef(0x7f16a400c920)
ODPI [02958] 2019-07-04 07:12:56.673: fn end dpiPool_addRef(0x7f16a400c920) -> 0
ODPI [02965] 2019-07-04 07:12:56.673: fn start dpiContext_initConnCreateParams(0x55d24731f7d0)
ODPI [02965] 2019-07-04 07:12:56.673: fn end dpiContext_initConnCreateParams(0x55d24731f7d0) -> 0
**ODPI [02965] 2019-07-04 07:12:56.673: fn start dpiPool_acquireConnection(0x7f16a400c920)
ODPI [02965] 2019-07-04 07:29:58.920: fn end dpiPool_acquireConnection(0x7f16a400c920) -> 0**
ODPI [02958] 2019-07-04 07:29:58.920: fn start dpiPool_release(0x7f16a400c920)
ODPI [02958] 2019-07-04 07:29:58.920: fn end dpiPool_release(0x7f16a400c920) -> 0
Thu Jul 04 2019 07:29:58 GMT+0000 (UTC): connection received, executing query
ODPI [02958] 2019-07-04 07:29:58.920: fn start dpiConn_addRef(0x7f169c1b3b90)
ODPI [02958] 2019-07-04 07:29:58.920: fn end dpiConn_addRef(0x7f169c1b3b90) -> 0
ODPI [02958] 2019-07-04 07:29:58.920: fn start dpiConn_newVar(0x7f169c1b3b90)
ODPI [02958] 2019-07-04 07:29:58.920: fn end dpiConn_newVar(0x7f169c1b3b90) -> 0
ODPI [02958] 2019-07-04 07:29:58.920: fn start dpiVar_setFromBytes(0x55d247352620)
ODPI [02958] 2019-07-04 07:29:58.920: fn end dpiVar_setFromBytes(0x55d247352620) -> 0
ODPI [02966] 2019-07-04 07:29:58.920: fn start dpiConn_prepareStmt(0x7f169c1b3b90)
ODPI [02966] 2019-07-04 07:29:58.920: fn end dpiConn_prepareStmt(0x7f169c1b3b90) -> 0
ODPI [02966] 2019-07-04 07:29:58.920: fn start dpiStmt_getInfo(0x7f16a0040da0)
ODPI [02966] 2019-07-04 07:29:58.920: fn end dpiStmt_getInfo(0x7f16a0040da0) -> 0
ODPI [02966] 2019-07-04 07:29:58.920: fn start dpiStmt_bindByPos(0x7f16a0040da0)
ODPI [02966] 2019-07-04 07:29:58.920: fn end dpiStmt_bindByPos(0x7f16a0040da0) -> 0
ODPI [02966] 2019-07-04 07:29:58.920: fn start dpiStmt_execute(0x7f16a0040da0)
ODPI [02966] 2019-07-04 07:29:59.049: fn end dpiStmt_execute(0x7f16a0040da0) -> 0
ODPI [02966] 2019-07-04 07:29:59.049: fn start dpiStmt_getQueryInfo(0x7f16a0040da0)
ODPI [02966] 2019-07-04 07:29:59.049: fn end dpiStmt_getQueryInfo(0x7f16a0040da0) -> 0
ODPI [02966] 2019-07-04 07:29:59.049: fn start dpiStmt_getQueryInfo(0x7f16a0040da0)
ODPI [02966] 2019-07-04 07:29:59.049: fn end dpiStmt_getQueryInfo(0x7f16a0040da0) -> 0
ODPI [02966] 2019-07-04 07:29:59.049: fn start dpiStmt_getQueryInfo(0x7f16a0040da0)
ODPI [02966] 2019-07-04 07:29:59.049: fn end dpiStmt_getQueryInfo(0x7f16a0040da0) -> 0
ODPI [02958] 2019-07-04 07:29:59.050: fn start dpiConn_release(0x7f169c1b3b90)
ODPI [02958] 2019-07-04 07:29:59.050: fn end dpiConn_release(0x7f169c1b3b90) -> 0
ODPI [02958] 2019-07-04 07:29:59.050: fn start dpiVar_release(0x55d247352620)
ODPI [02958] 2019-07-04 07:29:59.050: fn end dpiVar_release(0x55d247352620) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiConn_newVar(0x7f169c1b3b90)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiConn_newVar(0x7f169c1b3b90) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiStmt_define(0x7f16a0040da0)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiStmt_define(0x7f16a0040da0) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiConn_newVar(0x7f169c1b3b90)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiConn_newVar(0x7f169c1b3b90) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiStmt_define(0x7f16a0040da0)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiStmt_define(0x7f16a0040da0) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiConn_newVar(0x7f169c1b3b90)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiConn_newVar(0x7f169c1b3b90) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiStmt_define(0x7f16a0040da0)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiStmt_define(0x7f16a0040da0) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiStmt_setFetchArraySize(0x7f16a0040da0)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiStmt_setFetchArraySize(0x7f16a0040da0) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiStmt_fetchRows(0x7f16a0040da0)
ODPI [02967] 2019-07-04 07:29:59.179: fn end dpiStmt_fetchRows(0x7f16a0040da0) -> 0
ODPI [02967] 2019-07-04 07:29:59.179: fn start dpiStmt_release(0x7f16a0040da0)
ODPI [02967] 2019-07-04 07:29:59.179: fn end dpiStmt_release(0x7f16a0040da0) -> 0
ODPI [02958] 2019-07-04 07:29:59.179: fn start dpiVar_release(0x7f1694043f50)
ODPI [02958] 2019-07-04 07:29:59.179: fn end dpiVar_release(0x7f1694043f50) -> 0
ODPI [02958] 2019-07-04 07:29:59.179: fn start dpiVar_release(0x7f1694042000)
ODPI [02958] 2019-07-04 07:29:59.179: fn end dpiVar_release(0x7f1694042000) -> 0
ODPI [02958] 2019-07-04 07:29:59.179: fn start dpiVar_release(0x7f1694041da0)
ODPI [02958] 2019-07-04 07:29:59.179: fn end dpiVar_release(0x7f1694041da0) -> 0
Thu Jul 04 2019 07:29:59 GMT+0000 (UTC): query executed, showing results
Thu Jul 04 2019 07:29:59 GMT+0000 (UTC): Closing connection
ODPI [02964] 2019-07-04 07:29:59.180: fn start dpiConn_close(0x7f169c1b3b90)
ODPI [02964] 2019-07-04 07:29:59.180: fn end dpiConn_close(0x7f169c1b3b90) -> 0
ODPI [02958] 2019-07-04 07:29:59.180: fn start dpiConn_release(0x7f169c1b3b90)
ODPI [02958] 2019-07-04 07:29:59.180: fn end dpiConn_release(0x7f169c1b3b90) -> 0
Thu Jul 04 2019 07:29:59 GMT+0000 (UTC): Connection closed
^C
Terminating
ODPI [02965] 2019-07-04 07:43:55.127: fn start dpiPool_close(0x7f16a400c920)
ODPI [02965] 2019-07-04 08:00:59.373: fn end dpiPool_close(0x7f16a400c920) -> 0
ODPI [02958] 2019-07-04 08:00:59.373: fn start dpiPool_release(0x7f16a400c920)
ODPI [02958] 2019-07-04 08:00:59.373: fn end dpiPool_release(0x7f16a400c920) -> 0
Pool closed

I'm pretty sure enabling TCP keep alive packets can fix this problem. But none of the options in the library seem to enable them. Maybe we should continue the focus on that?

cjbj commented 5 years ago

I agree the evidence looks like some network issue. Contact the Azure folk and see what they say about their part of the stack.

One thing to look at is an Oracle Net trace. In sqlnet.ora add something like:

ADR_BASE=/tmp/mylogs
DIAG_ADR_ENABLED=ON
TRACE_LEVEL_CLIENT=16

Then make the log directory: mkdir /tmp/mylogs.

Then run the app.

Look for what's happening at the time of the hang. We don't need full logs. And use a gist for output!

There's no crash with webappasync.js?

hwesley commented 5 years ago

Azure closes outbound idle connections on VM's after 4 minutes. This setting can not be changed. We've had contact with Azure before concerning this issue. One of the things we noticed is that they don't send a TCP RST package when they close a connection. Unfortunately this is done so by design and can not be changed.

Because of that, I think we can only fix this by keeping the connection alive (which is our goal), so there should be a way of sending TCP keep alive packets over all open connections in the pool. As far as I understand this is possible with the node-oracle library, but I can't seem to get this working.

I've changed the sqlnet.ora to above configuration. You can find the results here: https://we.tl/t-ZHEeyCVAiN

dmcghan commented 5 years ago

I don't like it, but here's something I just mocked up that might help (would require a fixed pool size).

setInterval(async () => {
  const pool = oracledb.getPool();
  const conns = [];

  // First take all of the connections out of the pool.
  for (let idx = 0; idx < pool.connectionsOpen; idx += 1) {
    const conn = await pool.getConnection();

    conns.push(conn);
  }

  // Then ping and close each connection.
  for (let idx = 0; idx < conns.length; idx += 1) {
    try {
      await conns[idx].ping();
    } catch (err) {
      console.error(err);
    } finally {
      try {
        await conns[idx].close();
      } catch (err) {
        console.log(err);
      }
    }
  }
}, 120000);
hwesley commented 5 years ago

That would be a possibility indeed, but isn't that something that the poolPingInterval attribute is supposed to do?

And what about the previously mentioned EXPIRE_TIME, shouldn't that send a keep alive packet every X minutes?

When a socket is opened with SO_KEEPALIVE, the configured tcp keep alive settings would be used and TCP keep alive packets get sent automatically at the specified interval. Is there a settings in the oracle client that enables this behaviour?

dmcghan commented 5 years ago

but isn't that something that the poolPingInterval attribute is supposed to do?

As far as I know, poolPingInterval works when you invoke getConnection. It's not working in the background.

And what about the previously mentioned EXPIRE_TIME

This seems like the best option, though now I'm wondering if it suffers from the same issue as poolPingInterval. Have you tries setting using this on the database server instead of the client?

Is there a settings in the oracle client that enables this behaviour?

This white paper indicates that dead connection detection in 12c+ uses the TCP layer (perhaps SO_KEEPALIVE is part of that solution): https://www.oracle.com/technetwork/database/enterprise-edition/oraclenetdcd-2179641.pdf

What version of the database are you connecting to?

cjbj commented 5 years ago

@hwesley it seems that "Oracle" is in discussion with "Microsoft" about Azure timeouts.

Setting keepalive has been found effective in breaking the 4-minute barrier for apps in Azure VM connecting to an Oracle Cloud ATP-S DB.

One thing I got corrected on today was that on clients, EXPIRE_TIME currently needs to be set in the connect descriptor, not sqlnet.ora (this may change in the future). This is true from 18c on. But the easiest way to use it is to update to 19c client libraries and utilize the new 'Easy Connect Plus' syntax:

connectionString: `${process.env.DB_HOST}:${process.env.DB_PORT}/cdb1?expire_time=1`

Although not documented (!), you can use expire_time like this. Give it a shot and let us know. If you can't upgrade from 18c, create a tnsnames.ora entry with (expire_time=1) in the DESCRIPTION section.

You may also be able to change the outbound reset time in Azure, see https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-outbound-rules-overview Apparently the limit is not the documented '66' but actually 120. This hasn't been tested, to my knowledge.

cjbj commented 5 years ago

@hwesley any news on this?

hwesley commented 5 years ago

Hi @cjbj ,

Thanks for the input. I haven't been able to continue on this. I'll try to pick it up next week!

Kind regards

hwesley commented 5 years ago

hi @cjbj ,

Good news! I updated the oracle sample application to version 19.3 of the client libraries. I added the expire_time to the connection string and the parameter works! I don't see any timeouts.

We haven't tried the suggestions with a load balancer, as we are also in the process of moving to k8s. I'm not sure if we are going to have the same problems there, but now with the expire_time we can be sure the connections are kept alive!

cjbj commented 5 years ago

@hwesley thanks for the feedback. I'm glad you have a solution. I'll close this now