Closed Hash17677 closed 3 months ago
Thank you for using node-oracledb.
Can you try enabling Thick mode and see if the problem is reproduced.
I try the Thick mode as well. But it does not fix the error. @sharadraju
A "hang" like you are reporting suggests that connections are not being returned to the pool. You can examine the pool statistics to see whether that is indeed the case. documentation.
@Hash17677 as I mentioned in your original stackoverflow post, as well as checking pool stats, also see Always Use Connection Pools — and How.
I monitor the statistics of the pool and notice connectionsOpen decrease the amount gradually. As I mentioned above I set poolMin to 5. But I notice connectionsOpen decrease the value under the poolmin. But It never increase.
@cjbj @anthony-tuininga
@Hash17677 Is there any sign of queuing?
No @cjbj I have not notice any queuing in the stats.
@Hash17677 If you would like to share some stats and/or a reproducible testcase then I might be able to help you. Maybe it isn't connection exhaustion but is some DB lock in your app - you'll need to do some debugging.
If the code snippet you shared is actually what you are using, check your scope for oracleConnApps
to make sure the close() is being called.
I have same problem. Call ping
inside getConnection
hangs 11-16 minutes and catches ETIMEDOUT inside net stream.
stream error:
Error: read ETIMEDOUT
at TCP.onStreamRead (node:internal/stream_base_commons:217:20) {
errno: -110,
code: 'ETIMEDOUT',
syscall: 'read'
}
NJS-500: connection to the Oracle Database was broken
NJS-501: connection to host 172.20.188.174 port 1521 terminated unexpectedly. (CONNECTION_ID=UpVDSqbkhcGXyrjvZjKU5A==)
read ETIMEDOUT
at NTTCP.checkErr (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/ntTcp.js:327:29)
at NTTCP.receive (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/ntTcp.js:465:12)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at async NetworkSession._recvPacket (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/networkSession.js:378:22)
at async NetworkSession.recvPacket (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/networkSession.js:443:12)
at async ReadPacket.waitForPackets (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/protocol/packet.js:293:20)
at async Protocol._decodeMessage (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/protocol/protocol.js:70:5)
at async Protocol._processMessage (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/protocol/protocol.js:153:9)
at async ThinConnectionImpl.ping (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/connection.js:977:5)
at async ThinPoolImpl.acquire (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/pool.js:505:11)
TCP/IP connections may be broken by router/switch (restart).
Why ping not working periodically (pingInterval) for free connections and not clean failure connections?
In log i see errors only after call getConnection
. Nothing for periodically pings? Why not working keepAlive for tcp connection?
Statistics before call getConnection
:
Driver:
...thin mode: true
Pool statistics:
...gathered at: 2023-12-21T07:04:55.978Z
...up time (milliseconds): 73507015
...up time from last reset (milliseconds) 73506988
...connection requests: 10
...requests enqueued: 0
...requests dequeued: 0
...requests failed: 0
...requests exceeding queueMax: 0
...requests exceeding queueTimeout: 0
...current queue length: 0
...maximum queue length: 0
...sum of time in queue (milliseconds): 0
...minimum time in queue (milliseconds): 0
...maximum time in queue (milliseconds): 0
...average time in queue (milliseconds): 0
...pool connections in use: 0
...pool connections open: 2
Pool attributes:
...connectString: BANK
...edition:
...events: false
...externalAuth: false
...homogeneous: true
...poolAlias: WORKER-2
...poolIncrement: 1
...poolMax: 5
...poolMaxPerShard: undefined
...poolMin: 2
...poolPingInterval (seconds): 30
...poolTimeout (seconds): 60
...queueMax: 500
...queueTimeout (milliseconds): 60000
...sessionCallback: undefined
...sodaMetaDataCache: undefined
...stmtCacheSize: 30
...user: USER
Related environment variables:
...UV_THREADPOOL_SIZE: 10
In project i use pkg
and compile it with node16-linux-x64
target (node.js v16).
Same problem here, using version 6.0.0 and docker container running on custom VLAN. We are catching ORA-03113 on Sentry on previous requests. Once it hangs, the node app doesn't respond anymore. I'll be updating on this topic once we run some stress test, trying to bring down the network connection with different methods.
NJS-500: connection to the Oracle Database was broken DPI-1080: connection was closed by ORA-3113
Maybe I found solution.
Article https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die Part "Busy ESTAB socket is not forever" This is my situation. Keep alive working, but on Linux default value of TCP_KEEPALIVE_TIME = 7300 (>120 minutes). When socket broken and ping calls earlier then first keep alive after break, socket hangs ~16 minutes.
In article used parameter TCP_USER_TIMEOUT. This is analog of net.socket.setTimeout()
?
What about set timeout of connection stream manually from connection options, or set timeout as TCP_USER_TIMEOUT = TCP_KEEPIDLE + TCP_KEEPINTVL TCP_KEEPCNT where [TCP_KEEPINTVL TCP_KEEPCNT = 1 * 10](https://nodejs.org/api/net.html#socketsetkeepaliveenable-initialdelay)
P.S. I set connection option expireTime
to 1 minute and then in log i see droping of broken connections before call getConnection
.
But if ping calls within period before keep alive (1 minute), ping packet sends 16 minutes and then timeout (after send packet to socket, keep alive not working). Needs use setTimeout
@boriborm setTimeout may timeout idle but valid connections. I think the correct solution would be a combination of expireTime and callTimeout (https://node-oracledb.readthedocs.io/en/latest/api_manual/connection.html#connection.callTimeout)
expireTime would timeout idle connections in the pool before it is acquired. If the connection is acquired before the keep alive idle timeout, then callTimeout would kick in and timeout the connection.
@sreguna callTimeout is not variant. It sets timeout to send breaking packet. Sending breaking packet is the same as sending ping packet. Hangs ~16 minutes on TCP/IP level.
async _processMessage(message) {
let callTimer;
let callTimeoutExpired = false;
try {
if (this.callTimeout > 0) {
callTimer = setTimeout(() => {
callTimeoutExpired = true;
this.breakMessage(); <-----
}, this.callTimeout);
}
await this._encodeMessage(message);
...
breakMessage() {
this._breakInProgress = true;
this.nsi.sendBreak(); <----
}
sendBreak() {
if (this.isBreak)
return; /* Already in a break */
if (!this.connected) {
this.isBreak = true; /* Not yet connected. Post the break */
this.breakPosted = true;
return;
}
this.isBreak = true;
this.markerPkt.prepare(constants.NSPMKTD1, constants.NIQIMARK);
this._sendPacket(this.markerPkt.buf); <----
}
I think simple solution is use timeout only for ping. If ping timed out, break streaming of ping packet, force closing of stream and invalidate connection.
@sreguna callTimeout is not variant. It sets timeout to send breaking packet. Sending breaking packet is the same as sending ping packet. Hangs ~16 minutes on TCP/IP level.
async _processMessage(message) { let callTimer; let callTimeoutExpired = false; try { if (this.callTimeout > 0) { callTimer = setTimeout(() => { callTimeoutExpired = true; this.breakMessage(); <----- }, this.callTimeout); } await this._encodeMessage(message); ...
breakMessage() { this._breakInProgress = true; this.nsi.sendBreak(); <---- }
sendBreak() { if (this.isBreak) return; /* Already in a break */ if (!this.connected) { this.isBreak = true; /* Not yet connected. Post the break */ this.breakPosted = true; return; } this.isBreak = true; this.markerPkt.prepare(constants.NSPMKTD1, constants.NIQIMARK); this._sendPacket(this.markerPkt.buf); <---- }
I think simple solution is use timeout only for ping. If ping timed out, break streaming of ping packet, force closing of stream and invalidate connection.
The hang is consistent? because I've been having this issue and when it happens it maybe 16 minutes or hangs forever
I think after setting expireTime
(i.e net socket setKeepAlive
) , indefinite hang is not seen .
I have same problem. Call
ping
insidegetConnection
hangs 11-16 minutes and catches ETIMEDOUT inside net stream.stream error: Error: read ETIMEDOUT at TCP.onStreamRead (node:internal/stream_base_commons:217:20) { errno: -110, code: 'ETIMEDOUT', syscall: 'read' } NJS-500: connection to the Oracle Database was broken NJS-501: connection to host 172.20.188.174 port 1521 terminated unexpectedly. (CONNECTION_ID=UpVDSqbkhcGXyrjvZjKU5A==) read ETIMEDOUT at NTTCP.checkErr (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/ntTcp.js:327:29) at NTTCP.receive (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/ntTcp.js:465:12) at processTicksAndRejections (node:internal/process/task_queues:96:5) at async NetworkSession._recvPacket (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/networkSession.js:378:22) at async NetworkSession.recvPacket (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/networkSession.js:443:12) at async ReadPacket.waitForPackets (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/protocol/packet.js:293:20) at async Protocol._decodeMessage (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/protocol/protocol.js:70:5) at async Protocol._processMessage (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/protocol/protocol.js:153:9) at async ThinConnectionImpl.ping (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/connection.js:977:5) at async ThinPoolImpl.acquire (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/pool.js:505:11)
TCP/IP connections may be broken by router/switch (restart). Why ping not working periodically (pingInterval) for free connections and not clean failure connections? In log i see errors only after call
getConnection
. Nothing for periodically pings? Why not working keepAlive for tcp connection?Statistics before call
getConnection
:Driver: ...thin mode: true Pool statistics: ...gathered at: 2023-12-21T07:04:55.978Z ...up time (milliseconds): 73507015 ...up time from last reset (milliseconds) 73506988 ...connection requests: 10 ...requests enqueued: 0 ...requests dequeued: 0 ...requests failed: 0 ...requests exceeding queueMax: 0 ...requests exceeding queueTimeout: 0 ...current queue length: 0 ...maximum queue length: 0 ...sum of time in queue (milliseconds): 0 ...minimum time in queue (milliseconds): 0 ...maximum time in queue (milliseconds): 0 ...average time in queue (milliseconds): 0 ...pool connections in use: 0 ...pool connections open: 2 Pool attributes: ...connectString: BANK ...edition: ...events: false ...externalAuth: false ...homogeneous: true ...poolAlias: WORKER-2 ...poolIncrement: 1 ...poolMax: 5 ...poolMaxPerShard: undefined ...poolMin: 2 ...poolPingInterval (seconds): 30 ...poolTimeout (seconds): 60 ...queueMax: 500 ...queueTimeout (milliseconds): 60000 ...sessionCallback: undefined ...sodaMetaDataCache: undefined ...stmtCacheSize: 30 ...user: USER Related environment variables: ...UV_THREADPOOL_SIZE: 10
In project i use
pkg
and compile it withnode16-linux-x64
target (node.js v16).
just to clarify, currently periodic cleanup of free connections only removes the connections that are idle more thanpoolTimeout
configured value. poolPingInterval
is checked during getConnection as it involves round trip. expireTime
should enable keepAlive.
@boriborm, Thanks for suggestion of pingTimeout in comment. I am checking the feasibility of the same.
Same problem here, using version 6.0.0 and docker container running on custom VLAN. We are catching ORA-03113 on Sentry on previous requests. Once it hangs, the node app doesn't respond anymore. I'll be updating on this topic once we run some stress test, trying to bring down the network connection with different methods.
NJS-500: connection to the Oracle Database was broken DPI-1080: connection was closed by ORA-3113
Thick mode has SQLNET.RECV_TIMEOUT to limit the time for executes . details here
We have added poolPingTimeout parameter in the patch here. This will be included in the node-oracledb 6.4 release.
This has fixed in node-oracledb 6.4.
I have created the express server application which connect to oracle database in order to perform DB operations. That application works fine in dev environment and then I published it in ubuntu VM using pm2. Initially it works fine but after few hours getConnection('hr') method hang the request.
Inside application, First I create the oracle db connection pool at the startup of the express app. Here is the code for connection pool.
Here is the endpoint that use oracle connection.
I used oracledb: 6.2.0 version and thin mode to connect DB.
Initially it works fine. After few hours getConnection method hang the server and it does not trigger the catch block. If I restart node app using pm2 then it works. But same error happen after few hours.