oracle / node-oracledb

Oracle Database driver for Node.js maintained by Oracle Corp.
http://oracle.github.io/node-oracledb/
Other
2.24k stars 1.07k forks source link

Uncaught expection when a connection in a connection pool is unhealthy #1604

Closed JanneSalo closed 9 months ago

JanneSalo commented 10 months ago
  1. What versions are you using?

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

process.platform: `'linux'`
process.version: `'v18.17.1'`
process.arch: `'x64'`
require('oracledb').versionString: `'6.1.0'`
require('oracledb').oracleClientVersionString: `undefined`
  1. Is it an error or a hang or a crash?

Error

  1. What error(s) or behavior you are seeing?

If the acquired connection is not healthy, it's removed from the pool here. However, removing the connection from the pool may cause an exception to be thrown but not caught anywhere. It eventually ends up in our process.on('uncaughtException') handler. Since this expection is not thrown immediately rather than on next tick when the event handler for _removePoolConnection is run, it's hard for us catch these in any other way. Should the event handler for _removePoolConnection maybe handle exceptions in some way or should unhealthy connection be dropped in some other way than emitting _removePoolConnection event? Note that the same issue affects also other places where _removePoolConnection event is emitted, such as when idle connections are dropped from the pool.

Full stack trace:

Error: NJS-500: connection to the Oracle Database was broken
NJS-501: connection to host x.x.x.x port yyyy terminated unexpectedly. (CONNECTION_ID=...)
read ETIMEDOUT
    at NTTCP.checkErr (node_modules/oracledb/lib/thin/sqlnet/ntTcp.js:327:29)
    at NTTCP.send (node_modules/oracledb/lib/thin/sqlnet/ntTcp.js:339:10)
    at WritePacket._sendPacket (node_modules/oracledb/lib/thin/protocol/packet.js:481:24)
    at WritePacket.endRequest (node_modules/oracledb/lib/thin/protocol/packet.js:507:12)
    at Protocol._encodeMessage (node_modules/oracledb/lib/thin/protocol/protocol.js:111:19)
    at Protocol._processMessage (node_modules/oracledb/lib/thin/protocol/protocol.js:151:18)
    at ThinConnectionImpl.close (node_modules/oracledb/lib/thin/connection.js:76:30)
    at ThinPoolImpl._destroy (node_modules/oracledb/lib/thin/pool.js:235:24)
    at EventEmitter.<anonymous> (node_modules/oracledb/lib/thin/pool.js:109:18)
    at EventEmitter.emit (node:events:514:28)
  code: 'NJS-500'
}

The reason why the connections break in the first place is at our end, caused probably by firewalls terminating idle connections. Still, we'd like to be sure there are no suprising consequences of connections going unhealthy. We have experienced issues in this particular environment also with oracledb 5.5.0, but since it probably handles unhealthy connections on a lower level, we don't experience any uncaught exceptions. Timeouts do occur on oracledb 5.5.0 but our code is able to handle them gracefully. We are, however, able to mitigate the issue simply by setting expireTime: 1 (probably bigger values would also work) for the connection pool to ensure the connections stay healthy.

  1. Include a runnable Node.js script that shows the problem.

Unfortunately we cannot provide this, since the underlying problem of connections being dropped is specific to our environment. The problem starts occurring after our application has been running for some time. We use a fixed pool size of 4 (i.e. poolMin = poolMax) so idle connection scanning should not happen.

sharadraju commented 10 months ago

Thanks for reporting this. However, can you try running this in Thick mode for a confirmation that nothing has changed since oracledb 5.5 when we use Oracle Client libraries?

jukkhop commented 10 months ago

@sharadraju FYI @JanneSalo is going to be offline for the next few weeks, but we'll run thick mode over the weekend and I'll report back on Monday.

jukkhop commented 10 months ago

@sharadraju I can confirm that the issue does not appear when running thick mode, ie. situation is similar if not identical to before with oracledb 5.5.

sharadraju commented 10 months ago

Thanks @jukkhop We are internally working on this.

ZachHaber commented 10 months ago

I'm also getting this error on thin-mode. It doesn't happen on thick mode, but seems to occur sometimes when there's network instability, it always happens when I boot my computer from sleep mode. I'll include one of the stack traces I had, since it's a little different as it occurred randomly:

Error: NJS-500: connection to the Oracle Database was broken
NJS-521: connection to host 10.72.31.62 port 1521 received end-of-file on communication channel. (CONNECTION_ID=ttHkW/zz/7p0vsv1gVK8sw==)
    at NTTCP.checkErr (/home/haberzj/code/projects/arcade/node_modules/oracledb/lib/thin/sqlnet/ntTcp.js:327:29)
    at NTTCP.receive (/home/haberzj/code/projects/arcade/node_modules/oracledb/lib/thin/sqlnet/ntTcp.js:463:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at NetworkSession._recvPacket (/home/haberzj/code/projects/arcade/node_modules/oracledb/lib/thin/sqlnet/networkSession.js:384:22)
    at NetworkSession.recvPacket (/home/haberzj/code/projects/arcade/node_modules/oracledb/lib/thin/sqlnet/networkSession.js:449:12)
    at ReadPacket.waitForPackets (/home/haberzj/code/projects/arcade/node_modules/oracledb/lib/thin/protocol/packet.js:292:20)
    at Protocol._decodeMessage (/home/haberzj/code/projects/arcade/node_modules/oracledb/lib/thin/protocol/protocol.js:70:5)
    at Protocol._processMessage (/home/haberzj/code/projects/arcade/node_modules/oracledb/lib/thin/protocol/protocol.js:153:9)
    at ThinConnectionImpl.close (/home/haberzj/code/projects/arcade/node_modules/oracledb/lib/thin/connection.js:76:9)
    at ThinPoolImpl._destroy (/home/haberzj/code/projects/arcade/node_modules/oracledb/lib/thin/pool.js:235:7) {
  code: 'NJS-500'
}

Node.js v18.17.0
cjbj commented 10 months ago

@ZachHaber it is likely related. We have discussed desired behavior and how the fix should be structured. A patch is under development.

sharadraju commented 9 months ago

Closing this as fixed in 6.2 release

StanislavKharchenko commented 8 months ago

@sharadraju Looks like the issue was not fixed. With 6.2 version I have the same error as provided under this issue https://github.com/oracle/node-oracledb/issues/1604#issuecomment-1719530634

Error: NJS-500: connection to the Oracle Database was broken
NJS-521: connection to host 10.98.36.48 port 1522 received end-of-file on communication channel. (CONNECTION_ID=nTqmyflDJS6dGon6i8aFGQ==)
    at NTTCP.checkErr (C:\Teamcity\work\**\node_modules\oracledb\lib\thin\sqlnet\ntTcp.js:327:29)
    at NTTCP.receive (C:\Teamcity\work\**\node_modules\oracledb\lib\thin\sqlnet\ntTcp.js:463:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async NetworkSession._recvPacket (C:\Teamcity\work\**\node_modules\oracledb\lib\thin\sqlnet\networkSession.js:378:22)
    at async NetworkSession.recvPacket (C:\Teamcity\work\**\node_modules\oracledb\lib\thin\sqlnet\networkSession.js:443:12)
    at async ReadPacket.waitForPackets (C:\Teamcity\work\**\node_modules\oracledb\lib\thin\protocol\packet.js:293:20)
    at async Protocol._decodeMessage (C:\Teamcity\work\**\node_modules\oracledb\lib\thin\protocol\protocol.js:70:5)
    at async Protocol._processMessage (C:\Teamcity\work\**\node_modules\oracledb\lib\thin\protocol\protocol.js:153:9)
    at async ThinConnectionImpl.connect (C:\Teamcity\work\**\node_modules\oracledb\lib\thin\connection.js:682:9)
    at async Object.getConnection (C:\Teamcity\work\**\node_modules\oracledb\lib\oracledb.js:642:3) {
  code: 'NJS-500'
}

Data to reproduce: oracledb 6.2.0 NodeJS: 18.15.0

It not reproduced with oracledb 6.0.3

sudarshan12s commented 8 months ago

@StanislavKharchenko , I see the stack trace indicates the error is thrown when application calls getConnection API. The GH issue here is more on exception received during close of session. Can you please verify if NNE is enabled on server same as issue reported here .

StanislavKharchenko commented 8 months ago

@sudarshan12s Looks like yes, we're using NNE. As I understand - we need to use thick mode in such case. Thank you.