oracle / node-oracledb

Oracle Database driver for Node.js maintained by Oracle Corp.
http://oracle.github.io/node-oracledb/
Other
2.24k stars 1.07k forks source link

DPI-1080: connection was closed by ORA-2396, #1545

Closed cjbj closed 1 year ago

cjbj commented 1 year ago

Hello @cjbj,

I am facing the same issue on linux machine - DPI-1080: connection was closed by ORA-2396, "oracledb": "^5.4.0"

connection settings -> poolMax: 25, poolMin: 5, poolIncrement: 5, poolTimeout: 1800, poolPingInterval: 300

One thing, I wanted to understand, if we have poolMin as 5, poolTimeout as 1800, and poolPingInterval 300, in documentation, it's written as after poolTimeout seconds it will kill ideal connections and bring them back to poolMin but what about the connections it's holding as poolMin, will it kill those as well and recreates new to match poolMin?

Our Oracle has a setting to kill connections that are ideal for more than an hour, that's why we kept the poolTimeout as half of their time but still connections went to this error state.

Originally posted by @jatinseth2007 in https://github.com/oracle/node-oracledb/issues/1493#issuecomment-1506985893

cjbj commented 1 year ago

@jatinseth2007:

jatinseth2007 commented 1 year ago

thanks @cjbj for reply, I have added inline comments below

  1. When exactly do you get the error? After what call? What was the life of the connection before this - was it released to the pool or had the app been using it and then sitting idle without calling connection.close()?

-> This happened when we released the connection back to the pool and it was there in the pool idle state, ORACLE killed that connection and the connection was waiting for poolPingInterval: 300 to be checked and cleared.

  1. Tell your DBAs to stop killing idle connections. Tell them that it just creates more DB load when the app has to recreate the connections. Despite all the other node-oracledb and SQL*Net settings, unless they stop the expiry you might still occasionally see connection use errors.

-> Yes, that is the solution I am going to propose to my DB team

  1. Do you really need a dynamic pool? See the recommendation in the doc about using a fixed size pool.

-> Yes, I am aware of this dynamic pool, but we ruled it out cuz it would keep a lot of active connections even in non-peak hours.

  1. Maybe use DRCP if your DBAs think that you are using too much server host memory.

-> This is new to me, let me think through it and discuss it with the DB team.

  1. poolTimeout will shrink the pool to poolMin but not touch any remaining connections.

-> This is very useful information, I was not aware of, so, poolTimeout will shrink the pool to poolMin and then the connections are relied on poolPingInterval to be cleared out if in an error state.

  1. Note the comment in the doc about 21c vs older Oracle Client and pool shrinkage initiation.

-> Sure, I will think through it

  1. If you have released connections to the pool and they have been idle (not "checked out" with getConnection()) for poolPingInterval seconds then the next getConnection() should do a round-trip to the DB and see there was an error, then throw away the connection and return a different one from getConnection().

-> This didn't happen in our case, below is the code we are using to getConnection and release it. according to the above statement, instead of throwing an error it should have checked the connection and picked a different one in case of error.

// fetch connection from pool
const connection = await oracledb.getPool().getConnection();

await connection.close({ drop: false }); // this happens even in error case as well.
  1. The other common connection-kill scenario is firewall time-outs. Your error doesn't indicate you have this problem, but if you did then you would use a sqlnet.ora EXPIRE_TIME setting to send networks pings and stop connections being killed. (I mention this for other, future readers)

-> Yes, for us, the firewall is not the issue, let me go through it.

cjbj commented 1 year ago

-> This happened when we released the connection back to the pool and it was there in the pool idle state, ORACLE killed that connection and the connection was waiting for poolPingInterval: 300 to be checked and cleared.

What exact call gave the error? How long was the connection idle in the pool? Does this happen for all connections? What's the DB idle_time value?

jatinseth2007 commented 1 year ago

@cjbj -> It was a call from UI that tried to connect to DB to fetch data and gave this error. -> not sure about the idle time of connection in the pool -> No, it didn't happen for all connection -> It is 300 mins -> The main thing is the below statement didn't happen in our particular case, ideally, instead of throwing an error, it should have created a new connection.

"If you have released connections to the pool and they have been idle (not "checked out" with getConnection()) for poolPingInterval seconds then the next getConnection() should do a round-trip to the DB and see there was an error, then throw away the connection and return a different one from getConnection()."

cjbj commented 1 year ago

-> not sure about the idle time of connection in the pool

This is the key piece of information. Earlier I mentioned that the driver can't always guarantee to return a usable connection. This is because the relative timing of the DB session idle timeout and the driver ping interval may mean that the DB has killed a session, but the app doesn't think it's time to ping.

To qualify my assertion about the guarantee, you could set poolPingInterval to 0 so that every getConnection() does a round trip to the DB. However this is bad for performance & scalability. You really want to stop the DBA's setting an idle time.

Note that even with poolPingInterval set to 0, if the connection is killed (by idle time, or network glitch or anything) in the instant between getConnection() and execute() you will still get an error. Your application should be coded to handle this execution error and do an appropriate retry or recover.

With my small knowledge about your configuration and app design I can't explain why with your large idle time and smaller ping interval you are seeing the error. Maybe there is something going on that I don't know. What are your DB and Oracle Client versions?

cjbj commented 1 year ago

Closing - no activity.