Closed hwesley closed 5 years ago
It would be helpful if you could show us the code.
@hwesley what are you trying to protect against, and what are you trying to do? Why did you determine you need to close the pool instead of just letting the pool establish new connections? Do you have a standby database? Do you have RAC? Even if not, can you create a FAN service (for the DB / Oracle Net) and enable events in node-oracledb? What sqlnet.ora parameters do you have on your Node servers (see Connections and High Availability)? Do you have SQLNET.EXPIRE_TIME in the DB sqlnet.ora? Are you terminating the pool with a specified drainTime?
If you're using the await/async style of programming, triple-check you have used 'await' appropriately; not doing so commonly leads to crashes.
The code can be started by calling:
process.env.UV_THREADPOOL_SIZE = 64;
await dbConnection.createPool(POOL);
await dbConnection.createPool(POOL_EDIT);
Here are the necessary files: DBConnector.js
import oracledb from 'oracledb';
import { Pinger } from '../../utils/Pinger';
import { timeoutPromise, sleep } from '../../utils/PromiseUtils';
oracledb.fetchAsString = [oracledb.CLOB];
const POOL = 'pool';
const POOL_EDIT = 'poolEdit';
const DEFAULT_TIMEOUT = 15000;
const POOL_CONFIG = {
connectionString: `${process.env.DB_HOST}:${process.env.DB_PORT}/cdb1`,
password: process.env.DB_PASSWORD,
user: process.env.DB_USER,
_enableStats: true,
poolMin: 0,
poolIncrement: 1,
poolPingInterval: 60,
poolTimeout: 30,
};
class DBConnector {
constructor() {
this.createPool = this.createPool.bind(this);
this.closePool = this.closePool.bind(this);
this.recreatePools = this.recreatePools.bind(this);
this.recreatePool = this.recreatePool.bind(this);
this.verifyPoolSanity = this.verifyPoolSanity.bind(this);
this.checkStatus = this.checkStatus.bind(this);
this.ping = this.ping.bind(this);
this.executeQuery = this.executeQuery.bind(this);
this.closeConnection = this.closeConnection.bind(this);
this.logError = this.logError.bind(this);
const pinger = new Pinger('Oracle DB', this.ping, '*/2 * * * *');
pinger.onPingFailed = async () => {
await this.recreatePools();
};
}
/**
* Initializes a connection pool to an Oracle db
* @param {String} poolAlias Name of the pool to init
*/
async createPool(poolAlias) {
return new Promise(async (resolve, reject) => {
let config;
switch (poolAlias) {
case POOL:
config = { ...POOL_CONFIG, poolAlias, poolMax: 60 };
break;
case POOL_EDIT:
config = { ...POOL_CONFIG, poolAlias, poolMax: 4 };
break;
}
if (!config) {
throw new Error(`No config available for poolAlias: ${poolAlias}`);
}
await oracledb.createPool(config, (error, pool) => {
if (error) {
console.log(`Create pool error: ${poolAlias}`, error);
reject();
}
console.log(`Finished creating pool: ${poolAlias}`);
resolve();
});
});
}
/**
* Closes the connection pool
* @param {String} poolAlias The pool to close
*/
async closePool(poolAlias) {
return new Promise(async (resolve, reject) => {
console.log(`Destroy pool: ${poolAlias}`);
const pool = await oracledb.getPool(poolAlias);
//await pool.close(error => {
await pool.terminate(error => {
if (error) {
console.log(`Error in close pool: ${poolAlias}`, error);
reject(error);
}
console.log(`Finished destroying pool: ${poolAlias}`);
resolve();
});
});
}
/**
* Closes the pools and creates them again
*/
async recreatePools() {
await this.recreatePool(POOL);
await this.recreatePool(POOL_EDIT);
}
async recreatePool(poolAlias) {
// recreate pool
console.log(`Recreating pool: ${poolAlias}`);
// destroy pool
try {
await this.closePool(poolAlias);
} catch (err) {
console.log(`Error destroying pool: ${poolAlias}`, err);
}
try {
await this.createPool(poolAlias);
} catch (err) {
console.log(`error creating pool: ${poolAlias}`, err);
}
console.log(`Finished recreating pool: ${poolAlias}`);
}
async verifyPoolSanity() {
const poolStatus = await this.checkStatus();
if (poolStatus.selectPool.connectionsInUse >= 60) {
const statusString = `Select pool is full: connectionsInUse: ${
poolStatus.selectPool.connectionsInUse
}; connectionsOpen: ${poolStatus.selectPool.connectionsOpen}`;
logException(new Error(statusString));
console.log(statusString);
}
if (poolStatus.editPool.connectionsInUse >= 60) {
const statusString = `Select pool is full: connectionsInUse: ${
poolStatus.editPool.connectionsInUse
}; connectionsOpen: ${poolStatus.editPool.connectionsOpen}`;
logException(new Error(statusString));
console.log(statusString);
}
}
async ping() {
const response = await this.executeQuery(`SELECT 1 FROM DUAL`);
if (!response.result || response.result.length == 0) {
throw new Error('Pinging failed');
}
await this.verifyPoolSanity();
}
/**
* Checks status of database
*/
async checkStatus() {
const result = {};
try {
const pool = await getPoolStatus(POOL);
result.selectPool = pool;
} catch (err) {
result.selectPool = {
isConnected: false,
};
}
try {
const pool = await getPoolStatus(POOL_EDIT);
result.editPool = pool;
} catch (err) {
result.editPool = {
isConnected: false,
};
}
return result;
}
async getPoolStatus(poolAlias) {
// select pool
const connection = await timeoutPromise(DEFAULT_TIMEOUT, oracledb.getConnection({ poolAlias }));
const connectionStatus = connection ? true : false;
if (connection) {
this.closeConnection(connection);
}
const pool = oracledb.getPool(poolAlias);
return {
isConnected: connectionStatus,
connectionsInUse: pool ? pool.connectionsInUse : 0,
connectionsOpen: pool ? pool.connectionsOpen : 0,
};
}
/**
* Executes a query on a connection from the db pool
* @param {String} query The query to execute
* @param {Array} vars An array of vars to fill the query
*/
async executeQuery(query, vars) {
let connection;
try {
connection = await timeoutPromise(DEFAULT_TIMEOUT, oracledb.getConnection({ POOL }));
if (connection) {
const result = await timeoutPromise(
DEFAULT_TIMEOUT,
connection.execute(query, vars || [], {
outFormat: oracledb.OBJECT,
maxRows: 1000,
})
);
if (result && result.rows)
return {
success: true,
result: result.rows || result,
};
return {
success: true,
result: null,
};
} else {
console.log('No valid connection, retry method');
return this.executeQuery(query, vars);
}
} catch (error) {
this.logError(error, poolAlias);
return {
success: false,
error: error,
};
} finally {
await this.closeConnection(connection);
}
}
/**
* Closes the given db connection and releases it to the connection pool
* @param {Connection} connection The Db connection
*/
async closeConnection(connection) {
try {
if (connection) {
await connection.close();
}
} catch (err) {
logException(err);
console.log('Error closing connection', err);
}
}
/**
* Logs an error and important metadata about the pool
* @param {Object} error Error caught by system
* @param {String} poolAlias The pool throwing the error
*/
logError(error, poolAlias) {
console.log(`Error connecting to db with ${poolAlias} pool:`, error);
}
}
exports.dbConnection = new DBConnector();
exports.POOL = POOL;
exports.POOL_EDIT = POOL_EDIT;
Pinger.js
import { CronJob } from 'cron';
export class Pinger {
constructor(name, pingFunction, cronExpression = '*/2 * * * *') {
this.name = name;
this.pingFunction = pingFunction;
// Callbacks
this.onPingFailed = null;
// Public methods
this.ping = this.ping.bind(this);
const pingCronJob = new CronJob(cronExpression, this.ping, () => {});
pingCronJob.start();
console.log(`Pinger created: ${name}`);
}
async ping() {
console.log(`Pinging ${this.name}`);
try {
await this.pingFunction();
console.log(`Pinging ${this.name} succeeded`);
} catch (err) {
console.log(`Pinging ${this.name} failed: ${err}`);
try {
if (this.onPingFailed) {
await this.onPingFailed(err);
}
} catch (e) {
console.log('error in onpingfailed', e);
}
console.log('Finished pinging');
}
}
}
PromiseUtils.js
export const timeoutPromise = (ms, promise) => {
// Create a promise that rejects in <ms> milliseconds
let timeout = new Promise((resolve, reject) => {
setTimeout(() => {
reject('Timed out in ' + ms + 'ms.');
}, ms);
});
// Returns a race between our timeout and the passed in promise
return Promise.race([promise, timeout]);
};
Hi @cjbj , We are running a backend on Azure, which connects to an Oracle DB which we do not necessarily control, so I can't change settings on that environment.
We had an issue where Azure drops idle connections (after 4 minutes) because of their NAT infrastructure. We noticed that when this happens, the pool doesn't notice that the connection is no longer valid and it hangs on pool.getConnection(). To avoid this, we started sending pings to keep the connection alive, which solved our issue when the database is available (the poolPingInterval didn't seem to do anything).
Now we want to fix another issue. It can happen that the Oracle DB goes down and becomes unavailable. When this happens, the pool becomes idle again, and we notice the same issue: the pool doesn't detect it. When we start the Oracle DB up again after 4 minutes, the ping keeps failing even though the Oracle DB is available again. As a last resort, we thought we should destroy the pool and create it again to fix this state (see the onPingFailed callback).
I've double-checked the await logic. When I don't call the recreatePools() method, and instead throw an exception in the onPingFailed callback or the ping function, the exception gets handled as it should and the pinging (every 2 minutes) keeps happening.
I've temporarily set up an Oracle DB on Azure following this tutorial to debug this issue. It's currently an empty database because the only thing I need to get working is SELECT 1 FROM DUAL and durable connections.
To reproduce the memory corruption bug, the Oracle DB doesn't even need to be available. It doesn't always show the memory corruption stacktrace; sometimes it just keeps hanging after creating the pool and the ping doesn't get triggered anymore.
I'm currently not using drainTime. I tried it with 0, but the result was the same. Because the Oracle DB is not reachable, there are no active connections, and this parameter should not matter.
@hwesley details matter! Error numbers and the like will help.
I would like to wind back to the "the pool doesn't notice that the connection is no longer valid & it hangs on pool.getConnection()." and dig more. Do you have a pstack? Can you enable Oracle client & net tracing?
What sqlnet.ora configuration do you have?
I'd be happy to take this offline (send me email) if that helps.
@hwesley My favorite setting for resolving net unresponsiveness is to add DISABLE_OOB=ON to the sqlnet.ora (on the Node.js computer). You may also want to look at tuning TCP/IP timeouts.
Breaking news (at least to me) is that EXPIRE_TIME is usable in "client" (aka Node.js computer) sqlnet.ora files as of at least 18c client for the C library stack, as used by node-oracledb. This is for 'dead server' detection. When used in a DB server sqlnet.ora it performs 'dead client' detection. I will get the Oracle Net doc updated.
Anyway, along with this, and other options mentioned in the previous node-oracledb doc link, you have some options to experiment with. They may or may not help your current situation but should give you a better high availability experience overall.
Hi @cjbj, Thanks for your input! I currently don't have a sqlnet.ora configuration file, so it looks like there are some settings I haven't tried yet. I'll experiment and try to avoid recreating the pool!
I experimented with the TCP/IP keep alive flags a few months ago:
net.ipv4.tcp_keepalive_intvl: '20'
net.ipv4.tcp_keepalive_probes: '5'
net.ipv4.tcp_keepalive_time: '60'
But I remember it didn't work; the keep-alives were not being sent. I suppose I need to have a sqlnet.ora file with EXPIRE_TIME to get this working? If I understand correctly, EXPIRE_TIME will override the TCP keep-alive settings?
I understand that EXPIRE_TIME does supersede using ENABLE=BROKEN in the connect string and fiddling the machine TCP timeouts (https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-7A18022A-E40D-4880-B3CE-7EE9864756CA)
@hwesley The code could use some work...
Let's start with the constructor function for the DBConnector class. What is the point of lines like this?
this.createPool = this.createPool.bind(this);
I get the impression that you're trying to expose the methods as public or set the context of this, neither of which is necessary. Here's an example:
function pause(ms) {
return new Promise(function(resolve) {
setTimeout(() => {
resolve();
}, ms);
});
}
class DBConnector {
constructor() {
this.prop = 'val';
}
async createPool(poolAlias) {
console.log('before');
await pause(3000);
// Methods have access to the instance
console.log('after', this.prop);
}
}
async function runTest() {
const myConnector = new DBConnector();
// createPool is public
await myConnector.createPool();
console.log('after after');
}
runTest();
Next, looking at createPool, why are you returning a promise if it's not needed?
You're also mixing async APIs. For example, here you're awaiting a call to createPool, but you pass in a callback, which means it doesn't return a promise, so there's nothing to be awaited.
await oracledb.createPool(config, (error, pool) => {...
createPool should look more like this:
async createPool(poolAlias) {
let config;
switch (poolAlias) {
case POOL:
config = { ...POOL_CONFIG, poolAlias, poolMax: 60 };
break;
case POOL_EDIT:
config = { ...POOL_CONFIG, poolAlias, poolMax: 4 };
break;
}
if (!config) {
throw new Error(`No config available for poolAlias: ${poolAlias}`);
}
try {
await oracledb.createPool(config);
console.log(`Finished creating pool: ${poolAlias}`);
} catch (err) {
console.log(`Create pool error: ${poolAlias}`, err);
throw err;
}
}
closePool has issues similar to createPool: an unnecessary promise and mixed async APIs.
Favor pool.close over pool.terminate, as the latter is just an alias from earlier versions of the driver.
Next, add a drainTime to the pool close call.
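Putting those two points together, a minimal sketch of closePool might look like the following. It assumes node-oracledb's promise API, where pool.close(n) waits up to n seconds for in-use connections before closing; the injected getPool function and the mock pool are stand-ins so the sketch runs without a database or the driver.

```javascript
// Sketch: closePool using the promise API with a drain time, no callback mixing.
async function closePool(getPool, poolAlias, drainTime = 10) {
  console.log(`Destroy pool: ${poolAlias}`);
  const pool = getPool(poolAlias); // stand-in for oracledb.getPool(poolAlias)
  await pool.close(drainTime);     // waits up to drainTime seconds, then closes
  console.log(`Finished destroying pool: ${poolAlias}`);
}

// Demo with a mock pool in place of a real one:
const mockPool = {
  close: async seconds => console.log(`closing, drainTime=${seconds}s`),
};
closePool(() => mockPool, 'pool', 10);
```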
recreatePools looks good.
recreatePool tries to close a pool and, if that fails, tries to recreate the pool anyway. That will fail because of duplicate pool aliases in the pool cache. Perhaps it's missing a return at the end of the catch?
One of the bigger potential issues I see is the use of timeoutPromise in several functions. It's a rather generic function, meaning it doesn't know how the driver works. When you use it with getConnection, it's possible that the timeoutPromise finishes first while there's still a request for a connection in the queue. When that request completes, the connection will be removed from the pool but never returned.
Did you know there's a queueTimeout built in? https://github.com/oracle/node-oracledb/blob/dev-4.0/doc/api.md#propdbqueuetimeout
Using timeoutPromise with execute has a similar problem. A connection can only do one thing at a time, so if the timeoutPromise finishes first, you'll call closeConnection, but that call will have to wait until the execute finishes. At that point, there's a connection doing work for nothing when it could be doing something more useful.
Have a look at break for a better solution: https://github.com/oracle/node-oracledb/blob/dev-4.0/doc/api.md#break
I recommend stripping out timeoutPromise.
In ping, did you consider using the built-in ping method? https://github.com/oracle/node-oracledb/blob/dev-4.0/doc/api.md#connectionping If it works for you, it should be more performant.
What happens if a call to executeQuery is issued while you're recreating a pool? You'd have to build in a queue to prevent those calls from getting errors. Just something to be aware of...
Finally, I'd recommend rewriting the code so that objects with success properties are not needed. For example, executeQuery should re-throw the error, and its caller should use a try/catch block.
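A sketch of that re-throw style. The runQuery argument stands in for the real pooled execute call; the point is only that executeQuery either returns rows or throws, so callers never check a success flag.

```javascript
// Sketch: executeQuery logs and re-throws; callers own the try/catch.
async function executeQuery(runQuery, query, vars = []) {
  try {
    const result = await runQuery(query, vars);
    return result.rows;
  } catch (err) {
    console.log(`Error executing: ${query}`, err.message);
    throw err; // re-throw so the caller decides how to handle it
  }
}

// Caller side:
async function demo() {
  try {
    const rows = await executeQuery(
      async () => ({ rows: [{ X: 1 }] }), // mock successful execute
      'SELECT 1 X FROM DUAL');
    console.log(rows);
  } catch (err) {
    console.log('Query failed:', err.message);
  }
}
demo();
```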
Hi Dan,
Thanks for your input! You're right, some parts of the code are not implemented very well. I've followed your remarks and updated the code accordingly.
I would also like to get rid of timeoutPromise. The reason we have it is that on an Azure VM, connections are closed when they are idle (meaning no traffic) for 4 minutes. This is a parameter that can't be changed.
When this occurs, the connection from the pool doesn't seem to know it got closed. The result is that it hangs on getConnection; we never receive a result, not even after 10 minutes. That's why we added the timeoutPromise. Ideally, the connection should stay alive. That's why we implemented a manual ping method in the first place, but it only runs on the connection returned by getConnection and not on every connection in the pool, so this approach still has flaws.
I've also set the queueTimeout to 15000ms. This doesn't seem to have any effect.
The break method would not help because I can't get a 'connection' object. Also, it looks like this is new in version 4.*? We are still using the latest stable version, 3.1.2.
If pool connections failed when the database is unavailable, and stayed valid when the database is available, there would be no need to recreate the pools.
To fix this, I believe we should be able to send TCP keep-alive packets over the active connections. I've understood that I need a sqlnet.ora configuration file with these settings:
DISABLE_OOB=ON
SQLNET.EXPIRE_TIME=1
So far this didn't solve the issue. I've got some more suggestions from @cjbj, which I will follow up tomorrow.
The result is that it hangs on getConnection.
This sounds suspect. When connections are killed, the typical issue is that getConnection returns bad connections, not that it hangs.
Hanging on getConnection should only happen if there are no connections to get (they are all checked out). Have you checked your pool stats to verify this isn't the case? Perhaps you're leaking connections?
When you think about it, if getConnection hung because of a bad connection, what good would timing out and repeating the process do?
I've also set the queueTimeout on 15000ms. This doesn't seem to have any effect.
Try adding a call to pool._logStats() before you call getConnection (you'll need to enable pool stats for this to work). Let us know what you get there when there's a hang.
The break method would not help because I can't get a 'connection' object.
I'm suggesting the use of break where you've wrapped conn.execute, not pool.getConnection.
Also it looks like this is new in version 4.*? We are still using the latest stable version: 3.1.2.
No, break has been there since version 1.
I would also like to get rid of timeoutPromise. The reason we have this, is because on an Azure VM, connections are being closed when they are idle (idle meaning no traffic) for 4 minutes.
I still think you need to go back to a basics and start with a simple program. Then let us help you solve problems as you encounter them
I still think you need to go back to a basics and start with a simple program. Then let us help you solve problems as you encounter them
Yeah, this sounds like the best way forward...
I tested again on an Azure VM with _logStats(). Here are the logs:
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.004608519Z Pinging Oracle DB
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.005739616Z
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.005768416Z Pool statistics:
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006595313Z ...total up time (milliseconds): 270348
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006619213Z ...total connection requests: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006623713Z ...total requests enqueued: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006626913Z ...total requests dequeued: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006629913Z ...total requests failed: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006633113Z ...total request timeouts: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006636113Z ...max queue length: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006639213Z ...sum of time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006642313Z ...min time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006645513Z ...max time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006648513Z ...avg time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006651613Z ...pool connections in use: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006654613Z ...pool connections open: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006657713Z Related pool attributes:
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006660813Z ...poolAlias: digitalTwinPool
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006663813Z ...queueTimeout (milliseconds): 15000
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006666913Z ...poolMin: 0
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006670113Z ...poolMax: 60
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006673013Z ...poolIncrement: 1
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006676113Z ...poolTimeout (seconds): 30
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006679113Z ...poolPingInterval: 60
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006682213Z ...sessionCallback: undefined
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006685213Z ...stmtCacheSize: 30
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006689713Z Pool status:
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006693913Z ...status: 6000
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006697013Z Related environment variables:
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.006700113Z ...process.env.UV_THREADPOOL_SIZE: 64
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.048230778Z Connection received, executing query...
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.053066463Z Closed connection
backend_1_bcac55c836e4 | 2019-07-02T08:25:00.053199162Z Pinging Oracle DB succeeded
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.001435650Z Pinging Oracle DB
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.001534350Z
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.001541749Z Pool statistics:
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003132545Z ...total up time (milliseconds): 570344
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003148044Z ...total connection requests: 1
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003152144Z ...total requests enqueued: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003155244Z ...total requests dequeued: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003158244Z ...total requests failed: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003161444Z ...total request timeouts: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003164444Z ...max queue length: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003169544Z ...sum of time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003173844Z ...min time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003176944Z ...max time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003180044Z ...avg time in queue (milliseconds): 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003183044Z ...pool connections in use: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003186144Z ...pool connections open: 1
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003189244Z Related pool attributes:
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003192744Z ...poolAlias: digitalTwinPool
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003195844Z ...queueTimeout (milliseconds): 15000
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003198944Z ...poolMin: 0
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003202044Z ...poolMax: 60
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003205044Z ...poolIncrement: 1
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003208044Z ...poolTimeout (seconds): 30
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003211144Z ...poolPingInterval: 60
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003214244Z ...sessionCallback: undefined
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003217244Z ...stmtCacheSize: 30
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003220244Z Pool status:
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003223344Z ...status: 6000
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003226344Z Related environment variables:
backend_1_bcac55c836e4 | 2019-07-02T08:30:00.003229444Z ...process.env.UV_THREADPOOL_SIZE: 64
I've also noticed it doesn't hang forever. After 15 minutes the getConnection() seems to return a connection and the query succeeds. Obviously 15 minutes is far too long.
I'm trying to get the webappawait.js example running. I've only had a few hours today, so I didn't get too far. I'm mostly struggling with getting my empty Oracle DB populated with the HR data. I will continue on this tomorrow. Is there any sample DB available so I don't need to set it up myself?
It looks like the output from those _logStats calls is from pinging the DB. Did you experience a wait? The goal is to capture the stats when you have a wait.
I've also noticed it doesn't hang forever. After 15 minutes the getConnection() seems to return a connection and the query succeeds. Obviously this 15 minutes is far too long.
Are you sure the time is spent in getConnection and not execute?
[edited: ignore this deleted bit!]
Oracle's sample schema files are at https://github.com/oracle/db-sample-schemas
It looks like the output from those _logStats calls is from pinging the DB. Did you experience a wait? The goal is to capture the stats when you have a wait.
I've also noticed it doesn't hang forever. After 15 minutes the getConnection() seems to return a connection and the query succeeds. Obviously this 15 minutes is far too long.
Are you sure the time is spent in getConnection and not execute?
Yes, it's a manual ping every 5 minutes (SELECT 1 FROM DUAL). The first time it succeeds (when the connection gets created); the second time it hangs on getConnection for 15 minutes. The log "Connection received, executing query..." is printed when the getConnection method returns a connection.
Alright, I tested with the webappawait.js example. It's running on an Azure VM and connecting to an Oracle DB I've set up, also on an Azure VM.
The issue is exactly the same. The first time I open it in the browser, the query succeeds. When I wait 5 minutes, the browser seems to time out; I'm not sure if the getConnection method ever actually times out.
I've tried changing both poolPingInterval and poolTimeout to 5 seconds, but this doesn't change anything.
I've also noticed that when I stop the application after this happens, it hangs on closing the pool. After 8-9 minutes it printed this:
Pool closed
Segmentation fault (core dumped)
Start by adding some debugging statements to webappawait.js to trace what is going on at the app level.
What is your node-oracledb side sqlnet.ora like? And where is it - e.g. how do you know it is being used?
How do you stop it?
Start by adding some debugging statements to webappawait.js to trace what is going on at the app level.
I see the same problems as in my own project. I've added some logs of webappawait.js below:
user@localhost:~/oracle-test$ node webappawait.js
Wed Jul 03 2019 09:19:37 GMT+0000 (UTC): Creating pool
Wed Jul 03 2019 09:19:37 GMT+0000 (UTC): Pool created
Server running at http://localhost:7000
Wed Jul 03 2019 09:19:52 GMT+0000 (UTC): Request received, getting connection
Wed Jul 03 2019 09:19:52 GMT+0000 (UTC): connection received, executing query
Wed Jul 03 2019 09:19:52 GMT+0000 (UTC): query executed, showing results
Wed Jul 03 2019 09:19:52 GMT+0000 (UTC): Closing connection
Wed Jul 03 2019 09:19:52 GMT+0000 (UTC): Connection closed
Wed Jul 03 2019 09:20:56 GMT+0000 (UTC): Request received, getting connection
Wed Jul 03 2019 09:20:56 GMT+0000 (UTC): connection received, executing query
Wed Jul 03 2019 09:20:56 GMT+0000 (UTC): query executed, showing results
Wed Jul 03 2019 09:20:56 GMT+0000 (UTC): Closing connection
Wed Jul 03 2019 09:20:56 GMT+0000 (UTC): Connection closed
Wed Jul 03 2019 09:26:10 GMT+0000 (UTC): Request received, getting connection
---- AFTER 15 minutes I decided to stop the application with CTRL + C
^C
Terminating
Pool closed
Segmentation fault (core dumped)
user@localhost:~/oracle-test$
What is your node-oracledb side sqlnet.ora like? And where is it?
I tested both with and without this file. The client is located at /opt/oracle/instantclient_18_3. The sqlnet.ora file is in /opt/oracle/instantclient_18_3/network/admin. It currently contains:
DISABLE_OOB=ON
SQLNET.EXPIRE_TIME=1
I also have the environment variable: LD_LIBRARY_PATH=/opt/oracle/instantclient_18_3/:
how do you know it is being used?
I have no idea. I know the client is being used because when I put it in a wrong place, it gives errors. Can you tell me how to detect this?
Can you also set the environment variable DPI_DEBUG_LEVEL to the value 12 and rerun your tests? That will tell us the underlying ODPI-C calls that are being made and any errors that are reported. Thanks!
Here are the logs with DPI_DEBUG_LEVEL=12. You can see it hangs at 07:12:56.673. Also, closing the pool when I stop the application takes a very long time.
user@dig-ocb-prod:~/oracle-test$ node webappawait.js
ODPI [02958] 2019-07-04 07:06:55.156: ODPI-C 3.1.2
ODPI [02958] 2019-07-04 07:06:55.157: debugging messages initialized at level 12
Thu Jul 04 2019 07:06:55 GMT+0000 (UTC): Creating pool
ODPI [02958] 2019-07-04 07:06:55.162: fn start dpiContext_create
ODPI [02958] 2019-07-04 07:06:55.176: fn end dpiContext_create -> 0
ODPI [02964] 2019-07-04 07:06:55.176: fn start dpiContext_initPoolCreateParams(0x55d24731f7d0)
ODPI [02964] 2019-07-04 07:06:55.176: fn end dpiContext_initPoolCreateParams(0x55d24731f7d0) -> 0
ODPI [02964] 2019-07-04 07:06:55.176: fn start dpiContext_initCommonCreateParams(0x55d24731f7d0)
ODPI [02964] 2019-07-04 07:06:55.176: fn end dpiContext_initCommonCreateParams(0x55d24731f7d0) -> 0
ODPI [02964] 2019-07-04 07:06:55.177: fn start dpiPool_create(0x55d24731f7d0)
ODPI [02964] 2019-07-04 07:06:55.183: fn end dpiPool_create(0x55d24731f7d0) -> 0
ODPI [02964] 2019-07-04 07:06:55.183: fn start dpiPool_setTimeout(0x7f16a400c920)
ODPI [02964] 2019-07-04 07:06:55.183: fn end dpiPool_setTimeout(0x7f16a400c920) -> 0
ODPI [02964] 2019-07-04 07:06:55.183: fn start dpiPool_setStmtCacheSize(0x7f16a400c920)
ODPI [02964] 2019-07-04 07:06:55.183: fn end dpiPool_setStmtCacheSize(0x7f16a400c920) -> 0
Thu Jul 04 2019 07:06:55 GMT+0000 (UTC): Pool created
Server running at http://localhost:7000
Thu Jul 04 2019 07:07:11 GMT+0000 (UTC): Request received, getting connection
ODPI [02958] 2019-07-04 07:07:11.669: fn start dpiPool_addRef(0x7f16a400c920)
ODPI [02958] 2019-07-04 07:07:11.669: fn end dpiPool_addRef(0x7f16a400c920) -> 0
ODPI [02965] 2019-07-04 07:07:11.669: fn start dpiContext_initConnCreateParams(0x55d24731f7d0)
ODPI [02965] 2019-07-04 07:07:11.669: fn end dpiContext_initConnCreateParams(0x55d24731f7d0) -> 0
ODPI [02965] 2019-07-04 07:07:11.669: fn start dpiPool_acquireConnection(0x7f16a400c920)
ODPI [02965] 2019-07-04 07:07:12.690: fn end dpiPool_acquireConnection(0x7f16a400c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.690: fn start dpiPool_release(0x7f16a400c920)
ODPI [02958] 2019-07-04 07:07:12.690: fn end dpiPool_release(0x7f16a400c920) -> 0
Thu Jul 04 2019 07:07:12 GMT+0000 (UTC): connection received, executing query
ODPI [02958] 2019-07-04 07:07:12.691: fn start dpiConn_addRef(0x7f169c00c920)
ODPI [02958] 2019-07-04 07:07:12.691: fn end dpiConn_addRef(0x7f169c00c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.691: fn start dpiConn_newVar(0x7f169c00c920)
ODPI [02958] 2019-07-04 07:07:12.691: fn end dpiConn_newVar(0x7f169c00c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.691: fn start dpiVar_setFromBytes(0x55d2473563d0)
ODPI [02958] 2019-07-04 07:07:12.691: fn end dpiVar_setFromBytes(0x55d2473563d0) -> 0
ODPI [02966] 2019-07-04 07:07:12.691: fn start dpiConn_prepareStmt(0x7f169c00c920)
ODPI [02966] 2019-07-04 07:07:12.692: fn end dpiConn_prepareStmt(0x7f169c00c920) -> 0
ODPI [02966] 2019-07-04 07:07:12.692: fn start dpiStmt_getInfo(0x7f16a000c920)
ODPI [02966] 2019-07-04 07:07:12.692: fn end dpiStmt_getInfo(0x7f16a000c920) -> 0
ODPI [02966] 2019-07-04 07:07:12.692: fn start dpiStmt_bindByPos(0x7f16a000c920)
ODPI [02966] 2019-07-04 07:07:12.692: fn end dpiStmt_bindByPos(0x7f16a000c920) -> 0
ODPI [02966] 2019-07-04 07:07:12.692: fn start dpiStmt_execute(0x7f16a000c920)
ODPI [02966] 2019-07-04 07:07:12.815: fn end dpiStmt_execute(0x7f16a000c920) -> 0
ODPI [02966] 2019-07-04 07:07:12.815: fn start dpiStmt_getQueryInfo(0x7f16a000c920)
ODPI [02966] 2019-07-04 07:07:12.815: fn end dpiStmt_getQueryInfo(0x7f16a000c920) -> 0
ODPI [02966] 2019-07-04 07:07:12.815: fn start dpiStmt_getQueryInfo(0x7f16a000c920)
ODPI [02966] 2019-07-04 07:07:12.815: fn end dpiStmt_getQueryInfo(0x7f16a000c920) -> 0
ODPI [02966] 2019-07-04 07:07:12.815: fn start dpiStmt_getQueryInfo(0x7f16a000c920)
ODPI [02966] 2019-07-04 07:07:12.815: fn end dpiStmt_getQueryInfo(0x7f16a000c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.816: fn start dpiConn_release(0x7f169c00c920)
ODPI [02958] 2019-07-04 07:07:12.816: fn end dpiConn_release(0x7f169c00c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.816: fn start dpiVar_release(0x55d2473563d0)
ODPI [02958] 2019-07-04 07:07:12.816: fn end dpiVar_release(0x55d2473563d0) -> 0
ODPI [02967] 2019-07-04 07:07:12.816: fn start dpiConn_newVar(0x7f169c00c920)
ODPI [02967] 2019-07-04 07:07:12.816: fn end dpiConn_newVar(0x7f169c00c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.816: fn start dpiStmt_define(0x7f16a000c920)
ODPI [02967] 2019-07-04 07:07:12.816: fn end dpiStmt_define(0x7f16a000c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.816: fn start dpiConn_newVar(0x7f169c00c920)
ODPI [02967] 2019-07-04 07:07:12.816: fn end dpiConn_newVar(0x7f169c00c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.816: fn start dpiStmt_define(0x7f16a000c920)
ODPI [02967] 2019-07-04 07:07:12.816: fn end dpiStmt_define(0x7f16a000c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.816: fn start dpiConn_newVar(0x7f169c00c920)
ODPI [02967] 2019-07-04 07:07:12.816: fn end dpiConn_newVar(0x7f169c00c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.816: fn start dpiStmt_define(0x7f16a000c920)
ODPI [02967] 2019-07-04 07:07:12.817: fn end dpiStmt_define(0x7f16a000c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.817: fn start dpiStmt_setFetchArraySize(0x7f16a000c920)
ODPI [02967] 2019-07-04 07:07:12.817: fn end dpiStmt_setFetchArraySize(0x7f16a000c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.817: fn start dpiStmt_fetchRows(0x7f16a000c920)
ODPI [02967] 2019-07-04 07:07:12.940: fn end dpiStmt_fetchRows(0x7f16a000c920) -> 0
ODPI [02967] 2019-07-04 07:07:12.940: fn start dpiStmt_release(0x7f16a000c920)
ODPI [02967] 2019-07-04 07:07:12.940: fn end dpiStmt_release(0x7f16a000c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.940: fn start dpiVar_release(0x7f16940435e0)
ODPI [02958] 2019-07-04 07:07:12.940: fn end dpiVar_release(0x7f16940435e0) -> 0
ODPI [02958] 2019-07-04 07:07:12.940: fn start dpiVar_release(0x7f1694041ce0)
ODPI [02958] 2019-07-04 07:07:12.940: fn end dpiVar_release(0x7f1694041ce0) -> 0
ODPI [02958] 2019-07-04 07:07:12.940: fn start dpiVar_release(0x7f169400c920)
ODPI [02958] 2019-07-04 07:07:12.940: fn end dpiVar_release(0x7f169400c920) -> 0
Thu Jul 04 2019 07:07:12 GMT+0000 (UTC): query executed, showing results
Thu Jul 04 2019 07:07:12 GMT+0000 (UTC): Closing connection
ODPI [02964] 2019-07-04 07:07:12.942: fn start dpiConn_close(0x7f169c00c920)
ODPI [02964] 2019-07-04 07:07:12.942: fn end dpiConn_close(0x7f169c00c920) -> 0
ODPI [02958] 2019-07-04 07:07:12.942: fn start dpiConn_release(0x7f169c00c920)
ODPI [02958] 2019-07-04 07:07:12.942: fn end dpiConn_release(0x7f169c00c920) -> 0
Thu Jul 04 2019 07:07:12 GMT+0000 (UTC): Connection closed
Thu Jul 04 2019 07:12:56 GMT+0000 (UTC): Request received, getting connection
ODPI [02958] 2019-07-04 07:12:56.673: fn start dpiPool_addRef(0x7f16a400c920)
ODPI [02958] 2019-07-04 07:12:56.673: fn end dpiPool_addRef(0x7f16a400c920) -> 0
ODPI [02965] 2019-07-04 07:12:56.673: fn start dpiContext_initConnCreateParams(0x55d24731f7d0)
ODPI [02965] 2019-07-04 07:12:56.673: fn end dpiContext_initConnCreateParams(0x55d24731f7d0) -> 0
**ODPI [02965] 2019-07-04 07:12:56.673: fn start dpiPool_acquireConnection(0x7f16a400c920)
ODPI [02965] 2019-07-04 07:29:58.920: fn end dpiPool_acquireConnection(0x7f16a400c920) -> 0**
ODPI [02958] 2019-07-04 07:29:58.920: fn start dpiPool_release(0x7f16a400c920)
ODPI [02958] 2019-07-04 07:29:58.920: fn end dpiPool_release(0x7f16a400c920) -> 0
Thu Jul 04 2019 07:29:58 GMT+0000 (UTC): connection received, executing query
ODPI [02958] 2019-07-04 07:29:58.920: fn start dpiConn_addRef(0x7f169c1b3b90)
ODPI [02958] 2019-07-04 07:29:58.920: fn end dpiConn_addRef(0x7f169c1b3b90) -> 0
ODPI [02958] 2019-07-04 07:29:58.920: fn start dpiConn_newVar(0x7f169c1b3b90)
ODPI [02958] 2019-07-04 07:29:58.920: fn end dpiConn_newVar(0x7f169c1b3b90) -> 0
ODPI [02958] 2019-07-04 07:29:58.920: fn start dpiVar_setFromBytes(0x55d247352620)
ODPI [02958] 2019-07-04 07:29:58.920: fn end dpiVar_setFromBytes(0x55d247352620) -> 0
ODPI [02966] 2019-07-04 07:29:58.920: fn start dpiConn_prepareStmt(0x7f169c1b3b90)
ODPI [02966] 2019-07-04 07:29:58.920: fn end dpiConn_prepareStmt(0x7f169c1b3b90) -> 0
ODPI [02966] 2019-07-04 07:29:58.920: fn start dpiStmt_getInfo(0x7f16a0040da0)
ODPI [02966] 2019-07-04 07:29:58.920: fn end dpiStmt_getInfo(0x7f16a0040da0) -> 0
ODPI [02966] 2019-07-04 07:29:58.920: fn start dpiStmt_bindByPos(0x7f16a0040da0)
ODPI [02966] 2019-07-04 07:29:58.920: fn end dpiStmt_bindByPos(0x7f16a0040da0) -> 0
ODPI [02966] 2019-07-04 07:29:58.920: fn start dpiStmt_execute(0x7f16a0040da0)
ODPI [02966] 2019-07-04 07:29:59.049: fn end dpiStmt_execute(0x7f16a0040da0) -> 0
ODPI [02966] 2019-07-04 07:29:59.049: fn start dpiStmt_getQueryInfo(0x7f16a0040da0)
ODPI [02966] 2019-07-04 07:29:59.049: fn end dpiStmt_getQueryInfo(0x7f16a0040da0) -> 0
ODPI [02966] 2019-07-04 07:29:59.049: fn start dpiStmt_getQueryInfo(0x7f16a0040da0)
ODPI [02966] 2019-07-04 07:29:59.049: fn end dpiStmt_getQueryInfo(0x7f16a0040da0) -> 0
ODPI [02966] 2019-07-04 07:29:59.049: fn start dpiStmt_getQueryInfo(0x7f16a0040da0)
ODPI [02966] 2019-07-04 07:29:59.049: fn end dpiStmt_getQueryInfo(0x7f16a0040da0) -> 0
ODPI [02958] 2019-07-04 07:29:59.050: fn start dpiConn_release(0x7f169c1b3b90)
ODPI [02958] 2019-07-04 07:29:59.050: fn end dpiConn_release(0x7f169c1b3b90) -> 0
ODPI [02958] 2019-07-04 07:29:59.050: fn start dpiVar_release(0x55d247352620)
ODPI [02958] 2019-07-04 07:29:59.050: fn end dpiVar_release(0x55d247352620) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiConn_newVar(0x7f169c1b3b90)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiConn_newVar(0x7f169c1b3b90) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiStmt_define(0x7f16a0040da0)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiStmt_define(0x7f16a0040da0) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiConn_newVar(0x7f169c1b3b90)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiConn_newVar(0x7f169c1b3b90) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiStmt_define(0x7f16a0040da0)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiStmt_define(0x7f16a0040da0) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiConn_newVar(0x7f169c1b3b90)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiConn_newVar(0x7f169c1b3b90) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiStmt_define(0x7f16a0040da0)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiStmt_define(0x7f16a0040da0) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiStmt_setFetchArraySize(0x7f16a0040da0)
ODPI [02967] 2019-07-04 07:29:59.050: fn end dpiStmt_setFetchArraySize(0x7f16a0040da0) -> 0
ODPI [02967] 2019-07-04 07:29:59.050: fn start dpiStmt_fetchRows(0x7f16a0040da0)
ODPI [02967] 2019-07-04 07:29:59.179: fn end dpiStmt_fetchRows(0x7f16a0040da0) -> 0
ODPI [02967] 2019-07-04 07:29:59.179: fn start dpiStmt_release(0x7f16a0040da0)
ODPI [02967] 2019-07-04 07:29:59.179: fn end dpiStmt_release(0x7f16a0040da0) -> 0
ODPI [02958] 2019-07-04 07:29:59.179: fn start dpiVar_release(0x7f1694043f50)
ODPI [02958] 2019-07-04 07:29:59.179: fn end dpiVar_release(0x7f1694043f50) -> 0
ODPI [02958] 2019-07-04 07:29:59.179: fn start dpiVar_release(0x7f1694042000)
ODPI [02958] 2019-07-04 07:29:59.179: fn end dpiVar_release(0x7f1694042000) -> 0
ODPI [02958] 2019-07-04 07:29:59.179: fn start dpiVar_release(0x7f1694041da0)
ODPI [02958] 2019-07-04 07:29:59.179: fn end dpiVar_release(0x7f1694041da0) -> 0
Thu Jul 04 2019 07:29:59 GMT+0000 (UTC): query executed, showing results
Thu Jul 04 2019 07:29:59 GMT+0000 (UTC): Closing connection
ODPI [02964] 2019-07-04 07:29:59.180: fn start dpiConn_close(0x7f169c1b3b90)
ODPI [02964] 2019-07-04 07:29:59.180: fn end dpiConn_close(0x7f169c1b3b90) -> 0
ODPI [02958] 2019-07-04 07:29:59.180: fn start dpiConn_release(0x7f169c1b3b90)
ODPI [02958] 2019-07-04 07:29:59.180: fn end dpiConn_release(0x7f169c1b3b90) -> 0
Thu Jul 04 2019 07:29:59 GMT+0000 (UTC): Connection closed
^C
Terminating
ODPI [02965] 2019-07-04 07:43:55.127: fn start dpiPool_close(0x7f16a400c920)
ODPI [02965] 2019-07-04 08:00:59.373: fn end dpiPool_close(0x7f16a400c920) -> 0
ODPI [02958] 2019-07-04 08:00:59.373: fn start dpiPool_release(0x7f16a400c920)
ODPI [02958] 2019-07-04 08:00:59.373: fn end dpiPool_release(0x7f16a400c920) -> 0
Pool closed
I'm pretty sure enabling TCP keep-alive packets would fix this problem, but none of the options in the library seem to enable them. Maybe we should focus on that?
I agree the evidence looks like some network issue. Contact the Azure folk and see what they say about their part of the stack.
One thing to look at is an Oracle Net trace. In sqlnet.ora add something like:

```
ADR_BASE=/tmp/mylogs
DIAG_ADR_ENABLED=ON
TRACE_LEVEL_CLIENT=16
```

Then make the log directory (`mkdir /tmp/mylogs`) and run the app.
Look for what's happening at the time of the hang. We don't need full logs. And use a gist for output!
There's no crash with webappasync.js?
Azure closes outbound idle connections on VMs after 4 minutes. This setting cannot be changed. We've had contact with Azure before concerning this issue. One of the things we noticed is that they don't send a TCP RST packet when they close a connection. Unfortunately this is by design and cannot be changed.
Because of that, I think we can only fix this by keeping the connection alive (which is our goal), so there should be a way of sending TCP keep-alive packets over all open connections in the pool. As far as I understand this is possible with the node-oracledb library, but I can't seem to get it working.
I've changed the sqlnet.ora to the above configuration. You can find the results here: https://we.tl/t-ZHEeyCVAiN
I don't like it, but here's something I just mocked up that might help (would require a fixed pool size).
```javascript
setInterval(async () => {
  const pool = oracledb.getPool();
  const conns = [];
  // First take all of the connections out of the pool.
  for (let idx = 0; idx < pool.connectionsOpen; idx += 1) {
    const conn = await pool.getConnection();
    conns.push(conn);
  }
  // Then ping and close each connection.
  for (let idx = 0; idx < conns.length; idx += 1) {
    try {
      await conns[idx].ping();
    } catch (err) {
      console.error(err);
    } finally {
      try {
        await conns[idx].close();
      } catch (err) {
        console.log(err);
      }
    }
  }
}, 120000);
```
That would be a possibility indeed, but isn't that something that the poolPingInterval attribute is supposed to do?
And what about the previously mentioned EXPIRE_TIME, shouldn't that send a keep alive packet every X minutes?
When a socket is opened with SO_KEEPALIVE, the configured TCP keep-alive settings are used and keep-alive packets are sent automatically at the specified interval. Is there a setting in the Oracle client that enables this behaviour?
but isn't that something that the poolPingInterval attribute is supposed to do?
As far as I know, poolPingInterval only takes effect when you invoke getConnection. It doesn't ping in the background.
And what about the previously mentioned EXPIRE_TIME
This seems like the best option, though now I'm wondering if it suffers from the same issue as poolPingInterval. Have you tried setting this on the database server instead of the client?
Is there a setting in the Oracle client that enables this behaviour?
This white paper indicates that dead connection detection in 12c+ uses the TCP layer (perhaps SO_KEEPALIVE is part of that solution): https://www.oracle.com/technetwork/database/enterprise-edition/oraclenetdcd-2179641.pdf
What version of the database are you connecting to?
@hwesley it seems that "Oracle" is in discussion with "Microsoft" about Azure timeouts.
Setting keepalive has been found effective in breaking the 4-minute barrier for apps in Azure VM connecting to an Oracle Cloud ATP-S DB.
One thing I got corrected on today was that on clients, EXPIRE_TIME currently needs to be set in the connect descriptor, not sqlnet.ora (this may change in the future). This is true from 18c on. But the easiest way to use it is to update to 19c client libraries and utilize the new 'Easy Connect Plus' syntax:
```javascript
connectionString: `${process.env.DB_HOST}:${process.env.DB_PORT}/cdb1?expire_time=1`
```
Although not documented (!), you can use expire_time like this. Give it a shot and let us know.
If you can't upgrade from 18c, create a tnsnames.ora entry with `(expire_time=1)` in the DESCRIPTION section.
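For the tnsnames.ora route, a minimal sketch might look like the following; the alias `MYDB`, host, port, and service name are placeholders to adapt to your environment:

```
MYDB =
  (DESCRIPTION =
    (EXPIRE_TIME = 1)
    (ADDRESS = (PROTOCOL = TCP)(HOST = myhost)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = cdb1))
  )
```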
You may also be able to change the outbound reset time in Azure, see https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-outbound-rules-overview Apparently the limit is not the documented '66' but actually 120. This hasn't been tested, to my knowledge.
@hwesley any news on this?
Hi @cjbj ,
Thanks for the input. I haven't been able to continue on this. I'll try to pick it up next week!
Kind regards
Hi @cjbj,
Good news! I updated the Oracle sample application to version 19.3 of the client libraries. I added the expire_time parameter to the connection string and it works! I don't see any timeouts.
We haven't tried the suggestions with a load balancer, as we are also in the process of moving to k8s. I'm not sure if we are going to have the same problems there, but now with the expire_time we can be sure the connections are kept alive!
@hwesley thanks for the feedback. I'm glad you have a solution. I'll close this now
Hi,
I am creating 2 connection pools, with different poolAlias & poolMax variables (one for edits, one for select queries), but otherwise they have the below configuration:
I'm periodically pinging the database with a simple SELECT 1 FROM DUAL query. When the ping fails, I want to recreate the 2 pools by calling close followed by createPool. The ping query times out after 15 seconds. I am using version 3.1.2 of the node-oracledb library, with the following client: instantclient-basic-linux.x64-18.3.0.0.0dbru.
When the database is available, this works as expected & the pings are successful.
However, when the database is not available (e.g. the server is offline), my ping fails as expected. During the recreate, I get the following stacktrace after closing the pool. It's not possible to catch it with a try-catch; my methods complete and return as expected (I wait on the callbacks to check whether they succeeded). The stacktrace is somehow printed from outside my code:
It looks like an issue in the client, but I don't know how to avoid it. It also seems to block further execution of my periodic pings.
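The recreate-on-failed-ping flow described above can be sketched as follows. Here `dbApi` is a hypothetical wrapper around the DBConnector from the issue; its `ping`, `close`, and `createPool` methods are assumptions about the caller's code, not the node-oracledb driver API itself:

```javascript
// Hypothetical sketch: recreate any pool whose health ping fails.
// dbApi.ping() stands in for SELECT 1 FROM DUAL with a timeout,
// dbApi.close(alias, drainTime) drains the pool, dbApi.createPool()
// rebuilds it.
async function verifyPoolSanity(dbApi, aliases) {
  for (const alias of aliases) {
    try {
      await dbApi.ping(alias);
    } catch (err) {
      // Ping failed: force-close the pool (drainTime 0), ignoring errors if
      // the pool is already unusable, then create a fresh one.
      try {
        await dbApi.close(alias, 0);
      } catch (closeErr) {
        // Pool may already be gone; recreating is still worthwhile.
      }
      await dbApi.createPool(alias);
    }
  }
}
```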