ibm_db throws unhandled exception when recovering after a transaction

asselin commented 11 months ago

System: Linux x86_64 (Bullseye), node v18.16.1

Summary

In our environment, DB2 will close idle connections after 45 minutes. Our client code uses pooling for connections. If we send a normal query after a connection has been closed, checkConnectionError works fine to close and recover the connections.

However, if the first call after the timeout is a transaction, then checkConnectionError throws an error, which causes the cpp code to issue an uncaughtException event (which winds up terminating our process).

The calling code looks like this:

    const db = <connection returned by pool.open>

    try {
      await db.beginTransaction();
      const res = await db.query(...);
      await db.commitTransaction();
      return res;
    } catch (err) {
      await db.rollbackTransaction();
      throw err;
    } finally {
      await db.close();
    }

Here's the stack trace of what's happening (note the lines are from version 2.8.1, but I verified the code hasn't changed between 2.8.1 and 3.2.1):

2023-08-09T02:02:42.204Z [3] : [app] UNCAUGHT EXCEPTION: Error: Error: [IBM][CLI Driver] CLI0106E Connection is closed. SQLSTATE=08003 Error: Error: [IBM][CLI Driver] CLI0106E Connection is closed. SQLSTATE=08003
at Database.endTransactionSync (/usr/local/lib/node_modules/ibm_db/lib/odbc.js:1192:11)
at Database.rollbackTransactionSync (/usr/local/lib/node_modules/ibm_db/lib/odbc.js:1208:15)
at Database.poolCloseSync (/usr/local/lib/node_modules/ibm_db/lib/odbc.js:2198:12)
at Database.checkConnectionError (/usr/local/lib/node_modules/ibm_db/lib/odbc.js:484:12)
at cbQuery (/usr/local/lib/node_modules/ibm_db/lib/odbc.js:532:16)
at callbackTrampoline (node:internal/async_hooks:130:17)

The gist of what's happening is that a SQL30081N error is detected when calling query, and checkConnectionError is called. checkConnectionError calls poolCloseSync to close the connection. poolCloseSync detects that the connection was in a transaction, so it calls rollbackTransactionSync, which calls endTransactionSync. endTransactionSync gets an error from the CPP code ([IBM][CLI Driver] CLI0106E Connection is closed. SQLSTATE=08003), so it throws, which winds its way back through all the calls, back to the CPP code for UV_AfterQuery, which detects the exception and then calls FatalException to send the uncaughtException event.

One suggestion-- maybe poolCloseSync should wrap the call to rollbackTransactionSync in a try/catch and ignore any exceptions, since the connections are being closed anyway.

I also found a way to reproduce this locally:

Using a pool, issue a query to DB2.
Use lsof to find the file descriptor number being used for the TCP/IP connection to the server.
Close the file descriptor (fs.closeSync).
Again, using a pool, start a transaction and issue a query to DB2.

@bimalkjha Does that all make sense? Do you need any more data to help get this fixed?

Thanks!

bimalkjha commented 11 months ago

@asselin Yes, we need to call db.rollbackTransactionSync(); in a try/catch block. I am trying to reproduce it. Thanks.

asselin commented 10 months ago

@bimalkjha I made the following change in the library, and can confirm that the node process doesn't crash with an unhandled exception anymore.

Have you had any luck reproducing the issue?

// closeSync function for pooled connection.
function patchedPoolCloseSync() {
  var db = this;
  var self = this.pool;
  var connStr = this.connStr;
  if (!self) {
    return db.closeSync();
  }

  self.availablePool[connStr] = self.availablePool[connStr] || [];
  // If conn.closeSync is called more than once for this connection,
  // nothing to do. Just return.
  if (self.availablePool[connStr] && self.availablePool[connStr].indexOf(db) >= 0) {
    return;
  }
  db.lastUsed = Date.now();

  // If this connection has some active transaction, rollback the
  // transaction to free up the held resorces before moving back to
  // the pool. So that, next request can get afresh connection from pool.
  if (db.conn && db.conn.inTransaction) {
    /// BEGIN CHANGE
    try {
      db.rollbackTransactionSync();
    } catch (rollbackError) {
      // Just ignore the error, since we're closing the connection.
    }
    /// END CHANGE
  }

  // remove this db from the usedPool
  if (self.usedPool[connStr]) {
    self.usedPool[connStr].splice(self.usedPool[connStr].indexOf(db), 1);
    self.poolSize--;
  }

  // move this connection back to the connection pool at the end.
  if (db.conn) {
    self.availablePool[connStr].push(db);
    self.poolSize++;
  }

  if (db.conn && self.options.autoCleanIdle) self.cleanUp(connStr);
  return self.queue.next();
} // db.closeSync function

bimalkjha commented 10 months ago

@asselin Yes, I was able to reproduce the issue by updating existing test case. Updated code to fix the issue and verified fix using the new test program. Last commit has pushed the fix. Thanks.

asselin commented 10 months ago

Thank you @bimalkjha!

asselin commented 10 months ago

@bimalkjha When do you think this fix might be published to npm?

Thanks!

bimalkjha commented 10 months ago

@asselin New version of ibm_db will get released this month only before 30th September. Thanks.

ibmdb / node-ibm_db

ibm_db throws unhandled exception when recovering after a transaction #938

Summary