oracle / node-oracledb

Oracle Database driver for Node.js maintained by Oracle Corp.
http://oracle.github.io/node-oracledb/
Other
2.24k stars 1.07k forks source link

NJS-040: connection request timeout #1645

Open todevmilen opened 4 months ago

todevmilen commented 4 months ago

I receive this error NJS-040: connection request timeout while the server is running on production. Already check all closed issue regarding this error also read the documentation, but still facing a problem to solve it. Please assist me. Versions

Here is the configuration from which i make a pool. Also before I start the index.js file i set the UV_THREADPOOL_SIZE to 25 (because i have two databases with 10 poolMax set.

CMD ["sh", "-c", "export UV_THREADPOOL_SIZE=25 && node index.js"]
module.exports = {
hrPool: {
    user: process.env.USER_2,
    password: process.env.PASSWORD_2,
    connectString: "local",
    poolMin: 10,
    poolMax: 10,
    poolIncrement: 0,
  },
  hrPoolTest: {
    user: process.env.TEST_NAME,
    password: process.env.TEST_PASS,
    connectString: "test",
    poolMin: 10,
    poolMax: 10,
    poolIncrement: 0,
  },
}

I can see in the logs that just before this error occurs the incoming request doesn't close the connections. Here is example code of the way I create and close connections.

const oracledb = require("oracledb");

async function find(selectItemsParams) {
  let conn;

  try {
    conn = await oracledb.getConnection();
    oracledb.outFormat = oracledb.OBJECT;
    const result = await conn.execute(
      `BEGIN example.slc_items ( 
        :p_item_id,
        :p_item_code,
        :info_slc_items
        ); END;`,
      {
        p_item_id: selectItemsParams.p_item_id,
        p_item_code: selectItemsParams.p_item_code,
        info_slc_items: {
          dir: oracledb.BIND_OUT,
          type: oracledb.DB_TYPE_CURSOR,
        },
      },
      { resultSet: true }
    );
    const resultSet = result.outBinds.info_slc_items;

    let row;
    let rows = [];

    while ((row = await resultSet.getRow())) {
      rows.push(row);
    }

    await resultSet.close();
    return rows;
  } catch (err) {
    console.error(err);
    throw err;
  } finally {
    if (conn) {
      // conn assignment worked, need to close
      try {
        await conn.close();
        console.log("connection is closed;");
      } catch (err) {
        console.error(err);
      }
    }
  }
}
module.exports.find = find;

I also can see that the number of active handlers/requests is continuously increasing.

sudarshan12s commented 4 months ago

Hi @todevmilen , as you mentioned, the incoming requests are not getting closed might be the cause for pool treating all the connections as busy and hence cannot provide a new connection . You have more information on which phase/line number, they are stuck like in conn.execute, rs.getRow, ... I see you have 2 pools created inside application, the incoming requests for both the pools are returning NJS-040 ?

If any one of the queries are stuck, you can enable the callTimeout , check firewall settings if idle connections are not dropped out and hence delaying new connection creations , other timeout settings to limit the time..

todevmilen commented 4 months ago

@sudarshan12s , I'm not sure exactly on which phase they are stuck, but the requests are coming, no data is returned, the connections are not closing and then the error occurs. Normally these connections are closing without a problem.

I have two pools created, but the default one (first one created) is used a lot more frequent than the second one that is used very rare, so that's why i'm not sure if the problem occurs also for the second pool.

sudarshan12s commented 4 months ago

Can you try callTimeout setting(conn.callTimeout = t(in msec) ) before executing the query. It could be due to specific long running queries which are delayed.. Since you have older database version 11g , you may need to add DISABLE_OOB=ON to a sqlnet.ora file.

todevmilen commented 4 months ago

The problem is that i can't catch exactly on which query it happens, more also we have reports that need different time based on period and is hard to set a fixed timeout. Is there anything else that i can check to resolve this issue ?

sudarshan12s commented 4 months ago

ok understood, I mean its just a max time for any query, if it can be derived. enabling logs (export DPI_DEBUG_LEVEL=92 ) would help to see the API details like establishing a connection (dpiConn_create) , running a query (dpiStmt_execute) but since its on production, you might not be able to do it. We might otherwise need to check on server side to see what active sqls are run from active sessions (from username, machine ..) at that time..

todevmilen commented 4 months ago

From what I saw for the most of the time there is only one active session that switch from active to unactive and from time to time there is two max three active sessions.

sudarshan12s commented 4 months ago

From what I saw for the most of the time there is only one active session that switch from active to unactive and from time to time there is two max three active sessions.

I suspect few previous connections/sockets have not got terminated gracefully. timeout configurations should cleanup such connections..

This could be due to a firewall dropping packets for idle connections. You can enable TCP Keepalives by setting expireTime(https://node-oracledb.readthedocs.io/en/latest/api_manual/oracledb.html) A good value would be to set it to 1(1 minute).

todevmilen commented 4 months ago

I can't find where to place the expireTime. I'm using Oracle 11g and all the information i see is for greater version.

sudarshan12s commented 4 months ago

timeout

I think the 11g server does have support. Since you have 21.5 instant client, It can be placed in connect string as below or client sqlnet.ora ( SQLNET.EXPIRE_TIME=1 ).

Easy Connect: host/service?expire_time=n.
ex: "localhost:1521/testServ?expire_time=1

With this setting probe packets can be seen through tcpdump at regular intervals , ... Is this not working?

From doc:

Setting EXPIRE_TIME in C Clients:

todevmilen commented 4 months ago

I'm using tnsnames.ora file. Is that a correct place where EXPIRE_TIME=1 should stay?

XE_test =
  (DESCRIPTION =
    (EXPIRE_TIME=1)
    (ADDRESS = (PROTOCOL = TCP)(HOST = xxx.xxx.x.xxx )(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = XE.local)
    )
  )
sudarshan12s commented 4 months ago

I'm using tnsnames.ora file. Is that a correct place where EXPIRE_TIME=1 should stay?

XE_test =
  (DESCRIPTION =
    (EXPIRE_TIME=1)
    (ADDRESS = (PROTOCOL = TCP)(HOST = xxx.xxx.x.xxx )(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = XE.local)
    )
  )

Yes its correct

todevmilen commented 4 months ago

Ok, great. I will place it there to check if this will fix the problem.

todevmilen commented 4 months ago

I've made the changes, but today the error shows again :(.

sudarshan12s commented 4 months ago

Hi @todevmilen , You have more information on reproducing scenario . There are connect times which can be tuned for getting a connection, callTimeout for query executions ,.. If you can share more traces (DPI_LOG_LEVEL=92) , It would be easier to find the exact function call and debug further..