tediousjs / node-mssql

Microsoft SQL Server client for Node.js
https://tediousjs.github.io/node-mssql
MIT License
2.23k stars 467 forks source link

Handling database unavailability #1477

Closed marknelissen closed 5 months ago

marknelissen commented 1 year ago

I am running a continuous process with a single connection pool to a single database. When the database is shutdown (maintenance, technical issue, etc.), the connection pool becomes unusable, even after the database comes back online.

As long as the database is down, on my requests I get errors like "Error: SHUTDOWN is in progress". But after the database is back up, the error becomes "Error: Invalid object name 'TABLE_NAME'". It seems as if the pool is not capable of recycling the connections and replacing them with new ones in the case of a database service restart.

I didn't find the instructions in the documentation on how to handle this. Is this supposed to be handled by the pool, or should I manually rotate the pool with a new one? Where is the best place to detect this kind of connection errors to initiate replacement of the pool? Or is this somehow fixed in newer versions?

Expected behaviour:

Errors on the requests during database downtime, but requests start resolving again after the database comes back online.

Actual behaviour:

Requests return an error saying the table does not exist.

Configuration:

        const port = process.env.PORT ? { port: Number(process.env.PORT) } : {}

        const pool = new ConnectionPool({
            server: process.env.SERVER,
            database: process.env.DATABASE,
            ...port,
            options: {
                enableArithAbort: true,
                encrypt: false,
                fallbackToDefaultDb: true,
            },
            domain: process.env.DOMAIN,
            user: process.env.USERNAME,
            password: process.env.PASSWORD,
        })

Software versions

dhensby commented 1 year ago

First, v6.4.1 is very old and not supported. There have been significant pool improvements since that release. First step is to upgrade to v9.

Second, in v9 the pool should be able to recover from server unavailability. If a connection errors, it is discarded from the pool and then next time you request a connection from the pool one of two things can happen:

  1. If there are still connections in the pool, it will pull one out, run a health check on it (if the DB is down this will fail); repeat until healthy connection is found, if no connections are left, the next step is performed...
  2. A new connection is made, if it errors, the request for a connection from the pool will reject.

When the server comes back online, the pool should be depleted (all connections were bad and discarded), so a new connection will be made and returned.

The error your getting is odd... It indicates a badly constructed SQL query and not a connection issue, is that error being produced when trying to obtain a connection or when trying to run an SQL query?

marknelissen commented 1 year ago

Regarding the pool upgrade, I was asking first to see whether the effort is justified, since this is a script that will become obsolete in the coming year, and is very difficult to test (no local database to test it on), so I'd like to avoid breaking library changes, hence my reticence to upgrade to new major versions, which is why I was looking for some confirmation that this version does indeed contain better error/health handling.

Indeed, the error itself amazes me as well, however I can confirm the query is correctly structured, since before the database shutdown, the query executes without issue. The timing is:

  1. The database is up, query executes correctly
  2. The database goes down, we get the error "Database is shutting down"
  3. The database comes back online: we get the "table does not exist" error, which seems to me related to some connection rights.

It seems to me that somehow, after the reboot of the database, the connection pool retains the rights to be connected and execute queries, but the actual grants on the tables are no longer applied to the session.

We don't manage the database (it's from a third party service), so I have no idea how this is possible, and I have insufficient access rights to investigate this behaviour.

To answer your final question, the error is indeed produced during a query execution on the pool. When the script starts up, it creates a single pool and stores the promise of the connect call in a global variable. All queries are executed on the resolved result of this variable within the code. I'm not sure whether this is a good practice, but from what I understand from the docs, there is no getConnection call, and you execute the queries directly on the connected pool, which handles the connection. Is this correct?

This is the current code for managing the pool:

import { ConnectionPool } from 'mssql'

let conn: Promise<ConnectionPool>

export const connect = (): Promise<ConnectionPool> => {
    if (!conn) {
        if (!process.env.SERVER) {
            return Promise.reject(new Error('Please specify server address'))
        }
        const port = process.env.PORT ? { port: Number(process.env.PORT) } : {}

        const pool = new ConnectionPool({
            server: process.env.SERVER,
            database: process.env.DATABASE,
            ...port,
            options: {
                enableArithAbort: true,
                encrypt: false,
                fallbackToDefaultDb: true,
            },
            domain: process.env.DOMAIN,
            user: process.env.USERNAME,
            password: process.env.PASSWORD,
        })
        conn = pool.connect()
    }
    return conn
}

And then this is used as such:

const conn = await connect()
return conn.request()
        .input('date', subDays(new Date(), 1))
        .query(queryString)
dhensby commented 1 year ago

All queries are executed on the resolved result of this variable within the code. I'm not sure whether this is a good practice, but from what I understand from the docs, there is no getConnection call, and you execute the queries directly on the connected pool, which handles the connection. Is this correct?

Yes, the pool retrieves a connection just-in-time to perform the query and automatically releases it after the result is received.

If you're reluctant to upgrade the library (understandable given what you've said). I'd suggest the "easiest" approach is simply to close the pool when you see this error and then retry connections until it comes back up and then (I presume) the connections will work.

Your pool manager logic would need to handle the disconnect logic (remove the reference to the disconnected pool) so subsequent connect calls create a new pool instead of returning the old conn value.