tediousjs / node-mssql

Microsoft SQL Server client for Node.js
https://tediousjs.github.io/node-mssql
MIT License
2.23k stars 466 forks source link

Request timeouts in mssql/msnodesqlv8 when using default Pool size #1628

Open remz opened 6 months ago

remz commented 6 months ago

I am switching from mssql (Tedious) to mssql/msnodesqlv8, because we need Windows Authentication support in our Node.js service. However after switching I run into the following issue.

We have a test that checks if there are no race conditions while getting a new id from the database. To make this operation atomic a TABLOCKX is used inside a transaction. The tests starts 100 promises in parallel and checks if 100 unique numbers are generated in the end.

This test succeeds without any issues when using the mssql (Tedious) driver. However using mssql/msnodesqlv8, the test fails with the default pool size of 10.

Expected behaviour:

I would expect the test to show the same behavior as when using mssql.

Actual behaviour:

Using mssql/msnodesqlv8, the test fails with the default pool size of 10. (it fails with request timeout errors) When I set the pool size to 4, the test succeeds. Using any bigger pool size will fail the test.

Questions:

Configuration:

The related table is defined as follows:

  CREATE TABLE [dbo].[un_reportlayoutno](
    [NextId] [int] NOT NULL
  ) ON [PRIMARY]

I tried to reduce the code to the simplest version possible, to make sure this issue is not related to my own code. The test code;

const sql: any = require('mssql/msnodesqlv8');

function getConfig(): config {
    return {
      database: 'EXPR4MSDE_DEV',
      server: 'X10573',
      port: undefined,
      connectionTimeout: 15000,
      requestTimeout: 2000,
      pool: {
        min: 1,
        max: 4
      },
      options: {
        encrypt: false,
        trustedConnection: true,
        instanceName: 'MSDE4EXPR',
        useUTC: false
      }
    };
  }

  async function giveNextId(cp: ConnectionPool): Promise<number> {
    const getQuery = `SELECT un_reportlayoutno.NextId FROM un_reportlayoutno WITH (TABLOCKX)`;
    const updateQuery = (un: number) => `UPDATE un_reportlayoutno SET NextId = ${un+1}`;
    const trans = cp.transaction();
    await trans.begin();
    try {
      const r = await trans.request().query(getQuery);
      const result = r.recordset[0].NextId as number;
      await trans.request().query(updateQuery(result));
      await trans.commit();
      return result;
    }
    catch (err) {
      await trans.rollback();
      throw err;
    }
  }

  test("unique numbers hammering simplified", async () => {
    const cp: ConnectionPool = new sql.ConnectionPool(getConfig());
    await cp.connect();
    const iterations = new Array(100).fill(0);
    try {
      const nrs = await Promise.all(iterations.map(_ => giveNextId(cp)));
      const check = new Set<number>();
      nrs.forEach((n) => check.add(n));
      expect(check.size).toBe(nrs.length);
    }
    finally {
      await cp.close();
    }
  }, 60000);

Software versions

dhensby commented 6 months ago

As I understand it, this is essentially a NFR test which is to check that you can perform 100 transactions within 2 seconds?

Your request timeout is 2000ms, you're running 100 transactions in parallel expecting them all to complete before any timeouts are fired.

Whilst I understand that it may be frustrating that one driver meets this requirement but another doesn't, that is likely to be the nature of different drivers (they have different performance characteristics).

For there to be a convincing case that there is a problem inherently with this library (and not just the underlying driver), I think we'd need to be able to prove the driver can meet the NFRs in a direct implementation, otherwise it's just a performance limitation with the driver and that's something to be raised with the author of that library.

Of course, these kinds of performance requirements are also going to be subject to the hardware/infrastructure that the test or application is running on. This test passing locally or on CI isn't a guarantee it'll pass on the production infrastructure.

Laurensvanrun commented 6 months ago

@dhensby as far as I read the post, it is not performance related because it works when the pool size is reduced. @remz can you elaborate more on this issue? What exactly works and what fails? Do some of the promises resolves and not all within the time frame (which could indeed point to a performance issue), or do you only have one (or non?) of the promises resolve in that case?

dhensby commented 6 months ago

@Laurensvanrun good spot, I had missed that detail.

Bizarre that it succeeds when the pool size is smaller, I wonder if there's something about the way msnodesqlv8 is queuing requests that could cause some to push ahead, leaving some to timeout because of the lock...

The pooling logic is identical regardless of the driver, so I'm struggling to see how this could be anything other than a quirk with msnodesqlv8, but we'd need it implemented directly with that library to know for sure.

remz commented 6 months ago

@dhensby Thanks for your feedback/suggestions The remarkable thing is that one of the promises usually succeeds and the rest fail with request timeouts. Even if I increase the request timeout to 45 seconds, the requests still fail with timeouts So there seems to be some kind of locking issue that prevents the transactions to progress as soon as the pool size is larger than 4. Trying to use msnodesqlv8 directly (so without the mssql wrapper) is a good suggestion, I will give that a try..