tediousjs / tedious

Node TDS module for connecting to SQL Server databases.
http://tediousjs.github.io/tedious/
MIT License
1.58k stars 439 forks source link

Socket Error: Connection lost - read ECONNRESET #1483

Open MaddyTP opened 2 years ago

MaddyTP commented 2 years ago

Software versions

Connection configuration

const azureConfig = {
  server: process.env.AZURE_GLOBAL_SERVE,
  options: {
      database: process.env.AZURE_GLOBAL_DB,
      trustServerCertificate: true,
      useColumnNames: true,
  },
  authentication: {
      type: 'default',
      options: {
          userName: process.env.AZURE_GLOBAL_USER,
          password: process.env.AZURE_GLOBAL_PASS,
          connectTimeout: 30000,
          connectionRetryInterval: 1000,
      }
  }
};

const executeSQL = (config, query, params) => new Promise((resolve, reject) => {
  var result = [];
  const connection = new Connection(config);
  const request = new Request(query, (err) => {
    if (err) {
      reject(err);
    }
    resolve(result);
    connection.close()
  });
  request.setTimeout(params);
  request.on('row', rows => {
    const tmp = {};
    const keys = Object.keys(rows);
    keys.forEach((cols) => {
      tmp[cols] = rows[cols].value;
    })
    result.push(tmp);
  });
  connection.on('connect', err => {
    if (err) {
      reject(err);
    } else {
      connection.execSql(request);
    }
  });
  connection.connect();
});

Problem description Error from socket happens several times each day causing app to restart. Source seems to be TediousJS as it's the only library being used for the DB connections. 80% of the time the app is working as expected until these errors are thrown. No discernable pattern.

Error message/stack trace

ConnectionError: Connection lost - read ECONNRESET
     at Connection.socketError (/root/bi-auto/server/node_modules/tedious/lib/connection.js:1403:26)
     at Socket.<anonymous> (/root/bi-auto/server/node_modules/tedious/lib/connection.js:1155:16)
     at Socket.emit (node:events:539:35)
     at emitErrorNT (node:internal/streams/destroy:157:8)
     at emitErrorCloseNT (node:internal/streams/destroy:122:3)
     at processTicksAndRejections (node:internal/process/task_queues:83:21) {
   code: 'ESOCKET',
   isTransient: undefined
}
MaddyTP commented 2 years ago

1277 appears to be similar.

MichaelSun90 commented 2 years ago

Hi @MaddyTP, can you try to enable debug option on tedious side, so it can provide a bit more information on the communication between server and diver side? It may help us figure out what cause the connection lose. Here is an example for enable debugging from tedious side:

var config = {
  server: "192.168.XXX.XXX",
  options: {
    rowCollectionOnRequestCompletion: true,
    encrypt: false,
    database: "database",
    // Add this to your configuration file. You can pick and choose which ones to enable. 
    debug: {
      packet: true,
      data: true,
      payload: true,
      token: true
    }
  },
  authentication: {
    type: "default",
    options: {
      userName: "admin",
      password: "123",
    }
  }
};

Then add this event listener to output the debug messages:

  connection.on('debug', (msg) => {
    console.log(msg);
  });
MaddyTP commented 2 years ago

Debugging added to code, will update once the error occurs again.

MichaelSun90 commented 2 years ago

Thanks!

MaddyTP commented 2 years ago

Error seems to be a 10 second timeout with the TLS negotiation step:

2022-10-18T20:50:04: connected to [redacted]:1433
2022-10-18T20:50:04: State change: Connecting -> SentPrelogin
2022-10-18T20:50:04: State change: SentPrelogin -> SentTLSSSLNegotiation
2022-10-18T20:50:04: TLS negotiated ([redacted], TLSv1.2)
2022-10-18T20:50:04: State change: SentTLSSSLNegotiation -> SentLogin7Withfedauth
2022-10-18T20:50:14: Connection lost - read ECONNRESET
arthurschreiber commented 2 years ago

The timeout seems to happen after the TLS negotiation has completed. We sent a LOGIN7 message to the server, and then switch the connection to the SENT_LOGIN7_WITH_FEDAUTH state.

In that state, we wait for a response from the server with additional login information (e.g. information required for the federated authentication scheme). Once that data is received, we make a request to the authentication service. If we don't receive the response from the authentication service in time, the SQL Server will most likely close the connection.

The code for this is located here: https://github.com/tediousjs/tedious/blob/ece396aa07a8e7b8b82276de7d14b5c8cbf95b98/src/connection.ts#L3382-L3484

Can you enable more debug information, so we can see exactly what data is sent / received after the state is changed to SENT_LOGIN7_WITH_FEDAUTH?

MaddyTP commented 2 years ago

Where would you suggest the extra logging be placed in the referenced function?

MaddyTP commented 2 years ago

Error is recurring several time per day and always after this step:

2022-10-18T20:50:04: State change: SentTLSSSLNegotiation -> SentLogin7Withfedauth

Recommended debugging doesn't reveal the source of the issue within the SENT_LOGIN7_WITH_FEDAUTH function. Is there a way to enable additional bugging through Tedious?

MichaelSun90 commented 2 years ago

Hi @MaddyTP,

debug: {
      packet: true,
      data: true,
      payload: true,
      token: true
    }

From my example within the pervious comment, there is multiple debug options that you can enable from connection configuration. Also, Here is a section from our git hub IO doc here which explained each debug option and its function:
"options.debug.packet A boolean, controlling whether debug events will be emitted with text describing packet details (default: false). options.debug.data A boolean, controlling whether debug events will be emitted with text describing packet data details (default: false). options.debug.payload A boolean, controlling whether debug events will be emitted with text describing packet payload details (default: false). options.debug.token A boolean, controlling whether debug events will be emitted with text describing token stream tokens (default: false)."

MaddyTP commented 2 years ago

Hi @MaddyTP,

debug: {
      packet: true,
      data: true,
      payload: true,
      token: true
    }

From my example within the pervious comment, there is multiple debug options that you can enable from connection configuration. Also, Here is a section from our git hub IO doc here which explained each debug option and its function: "options.debug.packet A boolean, controlling whether debug events will be emitted with text describing packet details (default: false). options.debug.data A boolean, controlling whether debug events will be emitted with text describing packet data details (default: false). options.debug.payload A boolean, controlling whether debug events will be emitted with text describing packet payload details (default: false). options.debug.token A boolean, controlling whether debug events will be emitted with text describing token stream tokens (default: false)."

Debugging through Tedious was already performed, see previous comments.

MichaelSun90 commented 2 years ago

Error seems to be a 10 second timeout with the TLS negotiation step:

2022-10-18T20:50:04: connected to [redacted]:1433
2022-10-18T20:50:04: State change: Connecting -> SentPrelogin
2022-10-18T20:50:04: State change: SentPrelogin -> SentTLSSSLNegotiation
2022-10-18T20:50:04: TLS negotiated ([redacted], TLSv1.2)
2022-10-18T20:50:04: State change: SentTLSSSLNegotiation -> SentLogin7Withfedauth
2022-10-18T20:50:14: Connection lost - read ECONNRESET

from you out put here, seems that you only enabled the minimal debug output that logs the state changes. If you can enable other debug options, you will got much more from the debug output.

Could you double check your configuration? we got to options for tedious connection configuration. On is for authentication specific, and one for general options. The debug options should be put under the general one.

MaddyTP commented 2 years ago

Ah, I didn't add debugging to the general options. Will update when I have data logs.

MaddyTP commented 2 years ago

Full message from process start to error with configs:

const azureConfig = {
  server: process.env.AZURE_GLOBAL_SERVE,
  options: {
      database: process.env.AZURE_GLOBAL_DB,
      trustServerCertificate: true,
      useColumnNames: true,
      debug: {
        packet: true,
        data: true,
        payload: true,
        token: true
      },
  },
  authentication: {
      type: 'default',
      options: {
          userName: process.env.AZURE_GLOBAL_USER,
          password: process.env.AZURE_GLOBAL_PASS,
          connectTimeout: 30000,
          connectionRetryInterval: 1000,
          debug: {
            packet: true,
            data: true,
            payload: true,
            token: true
          },
      }
  }
};

const executeSQL = (config, query, params) => new Promise((resolve, reject) => {
  var result = [];
  const connection = new Connection(config);
  const request = new Request(query, (err) => {
    if (err) {
      reject(err);
    }
    resolve(result);
    connection.close()
  });
  request.setTimeout(params);
  request.on('row', rows => {
    const tmp = {};
    const keys = Object.keys(rows);
    keys.forEach((cols) => {
      tmp[cols] = rows[cols].value;
    })
    result.push(tmp);
  });
  connection.on('connect', err => {
    if (err) {
      reject(err);
    } else {
      connection.execSql(request);
    }
  });
  connection.on('debug', (msg) => {
    console.log(msg);
  });
  connection.connect();
});

Logs:

2022-10-24T19:50:00: State change: Initialized -> Connecting
2022-10-24T19:50:01: connected to [DYNAMICS]:1433
2022-10-24T19:50:01: State change: Connecting -> SentPrelogin
2022-10-24T19:50:01: State change: SentPrelogin -> SentTLSSSLNegotiation
2022-10-24T19:50:01: TLS negotiated ([REDACTED], TLSv1.2)
2022-10-24T19:50:01: State change: SentTLSSSLNegotiation -> SentLogin7Withfedauth
2022-10-24T19:50:02: State change: SentLogin7Withfedauth -> SentLogin7WithStandardLogin
2022-10-24T19:50:02: State change: SentLogin7WithStandardLogin -> LoggedInSendingInitialSql
2022-10-24T19:50:02: State change: LoggedInSendingInitialSql -> LoggedIn
2022-10-24T19:50:02: State change: LoggedIn -> SentClientRequest
2022-10-24T19:50:03: State change: SentClientRequest -> LoggedIn
2022-10-24T19:50:03: State change: LoggedIn -> Final
2022-10-24T19:50:03: State change: Initialized -> Connecting
2022-10-24T19:50:03: connection to [DYNAMICS]:1433 closed
2022-10-24T19:50:03: State is already Final
2022-10-24T19:50:03: connected to [AZURE]:1433
2022-10-24T19:50:03:   PreLogin - version:15.1.0.0, encryption:0x01(ON), instopt:0x00, threadId:0x00000000, mars:0x00(OFF)
2022-10-24T19:50:03: State change: Connecting -> SentPrelogin
2022-10-24T19:50:03: 
2022-10-24T19:50:03: Sent
2022-10-24T19:50:03:   type:0x12(PRELOGIN), status:0x01(EOM), length:0x0035, spid:0x0000, packetId:0x01, window:0x00
2022-10-24T19:50:03:   [REDACTED]
2022-10-24T19:50:03: 
2022-10-24T19:50:03: Received
2022-10-24T19:50:03:   type:0x04(TABULAR_RESULT), status:0x01(EOM), length:0x0031, spid:0x0000, packetId:0x01, window:0x00
2022-10-24T19:50:03:   [REDACTED]
2022-10-24T19:50:03:   PreLogin - version:12.0.312.0, encryption:0x01(ON), instopt:0x00, threadId:0x00000000, mars:0x00(OFF)
2022-10-24T19:50:03: State change: SentPrelogin -> SentTLSSSLNegotiation
2022-10-24T19:50:03: 
2022-10-24T19:50:03: Sent
2022-10-24T19:50:03:   type:0x12(PRELOGIN), status:0x01(EOM), length:0x0134, spid:0x0000, packetId:0x01, window:0x00
2022-10-24T19:50:03:   [REDACTED]
2022-10-24T19:50:03: 
2022-10-24T19:50:03: Received
2022-10-24T19:50:03:   type:0x12(PRELOGIN), status:0x00(), length:0x1000, spid:0x0000, packetId:0x00, window:0x00
2022-10-24T19:50:03:   [REDACTED]
2022-10-24T19:50:03: 
2022-10-24T19:50:03: Received
2022-10-24T19:50:03:   type:0x12(PRELOGIN), status:0x01(EOM), length:0x0126, spid:0x0000, packetId:0x00, window:0x00
2022-10-24T19:50:03:   [REDACTED]
2022-10-24T19:50:03: 
2022-10-24T19:50:03: Sent
2022-10-24T19:50:03:   type:0x12(PRELOGIN), status:0x01(EOM), length:0x00A6, spid:0x0000, packetId:0x01, window:0x00
2022-10-24T19:50:03:   [REDACTED]
2022-10-24T19:50:03: 
2022-10-24T19:50:03: Received
2022-10-24T19:50:03:   type:0x12(PRELOGIN), status:0x01(EOM), length:0x003B, spid:0x0000, packetId:0x00, window:0x00
2022-10-24T19:50:03:   [REDACTED]
2022-10-24T19:50:03: TLS negotiated ([REDACTED], TLSv1.2)
2022-10-24T19:50:03:   Login7 - TDS:0x74000004, PacketSize:0x00001000, ClientProgVer:0x00000000, ClientPID:0x0000E512, ConnectionID:0x00000000
2022-10-24T19:50:03:            Flags1:0xF0, Flags2:0x00, TypeFlags:0x00, Flags3:0x18, ClientTimezone:240, ClientLCID:0x00000409
2022-10-24T19:50:03:            Hostname:[REDACTED], Username:[REDACTED], Password:[REDACTED], AppName:'Tedious', ServerName:[REDACTED], LibraryName:'Tedious'
2022-10-24T19:50:03:            Language:'us_english', Database:[REDACTED], SSPI:'undefined', AttachDbFile:'undefined', ChangePassword:'undefined'
2022-10-24T19:50:03: State change: SentTLSSSLNegotiation -> SentLogin7WithStandardLogin
2022-10-24T19:50:03: 
2022-10-24T19:50:03: Sent
2022-10-24T19:50:03:   type:0x10(LOGIN7), status:0x01(EOM), length:0x016B, spid:0x0000, packetId:0x01, window:0x00
2022-10-24T19:50:03:   [REDACTED]
2022-10-24T19:50:03: 
2022-10-24T19:50:03: Received
2022-10-24T19:50:03:   type:0x04(TABULAR_RESULT), status:0x01(EOM), length:0x023F, spid:0x005C, packetId:0x01, window:0x00
2022-10-24T19:50:03:   [REDACTED]
2022-10-24T19:50:03: Packet size changed from 4096 to 4096
2022-10-24T19:50:03: State change: SentLogin7WithStandardLogin -> LoggedInSendingInitialSql
2022-10-24T19:50:03: 
2022-10-24T19:50:03: Sent
2022-10-24T19:50:03:   type:0x01(SQL_BATCH), status:0x01(EOM), length:0x02F4, spid:0x0000, packetId:0x01, window:0x00
2022-10-24T19:50:03:   [REDACTED]
2022-10-24T19:50:03: 
2022-10-24T19:50:03: Received
2022-10-24T19:50:03:   type:0x04(TABULAR_RESULT), status:0x01(EOM), length:0x0173, spid:0x005C, packetId:0x01, window:0x00
2022-10-24T19:50:03:   [REDACTED]
2022-10-24T19:50:03: State change: LoggedInSendingInitialSql -> LoggedIn
2022-10-24T19:50:03: State change: LoggedIn -> SentClientRequest
2022-10-24T19:50:03: 
2022-10-24T19:50:03: Sent
2022-10-24T19:50:03:   type:0x03(RPC_REQUEST), status:0x01(EOM), length:0x0B16, spid:0x0000, packetId:0x01, window:0x00
2022-10-24T19:50:03:   [REDACTED]
2022-10-24T19:50:03:   RPC Request - sp_executesql
2022-10-24T19:50:04: 
2022-10-24T19:50:04: Received
2022-10-24T19:50:04:   type:0x04(TABULAR_RESULT), status:0x01(EOM), length:0x03CD, spid:0x005C, packetId:0x01, window:0x00
2022-10-24T19:50:04:   [REDACTED]
2022-10-24T19:50:04: State change: SentClientRequest -> LoggedIn
2022-10-24T19:50:04: State change: LoggedIn -> Final
2022-10-24T19:50:04: State change: Initialized -> Connecting
2022-10-24T19:50:04: connection to [AZURE]:1433 closed
2022-10-24T19:50:04: State is already Final
2022-10-24T19:50:04: connected to [DYNAMICS]:1433
2022-10-24T19:50:04: State change: Connecting -> SentPrelogin
2022-10-24T19:50:04: State change: SentPrelogin -> SentTLSSSLNegotiation
2022-10-24T19:50:04: TLS negotiated ([REDACTED], TLSv1.2)
2022-10-24T19:50:04: State change: SentTLSSSLNegotiation -> SentLogin7Withfedauth
2022-10-24T19:50:14: Connection lost - read ECONNRESET
arthurschreiber commented 2 years ago

My gut feeling is that it times out around here: https://github.com/tediousjs/tedious/blob/c7e3e3619dff75dc7258dee6ea9a0c161ecae449/src/connection.ts#L3444

tobiasheldring commented 2 years ago

Can confirm we see the exact same issue and behaviour with a Azure SQL Database connection, Node 16.17.1, Tedious 15.1.1

tobiasheldring commented 2 years ago

Found that one can successfully catch and handle the error, and avoid the node process from crashing, by listening to the error event

connection.on("error", (error) => {
  console.error("Connection error", error);
  // Handle error here like resetting connection or removing it from connection pool
});
jtomaszewski commented 4 months ago

We also get this, with Managed Azure SQL database, node v18, newest tedious (18.2.0). Besides this error, we also get a few others happening. (Connection lost - write ECONNRESET, Failed to connect to xxx.database.windows.net:1433 in 15000ms, Failed to connect to xxx.database.windows.net:1433 - Could not connect (sequence), Failed to connect to xxx.database.windows.net:1433 - write ECONNRESET

Interesting fact is that if we increase the number of queries handled in "parallel" by our app (we have a bull process that process async jobs; and we can increase the number of worker processes that run in the background), then there's way more of errors like this. And if we reduce the number, then there's less.

Also, we noted that if we have three node.js processes with concurrency=1, then there's no errors; but if we have one process with concurrency=3, then there's errors. This would indicate that the fault is not at our database's side (and the inability of it to handle multiple requests in parallel), but at tedious/node.js ...

Gideon-Felt commented 1 month ago

@jtomaszewski We are observing the same thing. Have you found any resolution?

jtomaszewski commented 1 month ago

Nope :( For now we keep the concurrency within a node.js app super low, and instead, launch a lot separate of node.js instances (in k8s)... and consider switching to a database different to MS SQL.