sidorares / node-mysql2

:zap: fast mysqljs/mysql compatible mysql driver for node.js
https://sidorares.github.io/node-mysql2/
MIT License
4.04k stars 610 forks source link

Sporadic read ETIMEDOUT error #2651

Closed invaderb closed 4 months ago

invaderb commented 4 months ago

Hey all,

Hoping to get some help with this sporadic error I'm seeing with mysql2

Same background info on what I'm using node 20 nestjs 10 typeorm 0.3.20 mysql2 3.9.4

I have 2 nodes running behind a load balancer that do connect over a virtual network tunnel to an on prem server that hosts the Mariadb on version 8.1

Here's my config

{
    name: 'default',
    type: process.env.DB_TYPE as 'mysql' | 'sqlite' || 'mysql',
    host: process.env.DB_HOST || 'localhost',
    port: parseInt(process.env.DB_PORT) || 3306,
    username: process.env.DB_USERNAME,
    password: process.env.DB_PASSWORD,
    database: process.env.DB_NAME,
    entities: [__dirname + '/**/*.entity{.ts,.js}'],
    synchronize: false,
    connectionTimeout: 30000,
    extra: {
        idleTimeout: 30000,
        maxIdle: 10,
        enableKeepAlive: true,
        keepAliveInitialDelay: 300,
    }
}

I'm getting these random read ETIMEDOUT errors through out the day but can't pin point any specific pattern or reason why it's timing out

QueryFailedError: read ETIMEDOUT
    at Query.onResult (/var/apps/lcs-services/src/driver/mysql/MysqlQueryRunner.ts:246:33)
    at PoolConnection._notifyError (/var/apps/lcs-services/node_modules/mysql2/lib/connection.js:228:21)
    at PoolConnection._handleFatalError (/var/apps/lcs-services/node_modules/mysql2/lib/connection.js:183:10)
    at PoolConnection._handleNetworkError (/var/apps/lcs-services/node_modules/mysql2/lib/connection.js:196:10)
    at Socket.emit (node:events:514:28)
    at runInContextCb (/var/apps/lcs-services/node_modules/newrelic/lib/shim/shim.js:1159:22)
    at AsyncLocalStorage.run (node:async_hooks:346:14)
    at AsyncLocalContextManager.runInContext (/var/apps/lcs-services/node_modules/newrelic/lib/context-manager/async-local-context-manager.js:65:36)
    at Shim.applySegment (/var/apps/lcs-services/node_modules/newrelic/lib/shim/shim.js:1149:25)
    at Socket.wrapper (/var/apps/lcs-services/node_modules/newrelic/lib/shim/shim.js:1754:17) {
  query: 'START TRANSACTION',
  parameters: undefined,
  driverError: Error: read ETIMEDOUT
      at TCP.onStreamRead (node:internal/stream_base_commons:217:20)
      at runInContextCb (/var/apps/lcs-services/node_modules/newrelic/lib/shim/shim.js:1159:22)
      at AsyncLocalStorage.run (node:async_hooks:335:14)
      at AsyncLocalContextManager.runInContext (/var/apps/lcs-services/node_modules/newrelic/lib/context-manager/async-local-context-manager.js:65:36)
      at Shim.applySegment (/var/apps/lcs-services/node_modules/newrelic/lib/shim/shim.js:1149:25)
      at TCP.wrapper (/var/apps/lcs-services/node_modules/newrelic/lib/shim/shim.js:1754:17)
      at TCP.callbackTrampoline (node:internal/async_hooks:130:17) {
    errno: -110,
    code: 'ETIMEDOUT',
    syscall: 'read',
    fatal: true
  },
  errno: -110,
  code: 'ETIMEDOUT',
  syscall: 'read',
  fatal: true
}

I've ran pings from the nodes to the on prem server and didn't see any dropped packets on the on prem network side there was a tiny bit of packet loss on the node side but should be within an ok percentage 1318 packets transmitted, 1317 received, 0.0758725% packet loss, time 1318903ms 1360 packets transmitted, 1354 received, 0.441176% packet loss, time 1361060ms node a had a bit more packet loss

within the time of running the pings there were 2 more ETIMEDOUT errors

I also tried looking at the mysql logs and it's completely empty

Any other troubleshooting steps I could try to help resolve this issue would be greatly appreciated!

benyaminl commented 4 months ago

Is it using Docker Swarm?

invaderb commented 4 months ago

no I'm just running the node processes with pm2 on a linux VM and the mariadb service is also just on a linux VM

benyaminl commented 4 months ago

no I'm just running the node processes with pm2 on a linux VM and the mariadb service is also just on a linux VM

But they connect Via VPN, correct?

Could you reduce the maxIdle to zero? I seen a lot of problem on open connection using mysql2 lib with opened connection especially if the config between Maria DB and mysql2 config does not match. Especially for idle timeout.

invaderb commented 4 months ago

ok let me try that and report back

invaderb commented 4 months ago

ok after a few days of that change being deployed the ETIMEDOUT errors have stopped, what is the config mismatch thats causing this?

benyaminl commented 4 months ago

ok after a few days of that change being deployed the ETIMEDOUT errors have stopped, what is the config mismatch thats causing this?

The idle time out on the mysql/mariadb, and the other problem is the network (your VPN).

I seen many problem with MySQL2 driver with kubernetes, docker swarm, and other that's virtual network. I have one myself last time and after rechecking because of IPVS on kubernetes.

MySQL2 keep alive keep getting feedback even mysql/mariadb server closed, causing when reusing the connection pool you got bad connection, and ends up with timeout connection, which is sucks.

The first way to tackle that is reducing the idle opened connection to zero, and 2nd thing is to reduce the max Idle time to below 5 minutes if you connection reliable never have interuption (IPVS only open connection for 15 minutes then forget about it), or don't allow any idle connection at all

Anyway if your problem is solved with that, then please do close the issue ticket, because it's not mysql2 driver fault at the first place.

invaderb commented 4 months ago

@benyaminl Thank you for the explanation and help on this really appreciate it.

ruanitto commented 4 months ago

e first way to tackle that is reducing the idle opened connection to zero, and 2nd thing is to reduce the max Idle time to below 5 minutes if you connection reliable never have interuption (IPVS only open connection for 15 minutes then forget about it), or don't allow any idle connectio

How this fix could it be implemented using knex + mysql2 lib?

benyaminl commented 4 months ago

@ruanitto I already state that you can lower the maxIdle to zero, on the mysql2 config.

You need to do it to make the connection pool doesn't have any idle connection

There are open ticket #2599 that still in motion.

Also, as I state before, your env/network driver could blame for it...

So, for now, the solution via config is to set no idle connection at all. After using the connection, disposed and closed. As Simple as that.

you can add extra config on the connection pool to maxIdle: 0, if it still not clear for you

Refer to https://sidorares.github.io/node-mysql2/docs#using-connection-pools