mysqljs / mysql

A pure node.js JavaScript Client implementing the MySQL protocol.
MIT License
18.22k stars 2.53k forks source link

Connection to database hangs after Error: read ETIMEDOUT #2522

Closed bmaupin closed 2 years ago

bmaupin commented 2 years ago

I'm indirectly using this library via loopback-connector-mysql. By default, that library creates a pool with 10 connections, so any time there is a database issue, my expectation would be that once the database was reachable again, any bad connections would be terminated and the pool would be filled with good connections. This is what I've observed so far in my testing with a local database. If I stop the database, I get errors such as:

Error: Connection lost: The server closed the connection.
    at Protocol.end (node_modules/mysql/lib/protocol/Protocol.js:112:13)

Or if I then start up a dummy server that times out connections, e.g.

nc -l 3306

(https://stackoverflow.com/a/37465639/399105)

I get errors such as:

Error: Handshake inactivity timeout
    at Handshake.<anonymous> (node_modules/mysql/lib/protocol/Protocol.js:160:17)

In both cases (and others I tried), after bringing the database back up, the application was able to reconnect to it without any intervention.

However, in a production application we ran into some errors where the application was hung and would not reconnect to the database until the application was restarted:

Error: Cannot enqueue Query after fatal error.
    at Protocol._validateEnqueue (/usr/src/app/node_modules/mysql/lib/protocol/Protocol.js:212:16)
    at Protocol._enqueue (/usr/src/app/node_modules/mysql/lib/protocol/Protocol.js:138:13)
    at PoolConnection.query (/usr/src/app/node_modules/mysql/lib/Connection.js:198:25)
    at runQuery (/usr/src/app/node_modules/loopback-connector-mysql/lib/mysql.js:197:16)
    at executeWithConnection (/usr/src/app/node_modules/loopback-connector-mysql/lib/mysql.js:239:7)
    at MySQL.executeSQL (/usr/src/app/node_modules/loopback-connector-mysql/lib/mysql.js:248:5)
    at /usr/src/app/node_modules/loopback-connector-mysql/node_modules/loopback-connector/lib/sql.js:597:10
    at /usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:259:9
    at doNotify (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:155:49)
    at MySQL.ObserverMixin._notifyBaseObservers (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:178:5)
    at MySQL.ObserverMixin.notifyObserversOf (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:153:8)
    at MySQL.ObserverMixin.notifyObserversAround (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:228:15)
    at MySQL.SQLConnector.execute (/usr/src/app/node_modules/loopback-connector-mysql/node_modules/loopback-connector/lib/sql.js:596:8)
    at MySQL._modifyOrCreate (/usr/src/app/node_modules/loopback-connector-mysql/lib/mysql.js:278:8)
    at MySQL.save.MySQL.updateOrCreate (/usr/src/app/node_modules/loopback-connector-mysql/lib/mysql.js:318:8)
    at invokeConnectorMethod (/usr/src/app/node_modules/loopback-datasource-juggler/lib/dao.js:172:21)
    at /usr/src/app/node_modules/loopback-datasource-juggler/lib/dao.js:637:13
    at doNotify (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:155:49)
    at doNotify (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:155:49)
    at doNotify (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:155:49)
    at doNotify (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:155:49)
    at Function.ObserverMixin._notifyBaseObservers (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:178:5)
    at Function.ObserverMixin.notifyObserversOf (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:153:8)
    at Function.ObserverMixin._notifyBaseObservers (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:176:15)
    at Function.ObserverMixin.notifyObserversOf (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:153:8)
    at Function.ObserverMixin._notifyBaseObservers (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:176:15)
    at Function.ObserverMixin.notifyObserversOf (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:153:8)
    at Function.ObserverMixin._notifyBaseObservers (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:176:15) 
Error: read ETIMEDOUT
    at TCP.onStreamRead (internal/stream_base_commons.js:209:20)
    at TCP.callbackTrampoline (internal/async_hooks.js:126:14)
    --------------------
    at Protocol._enqueue (/usr/src/app/node_modules/mysql/lib/protocol/Protocol.js:144:48)
    at PoolConnection.query (/usr/src/app/node_modules/mysql/lib/Connection.js:198:25)
    at runQuery (/usr/src/app/node_modules/loopback-connector-mysql/lib/mysql.js:197:16)
    at executeWithConnection (/usr/src/app/node_modules/loopback-connector-mysql/lib/mysql.js:239:7)
    at MySQL.executeSQL (/usr/src/app/node_modules/loopback-connector-mysql/lib/mysql.js:248:5)
    at /usr/src/app/node_modules/loopback-connector-mysql/node_modules/loopback-connector/lib/sql.js:597:10
    at /usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:259:9
    at doNotify (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:155:49)
    at MySQL.ObserverMixin._notifyBaseObservers (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:178:5)
    at MySQL.ObserverMixin.notifyObserversOf (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:153:8)
    at MySQL.ObserverMixin.notifyObserversAround (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:228:15)
    at MySQL.SQLConnector.execute (/usr/src/app/node_modules/loopback-connector-mysql/node_modules/loopback-connector/lib/sql.js:596:8)
    at MySQL._modifyOrCreate (/usr/src/app/node_modules/loopback-connector-mysql/lib/mysql.js:278:8)
    at MySQL.save.MySQL.updateOrCreate (/usr/src/app/node_modules/loopback-connector-mysql/lib/mysql.js:318:8)
    at invokeConnectorMethod (/usr/src/app/node_modules/loopback-datasource-juggler/lib/dao.js:172:21)
    at /usr/src/app/node_modules/loopback-datasource-juggler/lib/dao.js:637:13
    at doNotify (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:155:49)
    at doNotify (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:155:49)
    at doNotify (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:155:49)
    at doNotify (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:155:49)
    at Function.ObserverMixin._notifyBaseObservers (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:178:5)
    at Function.ObserverMixin.notifyObserversOf (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:153:8)
    at Function.ObserverMixin._notifyBaseObservers (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:176:15)
    at Function.ObserverMixin.notifyObserversOf (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:153:8)
    at Function.ObserverMixin._notifyBaseObservers (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:176:15)
    at Function.ObserverMixin.notifyObserversOf (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:153:8)
    at Function.ObserverMixin._notifyBaseObservers (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:176:15)
    at Function.ObserverMixin.notifyObserversOf (/usr/src/app/node_modules/loopback-datasource-juggler/lib/observer.js:153:8) 

(the last error is repeated, I'm guessing for each connection in the pool).

So I have some questions:

  1. What's so special about Error: read ETIMEDOUT at TCP.onStreamRead that prevents the pool from reconnecting to the database?
  2. How can I reproduce this?
  3. What does this mean: Cannot enqueue Query after fatal error.? My first thought was that the fatal error was a fatal error on the connection itself, but that doesn't seem to be the case as I get a fatal error on the connection when I stop the database, but it comes back up just fine as previously described.

Thanks!

dougwilson commented 2 years ago

Hello, and thank you for opening a new issue for this, so I can fix the error handling under that first ticket.

What's so special about Error: read ETIMEDOUT at TCP.onStreamRead that prevents the pool from reconnecting to the database?

That error is not actually produced from this library, rather it comes from Node.js. It is the representation of a TCP timeout on the system (OS) network stack. When it occurs, the TCP connection is no longer functional. You would need to create a new connection to the database. If you are using the pool, you would do this using .getConnection or use .createConnection if not using the pool.

How can I reproduce this?

Unfortunately I am not sure. It is poasible the Node.js project may know how, but I unfortunately do not.

What does this mean: Cannot enqueue Query after fatal error.? My first thought was that the fatal error was a fatal error on the connection itself, but that doesn't seem to be the case as I get a fatal error on the connection when I stop the database, but it comes back up just fine as previously described.

This means that you have a reference to a specific connection object in your code. That connection then emitted an error that had the .fatal property set to true, but then your code called .query on that same object after the error.

Ideally if you get a fatal error on a connection, you should either call .release on it to destroy it and call .getConnection to get a new one if you are using the pool, or just call .createConnection to create a new connection if not. Always be careful how you structure your logic, as when the TCP connection is lost from the server, ant temp tables, variables, and uncommitted transactions are lost.

bmaupin commented 2 years ago

Thanks so much for the feedback. I did suspect at one point that loopback-connector-mysql was not properly handling fatal errors, until I realized that when I shut down the database and try to run a query it also returns a fatal error, but works again once I bring the database back up.

I wonder if there isn't something unique about this specific error that causes this behaviour. I guess I'll keep investigating.

I'll go ahead and close this since there's no evidence the bug is in this library and not in loopback-connector-mysql.

Thanks!

dougwilson commented 2 years ago

So if it helps at all, though I'm not familiar with the library, and it seems more of a helper library, I noticed based on the stack trace that the executeSQL can indeed have this behavior when it is called passed in a transaction object in the options argument. In that case, the library will just run a query on the given connection and delegates the handling of the lifecycle of that connection (including handing fatal errors) to whatever is calling the library.

Both calls above stack trace as coming from the loopback-connector library. Looking there, it is just passing the data and errors back to what calls it without any real logic. Then I see that the loopback-datasource-juggler library is calling that one. I glanced over at the code there and I'm really uncertain what it is doing and how the pieces are connected just looking at the source code, unfortunately.

bmaupin commented 2 years ago

I wonder if it's some sort of corner case in loopback-connector-mysql that's only triggered in very specific situations, maybe even related to the timing of the error. I would settle for being able to reproduce ETIMEDOUT but I haven't even been able to get that far. Things I've tried:

I've found a similar issue here, in which ECONNRESET seems to be another possible error that causes this: https://stackoverflow.com/q/65463097/399105

The thing that gets me is that almost every post I've seen about this mentions using a pool as the solution, which we're already doing. I guess I'll open a bug with loopback-connector-mysql and see what they say. I'm guessing it won't get very far though since I can't reproduce it.

Thanks for all of your help!

bmaupin commented 2 years ago

I just noticed something that can't be a coincidence; on both nodes the gap between the error and the previous log was almost exactly 8 hours

2021-08-30T00:02:46.984Z ... 200 OK 4820.96ms 
2021-08-30T08:02:49.196Z Error: Cannot enqueue Query after fatal error.
    at Protocol._validateEnqueue (/usr/src/app/node_modules/mysql/lib/protocol/Protocol.js:212:16)
2021-08-30T00:02:01.217Z ... 200 OK 33.48ms 
2021-08-30T08:02:48.287Z Error: Cannot enqueue Query after fatal error.
    at Protocol._validateEnqueue (/usr/src/app/node_modules/mysql/lib/protocol/Protocol.js:212:16)

That makes it seem like it's related to MySQL wait_timeout. In my testing I set it to a very short value, but maybe I set it to a value that was too short...

On the other hand, we have Kubernetes readiness probe that runs a SELECT 1 on the database every 30 seconds, so I don't see how the wait_timeout would kick in if we're keeping the connection alive every 30 seconds.