sidorares / node-mysql2

:zap: fast mysqljs/mysql compatible mysql driver for node.js
https://sidorares.github.io/node-mysql2/
MIT License

[Pool] This socket has been ended by the other party #447

Closed kevinmartin closed 6 years ago

kevinmartin commented 8 years ago

I get this error every once in a while. Not sure why. Can't replicate (yet). I'm using the latest v1.1.1.

{ Error: This socket has been ended by the other party
    at TLSSocket.writeAfterFIN [as write] (net.js:285:12)
    at PoolConnection.connection.write (node_modules/mysql2/lib/connection.js:272:20)
    at PoolConnection.Connection.writePacket (node_modules/mysql2/lib/connection.js:227:8)
    at Execute.start (node_modules/mysql2/lib/commands/execute.js:50:14)
    at Execute.Command.execute (node_modules/mysql2/lib/commands/command.js:38:20)
    at PoolConnection.Connection.handlePacket (node_modules/mysql2/lib/connection.js:360:28)
    at PoolConnection.Connection.addCommand (node_modules/mysql2/lib/connection.js:378:10)
    at PoolConnection.execute (node_modules/mysql2/lib/connection.js:563:8)
    at node_modules/mysql2/lib/pool.js:159:17
    at node_modules/mysql2/lib/pool.js:36:14
    at _combinedTickCallback (internal/process/next_tick.js:67:7)
    at process._tickDomainCallback (internal/process/next_tick.js:122:9) code: 'EPIPE', fatal: true }
kevinmartin commented 8 years ago

The database seems to be restarting after I execute a bulk insert (with on duplicate key) via a prepared statement.

Issue: https://bugs.mysql.com/bug.php?id=79243#c444928

Shouldn't the connection/pool be able to detect the socket has closed and try to reconnect?

sidorares commented 8 years ago

Shouldn't the connection/pool be able to detect the socket has closed and try to reconnect?

it should, but the driver does not handle retry logic; you have to do that in your code. Because the driver does not know what the side effects of the query were (even if it errored), you need to be able to analyse the error and decide what to do (propagate it back to the user, retry, do some cleanup, etc.).

When the pool detects that a connection is dead, it marks it as such and does not return it the next time you call getConnection().
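
A minimal user-land sketch of that analysis, assuming the callback API and a pool created with mysql.createPool() (the error-code list and retry policy are illustrative, not part of the driver):

    // Illustrative retry wrapper, not mysql2 API: retry only fatal
    // connection-level errors, and only when the query is safe to repeat.
    const RETRYABLE = ['EPIPE', 'ECONNRESET', 'PROTOCOL_CONNECTION_LOST'];

    function queryWithRetry(pool, sql, params, cb, retries = 2) {
      pool.execute(sql, params, (err, rows, fields) => {
        if (err && err.fatal && RETRYABLE.includes(err.code) && retries > 0) {
          // The pool hands out a different (or fresh) connection next time.
          // Whether a retry is safe still depends on the query's side effects.
          return queryWithRetry(pool, sql, params, cb, retries - 1);
        }
        cb(err, rows, fields);
      });
    }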

elephantjim commented 7 years ago

I have this same problem with connection pools, too.

@sidorares I see what you're saying: if you get a FIN packet after the query has already started, then yes, the caller definitely has to handle the error, because there's no way to know what did or didn't happen.

On the other hand, in my case the FIN comes looong before the query even starts. I start my own server, and open a pool of connections. The first time a request comes to my server, I query MySQL. That opens a connection, and after the query is done, the connection stays open for reuse. The connection is then held open a long time, and eventually the server closes it. In that case, the pool should detect the closed connection, and remove it from the pool. That doesn't seem to happen.

In _notifyError() in https://github.com/sidorares/node-mysql2/blob/master/lib/connection.js#L198, the function returns immediately if connection._fatalError is set. But that means that the pool doesn't get notified, and the connection isn't removed. That is, the logic about bubbling needs to change. stream.on('end') will catch the FIN on https://github.com/sidorares/node-mysql2/blob/master/lib/connection.js#L105 and call _notifyError(), and _notifyError() should let the pool know. If there's a pending query on that particular connection in the pool, the callback should be called with the error (as you say). But regardless of whether there is a pending query or not, the connection should be removed from the pool on PROTOCOL_CONNECTION_LOST.
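
In pseudocode, the change I'm proposing looks roughly like this (a sketch only; pendingCommand is illustrative, not an actual mysql2 internal):

    // Sketch of the proposed behaviour, not the actual mysql2 source.
    stream.on('end', () => {
      const err = new Error('Connection lost: The server closed the connection.');
      err.code = 'PROTOCOL_CONNECTION_LOST';
      err.fatal = true;
      if (pendingCommand) {
        pendingCommand.onResult(err); // surface the error to the caller
      }
      // Bubble up even if _fatalError was already set, so PoolConnection's
      // 'error'/'end' listeners evict this connection from the pool.
      connection.emit('error', err);
      connection.emit('end');
    });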

FWIW, the traceback I see is:

Error: This socket has been ended by the other party
    at TLSSocket.writeAfterFIN [as write] (net.js:355:12)
    at PoolConnection.connection.write (/dashboard-api/node_modules/mysql2/lib/connection.js:374:20)
    at PoolConnection.Connection.writePacket (/dashboard-api/node_modules/mysql2/lib/connection.js:291:10)
    at Query.start (/dashboard-api/node_modules/mysql2/lib/commands/query.js:43:14)
    at Query.Command.execute (/dashboard-api/node_modules/mysql2/lib/commands/command.js:40:20)
    at PoolConnection.Connection.handlePacket (/dashboard-api/node_modules/mysql2/lib/connection.js:500:28)
    at PoolConnection.Connection.addCommand (/dashboard-api/node_modules/mysql2/lib/connection.js:521:10)
    at PoolConnection.query (/dashboard-api/node_modules/mysql2/lib/connection.js:606:15)
    at /dashboard-api/node_modules/mysql2/lib/pool.js:158:10
    at /dashboard-api/node_modules/mysql2/lib/pool.js:36:14
elephantjim commented 7 years ago

One other bit: the PoolConnection type listens for the "end" and "error" events here: https://github.com/sidorares/node-mysql2/blob/master/lib/pool_connection.js#L18 But those are events from the connection object. The "end" event from the socket is being suppressed by _notifyError and never reaches the PoolConnection listener.

sidorares commented 7 years ago

Thanks @elephantjim, I'll try to double-check the logic.

Actually, even 'error' at the socket level is not reliable: I tried putting logs in _notifyError and disconnecting the network cable, and it's not fired immediately, only after a query + some timeout.
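
For the pulled-cable case, TCP keepalive can surface a half-open socket sooner; mysql2 exposes it through connection options (assuming a version that supports enableKeepAlive / keepAliveInitialDelay):

    const mysql = require('mysql2');

    // Keepalive probes let the OS notice a dead peer even when no query is
    // in flight; without them nothing fires until a write finally times out.
    const pool = mysql.createPool({
      host: 'localhost',
      user: 'root',
      database: 'test',
      enableKeepAlive: true,
      keepAliveInitialDelay: 10000 // ms of idle before the first probe
    });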

joe-angell commented 7 years ago

I'm getting this too after upgrading to sequelize 4:

      stack:  SequelizeDatabaseError: This socket has been ended by the other party
    at Query.formatError (/home/chico/chico-container/node_modules/sequelize/lib/dialects/mysql/query.js:222:16)
    at Query.connection.query (/home/chico/chico-container/node_modules/sequelize/lib/dialects/mysql/query.js:55:23)
    at Query.nrWrappedHandler (/home/chico/chico-container/node_modules/newrelic/lib/transaction/tracer/index.js:400:19)
    at Query.wrapped (/home/chico/chico-container/node_modules/newrelic/lib/transaction/tracer/index.js:184:28)
    at Query.wrappedCallback (/home/chico/chico-container/node_modules/newrelic/lib/transaction/tracer/index.js:452:66)
    at Query.wrapped [as onResult] (/home/chico/chico-container/node_modules/newrelic/lib/transaction/tracer/index.js:184:28)
    at Connection._notifyError (/home/chico/chico-container/node_modules/mysql2/lib/connection.js:212:25)
    at Connection._handleFatalError (/home/chico/chico-container/node_modules/mysql2/lib/connection.js:172:14)
    at Connection._handleNetworkError (/home/chico/chico-container/node_modules/mysql2/lib/connection.js:177:8)
    at TLSSocket.<anonymous> (/home/chico/chico-container/node_modules/mysql2/lib/connection.js:360:20)
    at emitOne (events.js:96:13)
    at TLSSocket.emit (events.js:188:7)
    at TLSSocket._tlsError (_tls_wrap.js:566:8)
    at emitOne (events.js:96:13)
    at TLSSocket.emit (events.js:188:7)
    at TLSSocket.writeAfterFIN [as write] (net.js:294:8)
    at Connection.connection.write (/home/chico/chico-container/node_modules/mysql2/lib/connection.js:374:20)
    at Connection.writePacket (/home/chico/chico-container/node_modules/mysql2/lib/connection.js:291:10)
    at Query.start (/home/chico/chico-container/node_modules/mysql2/lib/commands/query.js:43:14)
    at Query.Command.execute (/home/chico/chico-container/node_modules/mysql2/lib/commands/command.js:40:20)
    at Connection.handlePacket (/home/chico/chico-container/node_modules/mysql2/lib/connection.js:500:28)
    at Connection.addCommand (/home/chico/chico-container/node_modules/mysql2/lib/connection.js:521:10)
    at Connection.query (/home/chico/chico-container/node_modules/mysql2/lib/connection.js:606:15)
    at Connection.wrapped (/home/chico/chico-container/node_modules/newrelic/lib/transaction/tracer/index.js:184:28)
    at Connection.wrappedFunction [as query] (/home/chico/chico-container/node_modules/newrelic/lib/transaction/tracer/index.js:391:55)
    at Utils.Promise (/home/chico/chico-container/node_modules/sequelize/lib/dialects/mysql/query.js:45:23)
    at Promise._execute (/home/chico/chico-container/node_modules/sequelize/node_modules/bluebird/js/release/debuggability.js:300:9)
    at Promise._resolveFromExecutor (/home/chico/chico-container/node_modules/sequelize/node_modules/bluebird/js/release/promise.js:483:18)
    at new Promise (/home/chico/chico-container/node_modules/sequelize/node_modules/bluebird/js/release/promise.js:79:10)
    at Query.run (/home/chico/chico-container/node_modules/sequelize/lib/dialects/mysql/query.js:44:12)
    at retry (/home/chico/chico-container/node_modules/sequelize/lib/sequelize.js:529:32)
    at /home/chico/chico-container/node_modules/retry-as-promised/index.js:40:21
    at Promise.wrappedExecutorCaller [as _execute] (/home/chico/chico-container/node_modules/newrelic/lib/instrumentation/promise.js:222:18)
    at Promise._resolveFromExecutor (/home/chico/chico-container/node_modules/retry-as-promised/node_modules/bluebird/js/release/promise.js:483:18)
    at new Promise (/home/chico/chico-container/node_modules/retry-as-promised/node_modules/bluebird/js/release/promise.js:79:10)
    at retryAsPromised (/home/chico/chico-container/node_modules/retry-as-promised/index.js:30:10)
    at Promise.try.then.connection (/home/chico/chico-container/node_modules/sequelize/lib/sequelize.js:529:14)
    at bound (domain.js:280:14)
    at runBound (domain.js:293:12)
    at tryCatcher (/home/chico/chico-container/node_modules/sequelize/node_modules/bluebird/js/release/util.js:16:23) 
chrishiestand commented 7 years ago

I'm also seeing this problem sporadically with version 1.2.0. I'm not sure if it's related, but I started noticing this after I added a duplicate pool for newer code using promises (the old one being for callbacks). It makes me wonder if there is possibly a bug mixing state between the two pools?

import mysql from 'mysql2'
import mysqlP from 'mysql2/promise'

//...

app.locals.mysql  = mysql.createPool(mysqlOptions)
app.locals.mysqlP = mysqlP.createPool(mysqlOptions)

Total shot in the dark. Otherwise the above/previous reasoning looks good.

edit: After reading https://github.com/sidorares/node-mysql2/issues/604 it looks like the promise wrapper has broken behavior compared with the callback version. This matches my experience.

sidorares commented 7 years ago

@chrishiestand I don't think it's related to mixing state; more likely some error-handling issue. It would be really great to have something that helps reliably reproduce this.

joe-angell commented 7 years ago

It reliably happens to me with an open SSL connection that is idle for a while.

sidorares commented 7 years ago

@joe-angell is it with RDS or just your own server setup?

joe-angell commented 7 years ago

Yeah, it's to an NDB Cluster MySQL 5.6, behind HAProxy; a somewhat complicated setup.

sidorares commented 7 years ago

@joe-angell can you check that you get this error without newrelic instrumentation?

Also, is this something that crashes the process, or does it just call the callback with an error parameter? The latter is kind of expected behaviour for when the server decides to close an idle connection.

joe-angell commented 7 years ago

It calls the error callback (the next sequelize query results in an error). This did not happen with sequelize 3, so I'm not sure if the issue is with mysql2 or sequelize. I've worked around it by using a pool validate function that does a select 1.

sidorares commented 7 years ago

It calls the error callback (the next sequelize query results in an error). This did not happen with sequelize 3, so I'm not sure if the issue is with mysql2 or sequelize.

@sushantdhiman can you have a look?

sushantdhiman commented 7 years ago

@joe-angell Sometimes connections drop abruptly, due to either a network issue or the server closing connections. Regardless of the cause, if you try to run a query on such a connection, you will likely see an error like this.

Now, with sequelize you have an internal pool backed by the generic-pool library. Recently there were some issues reporting that stale connections aren't automatically dropped; here is an attempt at resolving this: https://github.com/sequelize/sequelize/pull/7924

I suspect sequelize tried to run your query on a stale connection. Ideally it should check whether the connection is stale before running the query, but it seems it didn't, or the validation logic is incorrect.

@sidorares We try to validate the mysql2 connection like this: https://github.com/sequelize/sequelize/blob/85ee83d9f78be8c425adf20d119897a72c79a9ce/lib/dialects/mysql/connection-manager.js#L171 Do you see any flag that should be included in this check?

I will try to investigate more. @joe-angell, could you try the https://github.com/sequelize/sequelize/pull/7924 fix locally to see if it resolves your issue?
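
For reference, a check along the lines of the one linked above could look like this (a sketch that reads mysql2's internal _fatalError, _protocolError, and _closing flags, which are not public API and may change):

    // Connection-liveness check over mysql2 internals (illustrative).
    function validate(connection) {
      return connection
        && connection._fatalError === null     // no unrecovered fatal error
        && connection._protocolError === null  // no protocol-level failure
        && !connection._closing                // QUIT not yet sent
        && !connection.stream.destroyed;       // underlying socket still open
    }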

sidorares commented 7 years ago

validate() looks ok to me

paragbhole commented 7 years ago

We are experiencing a similar issue with the postgres connector.

joe-angell commented 7 years ago

Turns out the validate() is not sufficient; after a few days of uptime we start getting timeout exceptions:

          stack:  TimeoutError: ResourceRequest timed out
    at ResourceRequest._fireTimeout (/home/chico/chico-container/node_modules/generic-pool/lib/ResourceRequest.js:58:17)
    at Timeout.bound (/home/chico/chico-container/node_modules/generic-pool/lib/ResourceRequest.js:8:15)
    at Timeout.wrappedCallback (/home/chico/chico-container/node_modules/newrelic/lib/transaction/tracer/index.js:361:23)
    at Timeout.wrapped (/home/chico/chico-container/node_modules/newrelic/lib/transaction/tracer/index.js:184:28)
    at Timeout.wrappedCallback (/home/chico/chico-container/node_modules/newrelic/lib/transaction/tracer/index.js:452:66)
    at Timeout.wrapped (/home/chico/chico-container/node_modules/newrelic/lib/transaction/tracer/index.js:184:28)
    at ontimeout (timers.js:386:14)
    at tryOnTimeout (timers.js:250:5)
    at Timer.listOnTimeout (timers.js:214:5) 

I haven't had time to test it yet, @sushantdhiman.

euqen commented 7 years ago

@sushantdhiman I've tested it locally. Once I started using evict in sequelize, this issue was never reproduced again. Can you please merge sequelize/sequelize#7924 ASAP? Otherwise we will need to downgrade our sequelize version until a new version is released.

cc @joe-angell
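
For anyone landing here later: the eviction behaviour from that PR is driven by Sequelize's pool options. A sketch (option names per Sequelize v4; the values are examples, not recommendations):

    const Sequelize = require('sequelize');

    const sequelize = new Sequelize('db', 'user', 'pass', {
      dialect: 'mysql', // backed by mysql2 in Sequelize v4
      pool: {
        max: 10,
        min: 0,
        idle: 10000, // a connection may be released after 10s of inactivity
        evict: 10000 // run the stale-connection eviction check every 10s
      }
    });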

leebenson commented 6 years ago

I'm running into this error with a Google Cloud SQL instance (2nd gen). Knex and typeorm (both with/without a pool config) yield the same error after 15 mins:

This socket has been ended by the other party

Re-invoking the function that performs the SQL request a couple of times fixes it, so per above it looks like a dead connection is removed on the 3rd try (edit: 3rd try for knex, 2nd try for typeorm...)

What's the current advice to work around this? Implement custom retry logic? Are there any known ORMs that already do this? Sequelize, perhaps?

joe-angell commented 6 years ago

The custom validate function that does a select 1 seems to be pretty reliable at detecting the bad connections. We've had this in production for a while.

leebenson commented 6 years ago

@joe-angell, what about your Aug 3 comment?

Turns out the validate() is not sufficient; after a few days of uptime we start getting timeout exceptions:

How did you work around that?

joe-angell commented 6 years ago

I think that was caused by a bug in our validate function: once in a while the connection object is null, and we were throwing an uncaught exception in the validate function... just remember to catch all errors and return false :).

leebenson commented 6 years ago

Thanks @joe-angell. Would you mind sharing the code you're using for validate please?

Also interested to know whether it's hitting the DB with a select 1 before every SQL statement. Surely not, right?

joe-angell commented 6 years ago

I'm afraid so; validate runs before every query, afaik. Our application cannot tolerate any database connection issues... but yeah, here you go (it uses bluebird promises and lodash, and is a bit ugly).

    // Assumes bluebird's Promise and lodash's _ at module scope:
    //   const Promise = require('bluebird');
    //   const _ = require('lodash');
    //
    // generic-pool validate hook: runs a cheap `select 1` with a short
    // timeout before a connection is handed out of the pool.
    validate: (connection) => {
      try {
        const queryopts = {
          sql: 'select 1',
          timeout: 500 // ms; a dead connection fails fast instead of hanging
        };

        // multiArgs + spread passes (rows, fields); for `select 1`, rows is
        // [{ '1': 1 }], so _.get(results, '[0][1]') reads results[0]['1'].
        return Promise.fromCallback(connection.query.bind(connection, queryopts), { multiArgs: true })
          .spread((results) => {
            return _.get(results, '[0][1]') === 1;
          }).catch(() => {
            return false; // any query error marks the connection invalid
          });
      } catch (e) {
        // connection can be null/broken; report invalid rather than throw
        return Promise.resolve(false);
      }
    }
leebenson commented 6 years ago

Thanks for sharing @joe-angell, appreciated.

I feel like I must be missing something obvious, though. Surely mysql2 isn't so fragile that we need to preface every SQL query with a sentinel query, just to determine whether the connection is still alive?

That's going to eat significant network / IO to make and await a separate network call before sending down the 'real' query, not to mention double the query volume on the DB host. I'd argue that's also the exact opposite of what the majority of developers would expect a pool to do for us, i.e. eliminate dead connections, re-use good ones, and spark up new ones when the previous connection fails. This feels like an out-of-the-box expectation rather than a userland exercise.

I get @sidorares's previous comment that retry logic could be handled in numerous ways, so might be deemed a userland activity. But an error like This socket has been ended by the other party, which throws only if a timeout occurs at the other end and/or a connection has dropped, is far lower-level than I think most ORM users expect to design for. In 99% of cases, the desired behaviour is going to be to re-connect and try again. Shouldn't mysql2 be doing that for us?

Unless I'm missing something, this is a fundamental flaw that affects pretty much every major ORM available for Node: Knex (and therefore Bookshelf), TypeORM, Sequelize and probably others, all use mysql2. I deployed a simple Node app to Google Container Engine using Google Cloud SQL as the DB; this error threw for every 10-15 min period of idle traffic that was subsequently followed by a later SQL attempt (Cloud SQL's default idle timeout.)

Sequelize seems to have the greatest array of options for managing idle timeouts and various pool settings, which will probably handle the default timeout eviction. But the next time there's a network drop between my instance and the Cloud SQL host, or some other non-explicit disconnect, it's likely to surface again... and the default implementation is just to throw an ugly underlying socket error. It feels unfinished, IMO.

I guess a way to fix this would be to wrap each call in a retry function that can catch errors, detect if it's just a connection issue, and retry before throwing the error back... but that's a pretty verbose solution for something I expected was the entire reason for pooling.

Am I missing something obvious, or is MySQL connectivity in every major ORM for Node really that fragile? How are others solving this?

joe-angell commented 6 years ago

We did not have this issue on sequelize 3, just fyi.

leebenson commented 6 years ago

sequelize v3 uses "mysql": "~2.10.1"; v4 uses "mysql2": "^1.5.1".

Is this something mysql2 introduced?

coreyjv commented 6 years ago

@leebenson --

I share your sentiments, as we're seeing similar behavior as well. Other applications I've developed used connection pools (typically Java-based) with a 'test on borrow' capability, like HikariCP provides for older JDBC drivers (JDBC4+ drivers have a Connection.isValid() method): a query like the one @joe-angell is running above is executed prior to handing out a connection from the pool.

@sidorares -- Could a 'test on borrow'-like capability be implemented?
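
A user-land approximation is possible today with connection.ping(), which mysql2 does expose. A sketch (the wrapper itself is hypothetical):

    // "Test on borrow": ping the connection before handing it out, and
    // destroy it (so the pool replaces it) if the ping fails.
    function getValidatedConnection(pool, cb) {
      pool.getConnection((err, connection) => {
        if (err) return cb(err);
        connection.ping((pingErr) => {
          if (pingErr) {
            connection.destroy();                    // drop the dead connection
            return getValidatedConnection(pool, cb); // borrow again
          }
          cb(null, connection);
        });
      });
    }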

kdschaffe commented 6 years ago

I'm also running into this issue using 1.5.3 and 'mysql2/promise'. The error shows up consistently every morning after the connections in the pool have been sitting idle for a while. After it errors, if I repeatedly attempt to get a new connection, it seems to error on each stale connection, then removes that connection from the pool. Once all stale connections are gone, fresh connections start to work again. This is just an assumption. The only remedy is to destroy connections, but this is obviously inefficient.

I'm releasing connections with connection.connection.release().

pool.releaseConnection(connection) and connection.release() don't seem to work. This seems off?

I'm not sure if it's related but I'm also seeing 'warning: possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit.' warnings.

I'm adding error handlers with connection.on('error', connectionErrorHandler) and removing them with connection.removeListener('error', connectionErrorHandler).
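
Pairing the add and the remove around each borrow keeps the listener count flat; a sketch (withConnection and connectionErrorHandler are my own names):

    // Each getConnection() adds exactly one 'error' listener, and it is
    // removed before the connection is released, so listeners can't pile up.
    function withConnection(pool, work, done) {
      pool.getConnection((err, connection) => {
        if (err) return done(err);
        const connectionErrorHandler = () => { /* note the connection as bad */ };
        connection.on('error', connectionErrorHandler);
        work(connection, (workErr, result) => {
          connection.removeListener('error', connectionErrorHandler);
          connection.release();
          done(workErr, result);
        });
      });
    }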

fintecturegit commented 6 years ago

Same issue as @kdschaffe, but on "mysql2": "^1.6.1".

bnbon commented 6 years ago

Hi,

I get this issue when my react-native app resumes from the background and connects to a Hapi server. How should I handle this in the meantime? I use a connection pool...

tushard-iprogrammercom commented 2 years ago

Does this fix also address issues in connection pools? I am on version 1.7.0 and getting this issue while using connection pools. If you install 1.7.0 and look at line 13 in lib/pool_connection.js, it still contains this code:

    this.once('end', () => {
      this._removeFromPool();
    });
    this.once('error', () => {
      this._removeFromPool();
    });

Please confirm.

sidorares commented 2 years ago

@tushard-iprogrammer could you check whether this change is part of the latest published version? Are you able to upgrade?

abentpole commented 2 years ago

@tushard-iprogrammer @sidorares the code is the same as of 2.3.3, though it's not quite clear to me what that would change.

PoolConnection is only listening for 'end' and 'error' events to boot itself from the pool.

But since it extends and runs Connection's constructor, the underlying Connection will theoretically have its 'close' fired. This should fire _notifyError, which should hopefully emit an 'error' event, which should trigger PoolConnection's handler and remove the connection from the pool.

I'm unable to reliably trigger an EPIPE to test this locally.
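
One way to provoke the half-closed-socket state locally (my own suggestion, not something from this thread) is a tiny TCP proxy that sends a FIN to the client while the pooled connection is idle:

    const net = require('net');

    // Proxy 127.0.0.1:3307 -> the real MySQL server on 3306, then half-close
    // the client side after 5s to simulate a server-side idle timeout.
    const proxy = net.createServer((client) => {
      const upstream = net.connect(3306, '127.0.0.1');
      client.pipe(upstream);
      upstream.pipe(client);
      setTimeout(() => client.end(), 5000); // FIN to the client
    });

    proxy.listen(3307, () => {
      console.log('point the mysql2 pool at 127.0.0.1:3307 and let it idle');
    });

The next write on a pooled connection that idled through the timeout should then raise "This socket has been ended by the other party".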