mysqljs / mysql

A pure node.js JavaScript Client implementing the MySQL protocol.
MIT License
18.31k stars 2.53k forks source link

PoolCluster stops queuing and fails with ETIMEDOUT #904

Open unlucio opened 10 years ago

unlucio commented 10 years ago

Hello everybody, I'm still investigating on this headache but I'd greatly appreciate whatever suggestion.

I'm using a PoolCluster with a 10 connection limit per pool, selecting a pool and running N queries on it. I'm inserting a sample of 300k rows, at 1k rows per query (using the multiple queries feature). I already checked the generated SQL and it is fine. Everything works fine as long as I try to send 30 (exactly 30) subsequent queries (30k rows): 10 connections get pulled out of the pool, the other get queue and performed as soon as a connection is available. And up to this point everything is just great :)

Running a test with 31 connection (30500 rows): everything goes wreck :/ the whole cluster tumbles down as the 1st getConnection() receives a ETIMEDOUT error and all the subsequent getConnection() call receive a Error: Pool is closed error. Here you can find a sample error log with stack traces. I added a console.log() (and marked it as mysql lib stack: Error:) in the library as suggested by this answer (the case did look kind of similar to mine)

Increasing config variables values such as connectionLimit, acquireTimeout, removeNodeErrorCount and explicitally setting waitForConnections, canRetry to true didn't help either.

Checking the server's configuration, the numbers seem fine:

mysql> SHOW VARIABLES LIKE "%connect%";
+-----------------------------------------------+--------------------------------------------+
| Variable_name                                 | Value                                      |
+-----------------------------------------------+--------------------------------------------+
| character_set_connection                      | utf8                                       |
| collation_connection                          | utf8_general_ci                            |
| connect_timeout                               | 10                                         |
| disconnect_on_expired_password                | ON                                         |
| init_connect                                  | SET collation_connection = utf8_general_ci |
| max_connect_errors                            | 100                                        |
| max_connections                               | 151                                        |
| max_user_connections                          | 0                                          |
| performance_schema_session_connect_attrs_size | 512                                        |
+-----------------------------------------------+--------------------------------------------+
9 rows in set (0.00 sec)

Shouldn't the pool/poolCluster feature manage these kind of situations by its self? What would be the best way to manage this kind of error and reconnect the lost pool?

As said I'm still investigating this situation and I'll try to add more information as soon as they come out. I'm sorry I'm not able to attach a fiddle test case at this time.

dougwilson commented 10 years ago

Interesting. So looking at the logs real quick, I'm wondering if you could do the following things and post back a new log for me?

require('blocked')(function(ms){console.log('blocked for %sms', ms)})
unlucio commented 10 years ago

I'm running node 0.10.30 and here are the logs :) thanks for help!!

dougwilson commented 10 years ago

cool. can you tell me around what function does debug: mysql storage: getting connection to pool: zeus print from? it looks like there is something that is halting your event loop for 32 seconds, so it seems like connectTimeout would have to be at least 40000 just for your code to function and not get connect timeouts. would it be possible to boil your code down to something just big enough to reproduce (and ideally without bluebird)?

unlucio commented 10 years ago

That log line happens just before poolCluster.getConnection(); The event loop is busy for 32 (or more) before arriving at that point, since it's transforming all the data.

Mumble, I saw that [2014-09-07T16:35:39.892Z] blocked for 32089ms but I wonder if it's just a mess up between wiston and console.log() The actual flow is:

And I'm trying to boil it down to something I can fiddle you here, but I'm still trying to wrap my mind on the parts that might really cause the problem :P

dougwilson commented 10 years ago

OK. Yea, if the log line is right above poolCluster.getConnection(), then the event loop blocking is just logging later (since it's triggered by a timer, this is normal; i can see the gap in the timestamps, which is why i was wondering were that log line was located). since the getting connection log lines are within a couple seconds of the timeout, then they should be together (assuming those timeouts actually correspond to the getting connection lines--you would have to use a global id counter and print that to make the connection between the two--thanks async :) ).

anyway, if those logs are indeed corresponding to the timeouts, then it's weird here: the first getting connection line was at 2014-09-07T16:35:36.566Z and the first timeout was at 2014-09-07T16:35:39.905Z, which is only 3.5 seconds later. the default timeout for this library is 10 seconds (10000). are you adjusting any timeout values in your code? if you are, it's important to note all timeouts are in milliseconds, not seconds, so they should all be in the thousands.

unlucio commented 10 years ago

this is where those logs happen:

return new Promise(function(resolve, reject) {
    logger.debug('mysql storage: getting connection to pool: ' + targetPool);
    poolCluster.getConnection(targetPool, function(err, connection) {
      logger.debug('mysql storage: got connection to pool: ' + targetPool);

      if (err) {
        logger.error('mysql storage: Cannot get required connection on mysql server for target pool: ' + targetPool + ' \n Error: ' + JSON.stringify(err));
        console.log('err: ', err);
        return;
      }

and YES! That's another thing I'm noticing since this morning. It totally seems to timeout before the 10 secs. As said I already tried to increase the timeout params but nothing changed. Literally, the calls still fail at the same speed. (yup, I'm using milliseconds correctly ;P)

I just pushed a commit just for the sake of trying increasing the timeouts up to 1minute (connectTimeout: 60000, acquireTimeout: 60000). I'll give it a spin asap the docker container is ready.

unlucio commented 10 years ago

and even encasing the timeouts to 60seconds the logs are still the same. Mumble... Rx and bluebird might be adding something on top, in terms of blocking the eventloop here and there, but I still wonder what's preventing the lib from respecting the given timeouts. And most of all: why once a getConnection() fails all the pool goes wreck as well, and nothing retries :(

dougwilson commented 10 years ago

why once a getConnection() fails all the pool goes wreck as well, and nothing retries :(

yes, that is probably a bug, certainly. i'm mainly trying to figure out why you're getting the timeouts. can you add this to your code and see what the output logs? also, can you list what version of this module you are using (i've been assuming you are using 2.4.3, but i can't be sure since it hasn't been stated yet)?

var Socket = require('net').Socket;
var _set = Socket.prototype.setTimeout;
Socket.prototype.setTimeout = function(msecs, callback) {
  console.trace('setting timeout of ' + msecs);
  _set.call(this, msecs, callback);
};
unlucio commented 10 years ago

I'm actually running version 2.4.2

dougwilson commented 10 years ago

I'm actually running version 2.4.2

ok, that's fine. the only difference doesn't apply here at all :) let me know if you can get that new sniplet's output, though :)

unlucio commented 10 years ago

cool, I'll update it anyway. My times zone is getting at the end right now, I'll add it tomorrow 1st thing in the morning. Also, if you suspect a bug, I'll clone the repo and take a deeper look into the library, perhaps I might come up with a patch if the problem is not me or how I use it :)

big thanks for the help, greatly appreciated.

dougwilson commented 10 years ago

no problem. i was wondering when you were going to head off :) the weird poolcluster behavior is very likely a bug, because you wouldn't think it would spew a bunch of pool ended errors, so feel free if i don't get to it first :) the timeout thing, though, certainly weird.

unlucio commented 10 years ago

So, good morning I guess from the other side of the planet :) I updated the lib to 2.5.0 (and cloned it), added the trace code and here are the logs

what def stands out to me is this

[...]
2014-09-08T05:28:10.910Z - debug: mysql storage: getting connection to pool: zeus
[2014-09-08T05:28:10.910Z] blocked for 11083ms
[2014-09-08T05:28:10.911Z] Trace: setting timeout of 0

setting timeout to 0ms asap it finishes subscribing all the getconnections()

dougwilson commented 10 years ago

yea, but i think we may have ruled out the event loop blocking, because looking at this new log, it looks just like the others in that the blocking was actually from before the get connections were even called. it also looks like all the timeouts are getting set correctly within node, which is what that snipplet showed me.

would you be able to do like a global counter variable and then reference that in the getting connection and the error logs? example:

return new Promise(function(resolve, reject) {
    var id = (global.connid = (global.connid || 0) + 1)
    logger.debug('mysql storage: getting connection ' + id + ' to pool: ' + targetPool);
    poolCluster.getConnection(targetPool, function(err, connection) {
      logger.debug('mysql storage: got connection ' + id + ' to pool: ' + targetPool);

      if (err) {
        logger.error('mysql storage: Cannot get required connection ' + id + ' on mysql server for target pool: ' + targetPool + ' \n Error: ' + JSON.stringify(err));
        console.log('err: ', err);
        return;
      }
unlucio commented 10 years ago

sure, here it is

unlucio commented 10 years ago

Adding some logs here and there in the lib:

 console.log('>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> options.timeout: ', options.timeout);
  console.log('>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this.config.connectTimeout: ', this.config.connectTimeout);
  if (!callback && typeof options === 'function') {
    callback = options;
    options = {};
  }
      console.log('>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> POOL GOT CONNECTION');
      this.emit('connection', connection);
      return cb(null, connection);
    }.bind(this));
var cb;
  console.log('>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> RELEASING POOOL CONNECTION <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<');
  if (connection._pool) {
    if (connection._pool !== this) {
      throw new Error('Connection released to wrong pool');
    }

I'm getting some runs to actually work (totally randomly), here the logs. It's like slowing things down a little actually give the lib the opportunity to connect.

reading the code, this point:

if (this.config.connectTimeout) {
      var handleConnectTimeout = this._handleConnectTimeout.bind(this);

      this._socket.setTimeout(this.config.connectTimeout, handleConnectTimeout);
      this._socket.once('connect', function() {
        this.setTimeout(0, handleConnectTimeout);
      });
    }

makes me wonder if who is not respecting the timeout might actualyl be lower than the lib its self.

I'm not yet able to fine the "retry upon connection error" code though.

unlucio commented 10 years ago

While haunting for who's causing this to be called while trying to get a pool's connection, I stumbled on: https://github.com/felixge/node-mysql/blob/master/lib/Connection.js#L104 Just wondering what's the reason for it. Anyway I tried commenting it out, but it didn't help :(

dougwilson commented 10 years ago

Just wondering what's the reason for it.

this.setTimeout(0, handleConnectTimeout); is the wonderful API node.js provides to remove a timeout. The code there sets a timeout on the socket and then removes that timeout once the socket is connected.

dougwilson commented 10 years ago

It's like slowing things down a little actually give the lib the opportunity to connect.

Because console.log is implemented to actually flush out to the fd on most platforms in node.js, adding it all over will cause the event loop to have a chance to perform I/O. If adding a bunch of them helps, then it really does seem like this issue is that something is blocking the event loop for all that time, I'm not sure. It's hard to know without something I can dig into at my end :)

unlucio commented 10 years ago

Out of curiosity I tried with another driver: https://github.com/sidorares/node-mysql2 It implements the same API as node-mysql so all I had to change was the require('mysql') to require('mysql2') and everything worked :/

for the setTimeout: right, my bad. It's actually consistent with the hugly javascript setTimeout :P

dougwilson commented 10 years ago

Yea, but mysql2 does not implement timeouts of any kind :) So if your issue is something timeout-related, then it would not have the issue, haha.

unlucio commented 10 years ago

Yup, I know. It is a temporary solution but it's allowing us to move a little bit forward as we keep trying to figure out what;s wrong while using node-mysql :)

dougwilson commented 10 years ago

OK, that's no problem :) I just wanted to point it out, haha. I would like us to solve this still if you're interested is why, in case you were just going to change to mysql2 and leave this module all together :( :)

matthew-dean-hp commented 9 years ago

I think I'm experiencing the same issue as this but with a slightly different stack trace. The error I get is:

   [ 'Error: connect ETIMEDOUT', 
     ' at PoolConnection.Connection._handleConnectTimeout (/node_modules/mysql/lib/Connection.js:331:13)', 
     ' at Socket.g (events.js:180:16)', 
     ' at Socket.EventEmitter.emit (events.js:92:17)', 
     ' at Socket._onTimeout (net.js:326:8)', 
     ' at Timer.unrefTimeout [as ontimeout] (timers.js:418:13)', 
     ' --------------------', 
     ' at Protocol._enqueue (/node_modules/mysql/lib/protocol/Protocol.js:135:48)', 
     ' at Protocol.handshake (/node_modules/mysql/lib/protocol/Protocol.js:52:41)', 
     ' at PoolConnection.connect (/node_modules/mysql/lib/Connection.js:108:18)', 
     ' at Pool.getConnection (/node_modules/mysql/lib/Pool.js:42:23)', 
     ' at Pool.releaseConnection (/node_modules/mysql/lib/Pool.js:123:10)', 
     ' at Pool._removeConnection (/node_modules/mysql/lib/Pool.js:220:8)', 
     ' at PoolConnection._removeFromPool (/node_modules/mysql/lib/PoolConnection.js:61:8)', 
     ' at PoolConnection.EventEmitter.emit (events.js:95:17)', 
     ' at PoolConnection.Connection._handleProtocolEnd (/node_modules/mysql/lib/Connection.js:363:8)', 
     ' at Protocol.EventEmitter.emit (events.js:117:20)', 
     ' at Protocol._delegateError (/node_modules/mysql/lib/protocol/Protocol.js:397:10)', 
     ' at Handshake.sequence.on.on.on.on.on.self._connection._startTLS.err.code (/node_modules/mysql/lib/protocol/Protocol.js:159:12)', 
     ' at Handshake.EventEmitter.emit (events.js:92:17)', 
     ' at Handshake._onTimeout (/node_modules/mysql/lib/protocol/sequences/Sequence.js:116:8)', 
     ' at Timer.listOnTimeout [as ontimeout] (timers.js:110:15)' ] } 

Unfortunately I have not been able to isolate exactly when this happens, but it would be great to hear if you've made any progress. I'm using version 2.4.3

dougwilson commented 9 years ago

@matthew-dean-hp are you using the PoolCluster?

unlucio commented 9 years ago

@matthew-dean-hp most of my problems where due to some unexpected big objects causing locks around. We were able to work around most of them and going back using this lib though we still have errors sporadically but they're usually:

{"code":"PROTOCOL_SEQUENCE_TIMEOUT","fatal":true,"timeout":10000}

on heavy machine load.

matthew-dean-hp commented 9 years ago

@dougwilson yes we are using a PoolCluster. Do you know if there's a way I can 'catch' this error and act accordingly?

DasithKuruppu commented 9 years ago

Im stuck on this same issue too , i tried on linux as well as windows, it seems to give the same error and it used to run just fine a while back it hangs up when connecting to my 2nd database, are u guys on a solution to this yet ?

dougwilson commented 9 years ago

@DasithKuruppu I haven't been able to reproduce and so no work has been done. If someone can suggest a fix who can reproduce and verify it, that would be great :)

DasithKuruppu commented 9 years ago
module.exports = function (mysql) {
var config=require('./Config.js');

var sourcepool = mysql.createPool(config.databases.source);
var destinationpool=mysql.createPool(config.databases.destination);

// Create 2 mysql pools to interact with 2 databases in 2 different locations
sourcepool.getConnection(function(err,sourceDB_Connection){
    console.log("connected to sourceDB");
 if(err) { console.log(err); return; }
    destinationpool.getConnection(function(err,destinationDB_Connection){
        console.log("connected to Destinationdb");
        if(err) { console.log(err); return; }
});
});

Here basically im using express (latest version) and node-mysql(latest version) it returns destination DB as undefined , the two pool connections are to two databases hosted online(and yea i can connect to both dbs using phpmyadmin) so both databases are up , funney thing is this same code ran fine for the last couple of weeks and today suddenly it hangs with this error.

http://prntscr.com/6whqom

hope this helps in figuring it out

dougwilson commented 9 years ago

@DasithKuruppu no, it doesn't help. I've already tried similar code. The issue is environmental, which is why I'm not able to reproduce since I'm not able to reproduce your environment.

DasithKuruppu commented 9 years ago

Yea which is why i tried the same code on my linux(elementary freya) with a fresh node installation, seemed to reproduce the same error ?

dougwilson commented 9 years ago

I assume you're still connecting to the same MySQL host, correct?

DasithKuruppu commented 9 years ago

yes

DasithKuruppu commented 9 years ago

I could give you access to my two Db's hosted online if you want , could you give me an email or something i can mail u the logins

dougwilson commented 9 years ago

Ok. That's still your own environmental. All you did was change the client host, but you still kept those variables the same. My email is listed on my GitHub profile.

DasithKuruppu commented 9 years ago

Just mailed u the details , hope that helps u figure it out

dougwilson commented 9 years ago

Thank you much :) I got the email and I'll look at it today and hopefully moves this ticket forward :)!

DasithKuruppu commented 9 years ago

Thanks a lot , and hope you keep me updated :+1:

bnbon commented 9 years ago

Was there a solution found to this, this is the same issue I have...

dougwilson commented 9 years ago

Hi @varmint09 , unfortunately not. I need help trying to reproduce the issue or if someone can debug the issue when they have it and either offer up a PR or a detailed description for someone else (like myself) to create a fix.

bnbon commented 9 years ago

Hi @dougwilson looks to be quite a common issue, looks like the last two issues posted are similar. I have tried using mysql2, but that didnt work either, and im currently in a situation where a site cant go live as amazon RDS cant be connected to...

Not quite sure how to proceed or help really, as this is critical for me!

sidorares commented 9 years ago

hi @varmint09

also, it might be helpful to see logs with connection debug flag set to true

bnbon commented 9 years ago

These are very modest queries indeed, but no query gets to run as the database connection is not established correctly.

If I decided not to use connection pools would that make any difference? Anything else I can try?

sidorares commented 9 years ago

have you tried using one direct connection @varmint09 ?

bnbon commented 9 years ago

Just tried, no luck. @sidorares

sidorares commented 9 years ago

and standard cli client works fine from that machine?

unlucio commented 9 years ago

@varmint09 we stubbornly tried for a year to use node-mysql experiencing timeouts, protocol_sequence_errors, "left over connections" showing as pending on the server for hours and other fatal errors and high memory consumption. During the last year we investigated many times a number of problems and odd behaviours and general instability that seems to emerge when working with tenth of thousands of queries like the following (example)

mysql.format('INSERT INTO ' + database + ' (' + fields + ') VALUES ? ON DUPLICATE KEY UPDATE ' + queryOptions.updateString, item);

// items of about 40 properties

or when running queries generating results of hundreds of thousands or million records of ~40, ~50 properties (using the stream interface). Using both pools (10 connections) or a single persistent connection always experiencing problems. I really wish in this year we were able to come up with a consistent example to investigate the issues but unfortunately they were quite difficult to reproduce in a concise way if not running the whole system and pushing it to it's live workload on our dev machines as well. We switched for good about a month ago to mysql2 and every thing now works fine and the memory consumption lowered of about 40% as well (and our code is the same, we really just switched the "driver").

Just a personal note: I would have rather sticked with node-mysql since it seems to be the most used one and I'm not a fan of dispersing people and energy of OS projects that kind of make the same thing. I wish I could make a contribution on this one and solve our problems at the same time but unfortunately I wasn't able to wrap my mind around the project's code and structure and and I don't have a in-depth enough knowledge of mysql to be able to "sense" what's going on. The only thing I can really do is report my experience and maybe try to see if there are any chance to show a live system experiencing those issues to someone that might be able to understand more.

bnbon commented 9 years ago

@unlucio My understanding is that you are also using ELB, and connecting to a MySQL RDS instance, I will try using MySQL2, and I just wanted to check if that worked for you?

My config object I am passing into createPool is

{
      "host" : process.env.RDS_HOSTNAME,
      "port" : +process.env.RDS_PORT,
      "user" : process.env.RDS_USERNAME,
      "password" : process.env.RDS_PASSWORD,
      "database" : process.env.RDS_DATABASE_NAME,
      "ssl" : "Amazon RDS"
}

I will try again using mysql2 in the interim, and hopefully that will work for me too.

@dougwilson seems like quite a big bug to not mention somewhere more prominently, as from a glance through issues this looks quite common...

dougwilson commented 8 years ago

Hi @varmint09 , there still isn't anything we can do. My comment at https://github.com/felixge/node-mysql/issues/904#issuecomment-140587694 has still be unanswered. Can you help with that? We need detailed instructions on how to reproduce the issue or there is no hope of us being able to fix anything (how can we fix something we cannot see?). Another option is to debug your issue, determine the solution, and propose a fix in a PR.