unlucio opened this issue 10 years ago
Interesting. So looking at the logs real quick, I'm wondering if you could do the following things and post back a new log for me?
1. Add console.log('node ' + process.version) to your program so I can see the node version.
2. Add timestamps to your log lines; require('log-timestamp') will alter the globals to do this for you, which will work here.
3. Add the following (after all the require statements): require('blocked')(function(ms){console.log('blocked for %sms', ms)})
(A combined sketch of all three follows below.)
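Put together, a minimal sketch, assuming the log-timestamp and blocked npm modules are installed; this goes at the very top of the entry file, before anything else runs:
var Socket; // placeholder to show nothing else comes first
require('log-timestamp');            // prefixes console.log output with a timestamp
require('blocked')(function (ms) {   // reports event loop stalls as they happen
  console.log('blocked for %sms', ms);
});

console.log('node ' + process.version); // record the node version once at startup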
cool. can you tell me around what function the debug: mysql storage: getting connection to pool: zeus line prints from? it looks like there is something halting your event loop for 32 seconds, so it seems like connectTimeout would have to be at least 40000 just for your code to function without connect timeouts. would it be possible to boil your code down to something just big enough to reproduce (and ideally without bluebird)?
That log line happens just before poolCluster.getConnection();
The event loop is busy for 32 seconds (or more) before arriving at that point, since it's transforming all the data.
Mumble, I saw that [2014-09-07T16:35:39.892Z] blocked for 32089ms, but I wonder if it's just a mix-up between winston and console.log().
The actual flow is:
1. mysql.format('replace [...] ?', {data})
2. getConnection()
3. (everything dies)
And I'm trying to boil it down to something I can put in a fiddle for you here, but I'm still trying to wrap my mind around the parts that might really cause the problem :P
OK. Yea, if the log line is right above poolCluster.getConnection(), then the event loop blocking is just being logged later (since it's triggered by a timer, this is normal; i can see the gap in the timestamps, which is why i was wondering where that log line was located). since the getting connection log lines are within a couple seconds of the timeouts, they should go together (assuming those timeouts actually correspond to the getting connection lines -- you would have to use a global id counter and print it to make the connection between the two -- thanks, async :) ).
anyway, if those logs do indeed correspond to the timeouts, then something is weird here: the first getting connection line was at 2014-09-07T16:35:36.566Z and the first timeout was at 2014-09-07T16:35:39.905Z, which is only 3.5 seconds later. the default timeout for this library is 10 seconds (10000). are you adjusting any timeout values in your code? if you are, it's important to note all timeouts are in milliseconds, not seconds, so they should all be in the thousands.
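For reference, a minimal sketch of where those values would be set when building the cluster (the host, credentials, and 'zeus' pool name are stand-ins taken from the logs; every value is in milliseconds):
var mysql = require('mysql');
var poolCluster = mysql.createPoolCluster();

// hypothetical pool named after the one in the logs; all timeouts in ms
poolCluster.add('zeus', {
  host           : 'db.example.com', // placeholder host
  user           : 'app',            // placeholder credentials
  password       : 'secret',
  connectTimeout : 10000, // 10 seconds (the library default), not 10
  acquireTimeout : 10000
});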
this is where those logs happen:
return new Promise(function(resolve, reject) {
  logger.debug('mysql storage: getting connection to pool: ' + targetPool);

  poolCluster.getConnection(targetPool, function(err, connection) {
    logger.debug('mysql storage: got connection to pool: ' + targetPool);

    if (err) {
      logger.error('mysql storage: Cannot get required connection on mysql server for target pool: ' + targetPool + ' \n Error: ' + JSON.stringify(err));
      console.log('err: ', err);
      return;
    }
and YES! That's another thing I've been noticing since this morning. It totally seems to time out before the 10 secs. As I said, I already tried increasing the timeout params but nothing changed: literally, the calls still fail at the same speed. (Yup, I'm using milliseconds correctly ;P)
I just pushed a commit, just for the sake of trying, increasing the timeouts up to 1 minute (connectTimeout: 60000, acquireTimeout: 60000). I'll give it a spin as soon as the docker container is ready.
and even increasing the timeouts to 60 seconds, the logs are still the same. Mumble... Rx and bluebird might be adding something on top, in terms of blocking the event loop here and there, but I still wonder what's preventing the lib from respecting the given timeouts. And most of all: why, once a getConnection() fails, does the whole pool go wreck as well, with nothing retrying :(
why, once a getConnection() fails, does the whole pool go wreck as well, with nothing retrying :(
yes, that is probably a bug, certainly. i'm mainly trying to figure out why you're getting the timeouts. can you add this to your code and see what it logs? also, can you list what version of this module you are using? (i've been assuming you are using 2.4.3, but i can't be sure since it hasn't been stated yet.)
var Socket = require('net').Socket;
var _set = Socket.prototype.setTimeout;

// patch Socket.prototype.setTimeout to trace every socket timeout the process sets
Socket.prototype.setTimeout = function(msecs, callback) {
  console.trace('setting timeout of ' + msecs);
  _set.call(this, msecs, callback);
};
I'm actually running version 2.4.2
I'm actually running version 2.4.2
ok, that's fine. the only difference doesn't apply here at all :) let me know if you can get that new snippet's output, though :)
cool, I'll update it anyway. It's getting late in my time zone, so I'll add it tomorrow, first thing in the morning. Also, if you suspect a bug, I'll clone the repo and take a deeper look into the library; perhaps I might come up with a patch, if the problem is not me or how I use it :)
big thanks for the help, greatly appreciated.
no problem. i was wondering when you were going to head off :) the weird poolcluster behavior is very likely a bug, because you wouldn't think it would spew a bunch of pool ended errors, so feel free to dig in if i don't get to it first :) the timeout thing, though, is certainly weird.
So, good morning I guess from the other side of the planet :) I updated the lib to 2.5.0 (and cloned it), added the trace code and here are the logs
what def stands out to me is this
[...]
2014-09-08T05:28:10.910Z - debug: mysql storage: getting connection to pool: zeus
[2014-09-08T05:28:10.910Z] blocked for 11083ms
[2014-09-08T05:28:10.911Z] Trace: setting timeout of 0
it's setting the timeout to 0ms as soon as it finishes subscribing all the getConnection() calls
yea, but i think we may have ruled out the event loop blocking: looking at this new log, it looks just like the others in that the blocking was actually from before the getConnection calls were even made. it also looks like all the timeouts are getting set correctly within node, which is what that snippet showed me.
would you be able to do like a global counter variable and then reference that in the getting connection and the error logs? example:
return new Promise(function(resolve, reject) {
  var id = (global.connid = (global.connid || 0) + 1)
  logger.debug('mysql storage: getting connection ' + id + ' to pool: ' + targetPool);

  poolCluster.getConnection(targetPool, function(err, connection) {
    logger.debug('mysql storage: got connection ' + id + ' to pool: ' + targetPool);

    if (err) {
      logger.error('mysql storage: Cannot get required connection ' + id + ' on mysql server for target pool: ' + targetPool + ' \n Error: ' + JSON.stringify(err));
      console.log('err: ', err);
      return;
    }
Adding some logs here and there in the lib:
console.log('>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> options.timeout: ', options.timeout);
console.log('>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this.config.connectTimeout: ', this.config.connectTimeout);
if (!callback && typeof options === 'function') {
callback = options;
options = {};
}
console.log('>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> POOL GOT CONNECTION');
this.emit('connection', connection);
return cb(null, connection);
}.bind(this));
var cb;
console.log('>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> RELEASING POOOL CONNECTION <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<');
if (connection._pool) {
if (connection._pool !== this) {
throw new Error('Connection released to wrong pool');
}
I'm getting some runs to actually work (totally at random); here are the logs. It's like slowing things down a little actually gives the lib the opportunity to connect.
reading the code, this point:
if (this.config.connectTimeout) {
var handleConnectTimeout = this._handleConnectTimeout.bind(this);
this._socket.setTimeout(this.config.connectTimeout, handleConnectTimeout);
this._socket.once('connect', function() {
this.setTimeout(0, handleConnectTimeout);
});
}
makes me wonder if whatever is not respecting the timeout might actually be lower-level than the lib itself.
I'm not yet able to find the "retry upon connection error" code, though.
While hunting for what's causing this to be called while trying to get a pool's connection, I stumbled on: https://github.com/felixge/node-mysql/blob/master/lib/Connection.js#L104 Just wondering what the reason for it is. Anyway, I tried commenting it out, but it didn't help :(
Just wondering what's the reason for it.
this.setTimeout(0, handleConnectTimeout);
is the wonderful API node.js provides to remove a timeout. The code there sets a timeout on the socket and then removes that timeout once the socket is connected.
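A standalone sketch of that node.js idiom, outside the library (the host is a placeholder):
var net = require('net');
var socket = net.connect({ host: 'example.com', port: 3306 });

function handleConnectTimeout() {
  console.log('connect timed out');
  socket.destroy();
}

// arm a 10 second timeout for the connect phase...
socket.setTimeout(10000, handleConnectTimeout);

socket.once('connect', function () {
  // ...and passing 0 (with the same listener) is how node removes it again
  this.setTimeout(0, handleConnectTimeout);
});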
It's like slowing things down a little actually give the lib the opportunity to connect.
Because console.log is implemented to actually flush out to the fd on most platforms in node.js, adding it all over gives the event loop a chance to perform I/O. If adding a bunch of them helps, then it really does seem like something is blocking the event loop for all that time, but I'm not sure. It's hard to know without something I can dig into on my end :)
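If the pre-connect data transformation really is what blocks the loop, one generic workaround (a suggestion, not something this library does) is to chunk the synchronous work so the event loop can breathe between slices:
// Sketch: transform a large array in slices of 1000, yielding to the
// event loop between slices so timers and sockets can make progress.
function transformAll(items, transform, done) {
  var i = 0;
  (function next() {
    var end = Math.min(i + 1000, items.length);
    for (; i < end; i++) {
      transform(items[i]);
    }
    if (i < items.length) {
      return setImmediate(next); // yield before the next slice
    }
    done();
  })();
}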
Out of curiosity I tried with another driver:
https://github.com/sidorares/node-mysql2
It implements the same API as node-mysql, so all I had to change was require('mysql') to require('mysql2'), and everything worked :/
For the setTimeout: right, my bad. It's actually consistent with the ugly javascript setTimeout :P
Yea, but mysql2 does not implement timeouts of any kind :) So if your issue is something timeout-related, then it would not have the issue, haha.
Yup, I know. It's a temporary solution, but it's allowing us to move forward a little bit as we keep trying to figure out what's wrong when using node-mysql :)
OK, that's no problem :) I just wanted to point it out, haha. I would still like us to solve this, if you're interested, in case you were just going to change to mysql2 and leave this module altogether :( :)
I think I'm experiencing the same issue as this but with a slightly different stack trace. The error I get is:
[ 'Error: connect ETIMEDOUT',
' at PoolConnection.Connection._handleConnectTimeout (/node_modules/mysql/lib/Connection.js:331:13)',
' at Socket.g (events.js:180:16)',
' at Socket.EventEmitter.emit (events.js:92:17)',
' at Socket._onTimeout (net.js:326:8)',
' at Timer.unrefTimeout [as ontimeout] (timers.js:418:13)',
' --------------------',
' at Protocol._enqueue (/node_modules/mysql/lib/protocol/Protocol.js:135:48)',
' at Protocol.handshake (/node_modules/mysql/lib/protocol/Protocol.js:52:41)',
' at PoolConnection.connect (/node_modules/mysql/lib/Connection.js:108:18)',
' at Pool.getConnection (/node_modules/mysql/lib/Pool.js:42:23)',
' at Pool.releaseConnection (/node_modules/mysql/lib/Pool.js:123:10)',
' at Pool._removeConnection (/node_modules/mysql/lib/Pool.js:220:8)',
' at PoolConnection._removeFromPool (/node_modules/mysql/lib/PoolConnection.js:61:8)',
' at PoolConnection.EventEmitter.emit (events.js:95:17)',
' at PoolConnection.Connection._handleProtocolEnd (/node_modules/mysql/lib/Connection.js:363:8)',
' at Protocol.EventEmitter.emit (events.js:117:20)',
' at Protocol._delegateError (/node_modules/mysql/lib/protocol/Protocol.js:397:10)',
' at Handshake.sequence.on.on.on.on.on.self._connection._startTLS.err.code (/node_modules/mysql/lib/protocol/Protocol.js:159:12)',
' at Handshake.EventEmitter.emit (events.js:92:17)',
' at Handshake._onTimeout (/node_modules/mysql/lib/protocol/sequences/Sequence.js:116:8)',
' at Timer.listOnTimeout [as ontimeout] (timers.js:110:15)' ] }
Unfortunately I have not been able to isolate exactly when this happens, but it would be great to hear if you've made any progress. I'm using version 2.4.3
@matthew-dean-hp are you using the PoolCluster?
@matthew-dean-hp most of my problems were due to some unexpectedly big objects causing locks all around. We were able to work around most of them and go back to using this lib, though we still see errors sporadically; they're usually:
{"code":"PROTOCOL_SEQUENCE_TIMEOUT","fatal":true,"timeout":10000}
on heavy machine load.
@dougwilson yes we are using a PoolCluster. Do you know if there's a way I can 'catch' this error and act accordingly?
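One way to catch it is in the getConnection callback itself; a sketch, where the retry-with-backoff policy is my own assumption, not a library feature:
var retries = 0;

(function acquire() {
  poolCluster.getConnection(targetPool, function (err, connection) {
    if (err) {
      // err.code is set by the driver, e.g. 'ETIMEDOUT'
      if (err.code === 'ETIMEDOUT' && retries++ < 3) {
        return setTimeout(acquire, 1000); // hypothetical fixed backoff
      }
      return console.error('giving up after ' + retries + ' attempts:', err);
    }
    // ... use connection, then connection.release()
  });
})();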
I'm stuck on this same issue too. I tried on Linux as well as Windows and it gives the same error, and it used to run just fine a while back. It hangs when connecting to my 2nd database. Are you guys onto a solution for this yet?
@DasithKuruppu I haven't been able to reproduce and so no work has been done. If someone can suggest a fix who can reproduce and verify it, that would be great :)
module.exports = function (mysql) {
  var config = require('./Config.js');

  // Create 2 mysql pools to interact with 2 databases in 2 different locations
  var sourcepool      = mysql.createPool(config.databases.source);
  var destinationpool = mysql.createPool(config.databases.destination);

  sourcepool.getConnection(function (err, sourceDB_Connection) {
    if (err) { console.log(err); return; }
    console.log("connected to sourceDB");

    destinationpool.getConnection(function (err, destinationDB_Connection) {
      if (err) { console.log(err); return; }
      console.log("connected to Destinationdb");
    });
  });
};
Here basically I'm using express (latest version) and node-mysql (latest version). It returns the destination DB as undefined. The two pool connections are to two databases hosted online (and yea, I can connect to both DBs using phpMyAdmin), so both databases are up. Funny thing is, this same code ran fine for the last couple of weeks and today it suddenly hangs with this error.
hope this helps in figuring it out
@DasithKuruppu no, it doesn't help. I've already tried similar code. The issue is environmental, which is why I'm not able to reproduce since I'm not able to reproduce your environment.
Yea, which is why I tried the same code on my Linux box (elementary Freya) with a fresh node installation; it seemed to reproduce the same error.
I assume you're still connecting to the same MySQL host, correct?
yes
I could give you access to my two DBs hosted online if you want. Could you give me an email or something so I can mail you the logins?
Ok. That's still your own environment. All you did was change the client host, but you kept all the other variables the same. My email is listed on my GitHub profile.
Just mailed you the details, hope that helps you figure it out.
Thank you much :) I got the email and I'll look at it today; hopefully that moves this ticket forward :)!
Thanks a lot , and hope you keep me updated :+1:
Was a solution ever found for this? This is the same issue I have...
Hi @varmint09 , unfortunately not. I need help trying to reproduce the issue or if someone can debug the issue when they have it and either offer up a PR or a detailed description for someone else (like myself) to create a fix.
Hi @dougwilson, this looks to be quite a common issue; the last two issues posted look similar. I have tried using mysql2, but that didn't work either, and I'm currently in a situation where a site can't go live because Amazon RDS can't be connected to...
Not quite sure how to proceed or help really, as this is critical for me!
hi @varmint09, if big inserts are involved, one option is LOAD DATA LOCAL INFILE using generated sql as input. Also, it might be helpful to see logs with the connection debug flag set to true.
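e.g. a sketch (the host and credentials are placeholders); with debug: true the driver logs the protocol packets it sends and receives:
var mysql = require('mysql');

var connection = mysql.createConnection({
  host     : 'db.example.com', // placeholder
  user     : 'app',            // placeholder
  password : 'secret',
  debug    : true              // print incoming/outgoing protocol packets
});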
These are very modest queries indeed, but no query gets to run as the database connection is not established correctly.
If I decided not to use connection pools would that make any difference? Anything else I can try?
have you tried using a single direct connection, @varmint09?
Just tried, no luck. @sidorares
and standard cli client works fine from that machine?
@varmint09 we stubbornly tried for a year to use node-mysql, experiencing timeouts, protocol_sequence_errors, "left over" connections showing as pending on the server for hours, other fatal errors, and high memory consumption. During the last year we investigated, many times, a number of problems, odd behaviours, and a general instability that seems to emerge when working with tens of thousands of queries like the following (example):
mysql.format('INSERT INTO ' + database + ' (' + fields + ') VALUES ? ON DUPLICATE KEY UPDATE ' + queryOptions.updateString, item);
// items of about 40 properties
or when running queries generating result sets of hundreds of thousands or millions of records with ~40-50 properties (using the stream interface). Whether using pools (10 connections) or a single persistent connection, we always experienced problems. I really wish we had been able to come up with a consistent example to investigate the issues this year, but unfortunately they were quite difficult to reproduce in a concise way, short of running the whole system and pushing it to its live workload on our dev machines. We switched for good about a month ago to mysql2 and everything now works fine; memory consumption dropped by about 40% as well (our code is the same, we really just switched the "driver").
Just a personal note: I would have rather stuck with node-mysql, since it seems to be the most used one and I'm not a fan of dispersing people and energy across OS projects that do essentially the same thing. I wish I could make a contribution here and solve our problems at the same time, but unfortunately I wasn't able to wrap my mind around the project's code and structure, and I don't have in-depth enough knowledge of mysql to be able to "sense" what's going on. The only thing I can really do is report my experience and maybe see if there's any chance to show a live system experiencing those issues to someone who might understand more.
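For context, a sketch of the stream interface mentioned above (handleRow is a hypothetical async row handler; the table name is a placeholder):
// Rows are handled one at a time instead of being buffered into one
// huge result array, which keeps memory flat for very large result sets.
connection.query('SELECT * FROM big_table')
  .on('error', function (err) { console.error(err); })
  .on('result', function (row) {
    connection.pause();          // throttle while this row is processed
    handleRow(row, function () { // hypothetical async handler
      connection.resume();
    });
  })
  .on('end', function () { console.log('done'); });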
@unlucio My understanding is that you are also using ELB and connecting to a MySQL RDS instance. I will try using mysql2, and I just wanted to check: did that work for you?
My config object I am passing into createPool is
{
"host" : process.env.RDS_HOSTNAME,
"port" : +process.env.RDS_PORT,
"user" : process.env.RDS_USERNAME,
"password" : process.env.RDS_PASSWORD,
"database" : process.env.RDS_DATABASE_NAME,
"ssl" : "Amazon RDS"
}
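For reference, a sketch of the corresponding createPool call (the "Amazon RDS" string selects the driver's bundled RDS CA certificate profile):
var mysql = require('mysql');

var pool = mysql.createPool({
  host     : process.env.RDS_HOSTNAME,
  port     : +process.env.RDS_PORT,
  user     : process.env.RDS_USERNAME,
  password : process.env.RDS_PASSWORD,
  database : process.env.RDS_DATABASE_NAME,
  ssl      : 'Amazon RDS' // built-in profile with the RDS CA certificate
});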
I will try again using mysql2 in the interim, and hopefully that will work for me too.
@dougwilson this seems like quite a big bug not to mention somewhere more prominently, as from a glance through the issues this looks quite common...
Hi @varmint09, there still isn't anything we can do. My comment at https://github.com/felixge/node-mysql/issues/904#issuecomment-140587694 is still unanswered. Can you help with that? We need detailed instructions on how to reproduce the issue or there is no hope of us being able to fix anything (how can we fix something we cannot see?). Another option is to debug your issue, determine the solution, and propose a fix in a PR.
Hello everybody, I'm still investigating on this headache but I'd greatly appreciate whatever suggestion.
I'm using a PoolCluster with a 10-connection limit per pool, selecting a pool and running N queries on it. I'm inserting a sample of 300k rows, at 1k rows per query (using the multiple-queries feature). I already checked the generated SQL and it is fine. Everything works fine as long as I send up to 30 (exactly 30) subsequent queries (30k rows): 10 connections get pulled out of the pool, the others get queued and performed as soon as a connection is available. Up to this point everything is just great :)
Running a test with 31 connections (30,500 rows): everything goes wreck :/ The whole cluster tumbles down as the 1st getConnection() receives an ETIMEDOUT error and all the subsequent getConnection() calls receive an Error: Pool is closed error. Here you can find a sample error log with stack traces. I added a console.log() (and marked it as mysql lib stack: Error:) in the library, as suggested by this answer (the case did look kind of similar to mine).
Increasing config variable values such as connectionLimit, acquireTimeout and removeNodeErrorCount, and explicitly setting waitForConnections and canRetry to true, didn't help either.
Checking the server's configuration, the numbers seem fine:
Shouldn't the pool/poolCluster feature manage this kind of situation by itself? What would be the best way to handle this kind of error and reconnect the lost pool?
As said, I'm still investigating this situation and I'll try to add more information as soon as it comes out. I'm sorry I'm not able to attach a fiddle test case at this time.
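In the meantime, one possible stop-gap is to rebuild the cluster when it reports itself closed; a sketch, where the recreate-on-failure policy is my own workaround and not a library feature:
var mysql = require('mysql');

function createCluster() {
  var cluster = mysql.createPoolCluster({ canRetry: true });
  cluster.add('zeus', { /* connection config as before */ });
  return cluster;
}

var poolCluster = createCluster();

function getConnection(targetPool, cb) {
  poolCluster.getConnection(targetPool, function (err, connection) {
    if (err && /Pool is closed/.test(err.message)) {
      // throw away the wrecked cluster and start over with a fresh one
      poolCluster.end(function () {});
      poolCluster = createCluster();
      return poolCluster.getConnection(targetPool, cb);
    }
    cb(err, connection);
  });
}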