Seeing the pool randomly being destroyed after timeout

tgriesser commented 9 years ago

Not sure what's causing it and haven't had time to completely reproduce it, but the pool seems to be randomly being destroyed after a connect ETIMEDOUT - only seeing this recently so suspect it's something with the latest patch version.

I'll see about being able to reproduce it, but any ideas there?

myndzi commented 9 years ago

When you say "the pool is being destroyed", do you mean that knex reports it as destroyed or pool2 reports it as destroyed?

One thing I've observed in general, even with the previous pooler, is that "general problems/misconfigurations with connecting" in knex tend to wind up being reported as pool errors; this makes it hard to distinguish things sometimes.

The only call in the entire file that could initiate destruction of the pool itself is here:

https://github.com/myndzi/pool2/blob/master/lib/pool.js#L410

I believe we talked a little bit about this one before: the purpose is to throw an error if the pool receives an error trying to acquire its first resource(s), so that invalid instantiation/configuration causes a crash error instead of silently getting swallowed in an infinite invalid reconnect loop.

Once it has successfully acquired at least one resource, it is assumed that the configuration is valid, and this behavior is disabled, controlled by pool.live being set to true. The only place that happens is in the constructor.

So, if pool2 is reporting a 'destroyed' error, it would have to be 'failed to ever acquire any resource', or something external is calling the _destroyPool method (.end should be preferred, though perhaps it should fall back to destroy or export this method directly if we don't care about or can't clean up nicely).

If knex is reporting a 'destroyed' error, well, you'd know more about that one than I :)

myndzi commented 9 years ago

I just got something like this while doing some work, is this what you were referring to?

Knex:Error Pool2 - Error: connect ECONNREFUSED Knex:Error Pool2 - Error: connect ECONNREFUSED Error: Pool was destroyed

tgriesser commented 9 years ago

Yep

myndzi commented 9 years ago

Yeah, that's basically "the settings were wrong and I couldn't connect, ever"

tgriesser commented 9 years ago

Oh wait sorry it was ETIMEDOUT not ECONNREFUSED. I'm traveling right now but will take a look when I get back tomorrow

myndzi commented 9 years ago

Same thing, different error. ETIMEDOUT would be "settings were wrong" or possibly "network connectivity bombed while this thing tried to start up"

myndzi commented 9 years ago

Any news on this? If it's not an initial time out it might also be related to #14.

dvp0 commented 9 years ago

I really hope @tgriesser or @myndzi can help me out with this.

PROBLEM I have a hapi/bookshelf/knex(everything latest) setup where all the settings are to default. Now when my rest api fails to connect to DB (RDS based postgres) it ETIMEDOUT s and destroys the pool after 1 minute. Second consecutive requests fails with "Pool is destroyed" error after 1 minute.

CONSOLE

hapi server started sdfd Knex:Error Pool2 - Error: Timed out acquiring resource Unhandled rejection Error: Pool was destroyed at Pool._destroyPool (/Users/hiral/projects/personal/rx-api/node_modules/knex/node_modules/pool2/lib/pool.js:487:16) at Pool. (/Users/hiral/projects/personal/rx-api/node_modules/knex/node_modules/pool2/lib/pool.js:410:18) at Pool. (/Users/hiral/projects/personal/rx-api/node_modules/knex/node_modules/pool2/lib/pool.js:421:13) at Timer.listOnTimeout as ontimeout Knex:Error Pool2 - Error: Timed out acquiring resource Debug: request, closed, error Knex:Error Pool2 - Error: connect ETIMEDOUT Knex:Error Pool2 - Error: connect ETIMEDOUT sdfd Unhandled rejection Error: Pool is destroyed at Pool.acquire (/Users/hiral/projects/personal/rx-api/node_modules/knex/node_modules/pool2/lib/pool.js:149:12) at /Users/hiral/projects/personal/rx-api/node_modules/knex/lib/client.js:204:19 at tryCatcher (/Users/hiral/projects/personal/rx-api/node_modules/knex/node_modules/bluebird/js/main/util.js:24:31) at Promise._resolveFromResolver (/Users/hiral/projects/personal/rx-api/node_modules/knex/node_modules/bluebird/js/main/promise.js:427:31) at new Promise (/Users/hiral/projects/personal/rx-api/node_modules/knex/node_modules/bluebird/js/main/promise.js:53:37) at Client.acquireConnection (/Users/hiral/projects/personal/rx-api/node_modules/knex/lib/client.js:200:12) at /Users/hiral/projects/personal/rx-api/node_modules/knex/lib/runner.js:138:49 at tryCatcher (/Users/hiral/projects/personal/rx-api/node_modules/knex/node_modules/bluebird/js/main/util.js:24:31) at Function.Promise.attempt.Promise.try (/Users/hiral/projects/personal/rx-api/node_modules/knex/node_modules/bluebird/js/main/method.js:31:24) at Runner.ensureConnection (/Users/hiral/projects/personal/rx-api/node_modules/knex/lib/runner.js:137:26) at Runner.run (/Users/hiral/projects/personal/rx-api/node_modules/knex/lib/runner.js:30:31) at QueryBuilder.Target.then (/Users/hiral/projects/personal/rx-api/node_modules/knex/lib/interface.js:27:43) at QueryBuilder.tryCatcher (/Users/hiral/projects/personal/rx-api/node_modules/bookshelf/node_modules/bluebird/js/main/util.js:24:31) at doThenable (/Users/hiral/projects/personal/rx-api/node_modules/bookshelf/node_modules/bluebird/js/main/thenables.js:52:38) at tryConvertToPromise (/Users/hiral/projects/personal/rx-api/node_modules/bookshelf/node_modules/bluebird/js/main/thenables.js:30:20) at Promise._resolveCallback (/Users/hiral/projects/personal/rx-api/node_modules/bookshelf/node_modules/bluebird/js/main/promise.js:389:24) at Promise._settlePromiseFromHandler (/Users/hiral/projects/personal/rx-api/node_modules/bookshelf/node_modules/bluebird/js/main/promise.js:462:17) at Promise._settlePromiseAt (/Users/hiral/projects/personal/rx-api/node_modules/bookshelf/node_modules/bluebird/js/main/promise.js:530:18) at Promise._settlePromises (/Users/hiral/projects/personal/rx-api/node_modules/bookshelf/node_modules/bluebird/js/main/promise.js:646:14) at Promise.b (domain.js:183:18) at Async._drainQueue (/Users/hiral/projects/personal/rx-api/node_modules/bookshelf/node_modules/bluebird/js/main/async.js:182:12) at Async._drainQueues (/Users/hiral/projects/personal/rx-api/node_modules/bookshelf/node_modules/bluebird/js/main/async.js:187:10) at Async.drainQueues (/Users/hiral/projects/personal/rx-api/node_modules/bookshelf/node_modules/bluebird/js/main/async.js:15:14) at process._tickDomainCallback (node.js:463:13)

QUESTIONS

@myndzi Is there anyway I can recreate the pool (pretty newbie when comes to this stuff) whne destroyed so when the DB comes back up again it start acting normal? Any additional settings I need to set on knex/ pool2 config ?

@tgriesser Proper place to handle this exception in bookshelf to respond safely ?

myndzi commented 9 years ago

Can you clarify: does this happen when you start your application if the DB is inaccessible, or does this happen after your application has been running and working, but loses connection to the db? If the former, I have a patch to allow you to configure this behavior, I will probably just publish it shortly. I wanted to make sure I had a chance to catch tgriesser before pushing it out.

You can try installing the version of pool2 from master on this repository instead of the one installed with knex, and setting the 'bailAfter' option to something large enough to encompass the expected downtime.

acgourley commented 9 years ago

Just chiming in that I'm having this issue against mysql using bookshelf.

The application runs for a while, then a stack like this occurs:


 2015-08-27 00:49:09.000 Knex:Error Pool2 - Error: Pool.release(): Resource not member of pool
 2015-08-27 00:49:09.000 Error: read ETIMEDOUT
 2015-08-27 00:49:09.000 at exports._errnoException (util.js:746:11)
 2015-08-27 00:49:09.000 at TCP.onread (net.js:559:26)
 2015-08-27 00:49:09.000 --------------------
 2015-08-27 00:49:09.000 at Protocol._enqueue (/app/node_modules/mysql/lib/protocol/Protocol.js:135:48)
 2015-08-27 00:49:09.000 at Connection.query (/app/node_modules/mysql/lib/Connection.js:201:25)
 2015-08-27 00:49:09.000 at /app/node_modules/knex/lib/dialects/mysql/index.js:92:18
 2015-08-27 00:49:09.000 at tryCatcher (/app/node_modules/bluebird/js/main/util.js:26:23)
 2015-08-27 00:49:09.000 at Promise._resolveFromResolver (/app/node_modules/bluebird/js/main/promise.js:476:31)
 2015-08-27 00:49:09.000 at new Promise (/app/node_modules/bluebird/js/main/promise.js:69:37)
 2015-08-27 00:49:09.000 at Client._query (/app/node_modules/knex/lib/dialects/mysql/index.js:88:12)
 2015-08-27 00:49:09.000 at Client.query (/app/node_modules/knex/lib/client.js:127:24)
 2015-08-27 00:49:09.000 at Runner.<anonymous> (/app/node_modules/knex/lib/runner.js:118:24)
 2015-08-27 00:49:09.000 at Runner.tryCatcher (/app/node_modules/bluebird/js/main/util.js:26:23)
 2015-08-27 00:49:09.000 at Runner.query (/app/node_modules/bluebird/js/main/method.js:15:34)
 2015-08-27 00:49:09.000 at /app/node_modules/knex/lib/runner.js:44:21
 2015-08-27 00:49:09.000 at /app/node_modules/bluebird/js/main/using.js:165:30
 2015-08-27 00:49:09.000 at tryCatcher (/app/node_modules/bluebird/js/main/util.js:26:23)
 2015-08-27 00:49:09.000 at Promise._settlePromiseFromHandler (/app/node_modules/bluebird/js/main/promise.js:503:31)
 2015-08-27 00:49:09.000 at Promise._settlePromiseAt (/app/node_modules/bluebird/js/main/promise.js:577:18)
 2015-08-27 00:49:09.000 at Promise._settlePromises (/app/node_modules/bluebird/js/main/promise.js:693:14)
 2015-08-27 00:49:09.000 at Async._drainQueue (/app/node_modules/bluebird/js/main/async.js:123:16)
 2015-08-27 00:49:09.000 at Async._drainQueues (/app/node_modules/bluebird/js/main/async.js:133:10)
 2015-08-27 00:49:09.000 at Immediate.Async.drainQueues [as _onImmediate] (/app/node_modules/bluebird/js/main/async.js:15:14)
 2015-08-27 00:49:09.000 at processImmediate [as _immediateCallback] (timers.js:367:17)

myndzi commented 9 years ago

What version of pool2 is currently installed? (npm ls pool2) Also, what version of knex?

myndzi commented 9 years ago

Ah, I think I see the problem. acquireRawConnection is returning a promise, however the error handler is attempting to destroy the connection object.

Alternately, it might be trying to release even after an error causes the resource to be destroyed. I can't see how that error would result from the connection error handler in knex, since it calls pool.destroy

dvp0 commented 9 years ago

@myndzi I have knex 0.8.6 and with that pool2 1.1.0. Sorry for not replying earlier, notification email was lost in thousand other from work github.

dvp0 commented 9 years ago

@myndzi Recently I have seen that one particular connection was refused mostly only first time after long rest. I rerun it and it is fine. After couple of hours/days happens again. I compared the actual query knex was executing (through debug mode) and it is exact similar to consecutive success query. I am still debugging and it could be knex issue too, but I thought it might help.

myndzi commented 9 years ago

Refused the first time after a long rest = a timeout; pool2 was designed to throw an error if the first attempted connection fails. You could update to 1.2.0 to control some of this behavior if you desire, but there shouldn't be anything that changes functionally. This error is probably safe to ignore, unless knex is destroying the pool because of it, which it should not do. I'll try and reproduce a timed out query using knex and see if I can produce this problem and run it down.

acgourley commented 9 years ago

My error was with pool2 1.1.6 and knex 0.8.6

Any update on this issue? Thanks for all your work here both @myndzi and @tgriesser

myndzi commented 9 years ago

Apologies, I've gotten quite busy at work and didn't do what I said I would do to look into this. I would appreciate it if you could tell me if it appears to be a cosmetic problem (reporting an error but not affecting your application) or an actual functional problem, because I think the chance is strong it's the former. If it's the latter, I'll make it a priority to address it.

The unfortunate nature of this kind of error is that, since pool2 is involved at such a core level, any connection "stuff" that happens is going to have pool2's "name" attached to it -- but in a number of cases, these are situations pool2 is built to deal with; the message may simply be getting passed along when it would be better ignored.

acgourley commented 9 years ago

Thanks for the reply - in my case it is serious because it crashes the node instance and I cannot figure out how to catch the exception in my code.

If what you are saying is correct, the quickest fix may be for knex to report the exception without propagating it. @tgriesser what do you think?

myndzi commented 9 years ago

If it's crashing, then it's not being caught at all.. but the "Knex:Error" part gets added by knex so it must obviously be captured...? I recommend in the short term trying to find out what the error is being emitted on (the knex object?) and binding it. I'm sorry for letting this linger.

myndzi commented 9 years ago

Okay! I managed to reproduce the error scenario by hooking into net.Socket and "eating" a selected write while emitting an error on the socket object. However, the application does not crash for me. I am using the same versions of knex and pool2 that you are. Can you please give the following code a try?

'use strict';

var Socket = require('net').Socket;

var writeCount = 0, readCount = 0;

var connect = Socket.prototype.connect;
var write = Socket.prototype.write;

var target;

Socket.prototype.write = function (chunk, encoding, cb) {
  if (this !== process.stdout) {
    console.log('write', writeCount++);
    if (writeCount === 3) {
      this.emit('error', new Error('foo'));
      return;
    }
  }
  return write.call(this, chunk, encoding, cb);
};

Socket.prototype.connect = function (options, cb) {
  return connect.call(this, options, function (err, socket) {
    console.log('got socket');

    target = socket;

    this.on('data', function () {
      console.log('read', readCount++);
    });
    if (typeof cb === 'function') { cb(); }
  });
};

var knex = require('knex')({
  client: 'mysql',
  connection: {
    database: 'test'
  },
  pool: {
    max: 1,
    min: 1
  }
});

function ping() {
  return knex.select(knex.raw(1));
}

ping()
.then(function () {
  console.log('ping 1 returned');
  return ping();
})
.then(function () {
  console.log('ping 2 returned');
})
.catch(function (e) {
  console.error(e);
  return ping();
})
.then(function () {
  console.log('ping 3 returned');
});

My output looks like this:

got socket
write 0
read 0
read 1
write 1
read 2
ping 1 returned
write 2
Knex:Error Pool2 - Error: Pool.release(): Resource not member of pool
write 3
{ [Error: select 1 - foo] fatal: true }
got socket
write 4
read 3
read 4
write 5
read 5
ping 3 returned
^C

Be sure to adjust the connection parameters to a valid database. You will note that the third ping is executed after receiving the error, and that it succeeds, showing that the socket error case is handled correctly except for the fact that knex is likely double-releasing the resource or something, causing an error to be displayed that shouldn't.

myndzi commented 9 years ago

We can see the double release here:

  knex:client acquiring connection from pool: __knexUid1 +0ms
  knex:query select 1 +1ms
write 2
  pool2 Ungracefully destroying resource (id=0) +3ms
  pool2 Attempting to acquire minimum resources (cur=0, min=1) +1ms
  pool2 Attempting to acquire resource (cur=0, ac=0) +0ms
  knex:client releasing connection to pool: __knexUid1 +3ms
Knex:Error Pool2 - Error: Pool.release(): Resource not member of pool

With a stack dump, it is revealed that the first release comes from here:

  pool2 Ungracefully destroying resource (id=0) +3ms
Error
    at Pool.destroy (/home/myndzi/pool2-issue-12/node_modules/knex/node_modules/pool2/lib/pool.js:204:16)
    at connectionErrorHandler (/home/myndzi/pool2-issue-12/node_modules/knex/lib/dialects/mysql/index.js:133:17)

-> https://github.com/tgriesser/knex/blob/0.8.6/lib/dialects/mysql/index.js#L133

My original suspicion appears to be correct: the error handler destroys the resource, but the promise cleanup is attempting to release it even though there was an error:

-> https://github.com/tgriesser/knex/blob/0.8.6/lib/runner.js#L140

I'm not entirely sure how to cleanly address this; ideally the disposer would be aware of the error case and destroy the resource there instead of in the other place. Alternately, it could not attempt to release the connection if it had an error, or trap the error case and ignore it.

myndzi commented 9 years ago

Fix turned out to be somewhat easy (though still a little messy) since there was already a property I could check to know if the connection had been disposed of. You can use npm install git+https://github.com/myndzi/knex#double-release to test with your app and verify that everything is working better until the fix makes its way into an NPM release.

I'm still uncertain why your application was exiting due to this error, perhaps this is programmed behavior on your part? (bound an error event and died when you didn't need to?) -- so if the problem continues for some reason, or your program still exits, do get back to me and let's dig in further.

acgourley commented 9 years ago

I just wanted to post the log as I finally got the error to occur with DEBUG=pool2 on. I'm going to try your fix right now in parallel.

...
Sat, 12 Sep 2015 15:36:52 GMT pool2 Allocating resource (id=38) to request; waited 0s\n
Sat, 12 Sep 2015 15:36:52 GMT pool2 Pinging resource (id=38)\n
Sat, 12 Sep 2015 15:36:52 GMT pool2 Reserving request for resource (id=38)\n
Sat, 12 Sep 2015 15:36:55 GMT pool2 Successfully allocated new resource (cur=5, ac=0, id=39)\n
    at processImmediate [as _immediateCallback] (timers.js:367:17)\n
    at Immediate.Async.drainQueues [as _onImmediate] (/app/node_modules/bluebird/js/main/async.js:15:14)\n
    at Async._drainQueues (/app/node_modules/bluebird/js/main/async.js:133:10)\n
    at Async._drainQueue (/app/node_modules/bluebird/js/main/async.js:123:16)\n
    at Promise._settlePromises (/app/node_modules/bluebird/js/main/promise.js:697:14)\n
    at Promise._settlePromiseAt (/app/node_modules/bluebird/js/main/promise.js:581:18)\n
    at Promise._settlePromiseFromHandler (/app/node_modules/bluebird/js/main/promise.js:507:31)\n
    at tryCatcher (/app/node_modules/bluebird/js/main/util.js:26:23)\n
    at /app/node_modules/bluebird/js/main/using.js:176:36\n
    at /app/node_modules/knex/lib/runner.js:44:21\n
    at Runner.query (/app/node_modules/bluebird/js/main/method.js:15:34)\n
    at Runner.tryCatcher (/app/node_modules/bluebird/js/main/util.js:26:23)\n
    at Runner.<anonymous> (/app/node_modules/knex/lib/runner.js:118:24)\n
    at Client.query (/app/node_modules/knex/lib/client.js:127:24)\n
    at Client._query (/app/node_modules/knex/lib/dialects/mysql/index.js:88:12)\n
    at new Promise (/app/node_modules/bluebird/js/main/promise.js:70:37)\n
    at Promise._resolveFromResolver (/app/node_modules/bluebird/js/main/promise.js:480:31)\n
    at tryCatcher (/app/node_modules/bluebird/js/main/util.js:26:23)\n
    at /app/node_modules/knex/lib/dialects/mysql/index.js:92:18\n
    at Connection.query (/app/node_modules/mysql/lib/Connection.js:201:25)\n
    at Protocol._enqueue (/app/node_modules/mysql/lib/protocol/Protocol.js:135:48)\n
    --------------------\n
    at TCP.onread (net.js:559:26)\n
    at exports._errnoException (util.js:746:11)\n
Error: read ETIMEDOUT\n
    at processImmediate [as _immediateCallback] (timers.js:367:17)\n
    at Immediate.Async.drainQueues [as _onImmediate] (/app/node_modules/bluebird/js/main/async.js:15:14)\n
    at Async._drainQueues (/app/node_modules/bluebird/js/main/async.js:133:10)\n
    at Async._drainQueue (/app/node_modules/bluebird/js/main/async.js:123:16)\n
    at Promise._settlePromises (/app/node_modules/bluebird/js/main/promise.js:697:14)\n
    at Promise._settlePromiseAt (/app/node_modules/bluebird/js/main/promise.js:581:18)\n
    at Promise._settlePromiseFromHandler (/app/node_modules/bluebird/js/main/promise.js:507:31)\n
    at tryCatcher (/app/node_modules/bluebird/js/main/util.js:26:23)\n
    at /app/node_modules/bluebird/js/main/using.js:176:36\n
    at /app/node_modules/knex/lib/runner.js:44:21\n
    at Runner.query (/app/node_modules/bluebird/js/main/method.js:15:34)\n
    at Runner.tryCatcher (/app/node_modules/bluebird/js/main/util.js:26:23)\n
    at Runner.<anonymous> (/app/node_modules/knex/lib/runner.js:118:24)\n
    at Client.query (/app/node_modules/knex/lib/client.js:127:24)\n
    at Client._query (/app/node_modules/knex/lib/dialects/mysql/index.js:88:12)\n
    at new Promise (/app/node_modules/bluebird/js/main/promise.js:70:37)\n
    at Promise._resolveFromResolver (/app/node_modules/bluebird/js/main/promise.js:480:31)\n
    at tryCatcher (/app/node_modules/bluebird/js/main/util.js:26:23)\n
    at /app/node_modules/knex/lib/dialects/mysql/index.js:92:18\n
    at Connection.query (/app/node_modules/mysql/lib/Connection.js:201:25)\n
    at Protocol._enqueue (/app/node_modules/mysql/lib/protocol/Protocol.js:135:48)\n
    --------------------\n
    at TCP.onread (net.js:559:26)\n
    at exports._errnoException (util.js:746:11)\n
Top level error handling in API saw:  Error: read ETIMEDOUT\n
Knex:Error Pool2 - Error: Pool.release(): Resource not member of pool\n
Sat, 12 Sep 2015 15:36:55 GMT pool2 Attempting to acquire resource (cur=5, ac=0)\n
Sat, 12 Sep 2015 15:36:55 GMT pool2 Attempting to acquire minimum resources (cur=5, min=6)\n
Sat, 12 Sep 2015 15:36:55 GMT pool2 Ungracefully destroying resource (id=37)\n
...

acgourley commented 9 years ago

To clarify my crash issue - it was what Express decides to do when it has an unhanded exception when no error handler is defined. By defining an error handler I was able to stop the app from crashing (although the requests still failed as a result of this bug)

myndzi commented 9 years ago

Express doesn't do that, node does -- but when I reproduced the error, it was not an unhandled error event -- I didn't bind a handler, but it did not appear to be emitted on any object without an error handler. So, if it's being re-emitted somewhere or not bound somewhere, that may be in the purview of your app and not knex. I'm uncertain.

myndzi commented 9 years ago

The requests themselves will fail anyway; this is not inherently a bug that causes requests to fail -- it's the opposite. A network problem (timeout on reading from socket) causes the request to fail, and this occurs as a result of that. There is no way to recover the request other than to re-issue it, which would be either knex's job, or your job in response to a failed query. If the error isn't getting passed back to your query, that's something that should hopefully be fixed in knex, so that you have the ability to see that "this query failed because the underlying network socket died, so I can probably retry it" and do so if you so choose.

myndzi commented 9 years ago

The stack dump after "Sat, 12 Sep 2015 15:36:55 GMT pool2 Successfully allocated new resource (cur=5, ac=0, id=39)" -- did you add that?

myndzi commented 9 years ago

Thank you for the stack dump, too. Strangely, it seems to maybe not be the same problem, although sourced through the same files and for the same reason. I see the attempted release, but not the destroy as triggered by the connection error handler in the mysql client file. I'll have to dig a little deeper, but this supports the unhandled error scenario.

While you're testing with the new code, tack knex:* onto your debug line if you want a little more info on what knex itself is doing:

DEBUG=knex:*,pool2 node ...

myndzi commented 9 years ago

You know what, do you have Skype? I have time this weekend and could get on a call so we could cut out all the back and forth...

(Add me: myndzi, the one in NC)

acgourley commented 9 years ago

It does seem like your patch does not fix this issue :(

Here's a new log using the double released tag.

...
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 Sun, 13 Sep 2015 00:47:16 GMT pool2 Ungracefully destroying resource (id=7)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 Sun, 13 Sep 2015 00:47:16 GMT pool2 Attempting to acquire minimum resources (cur=5, min=6)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 Sun, 13 Sep 2015 00:47:16 GMT pool2 Attempting to acquire resource (cur=5, ac=0)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 API ERROR: Error: read ETIMEDOUT
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at exports._errnoException (util.js:746:11)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at TCP.onread (net.js:559:26)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 --------------------
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Protocol._enqueue (/app/node_modules/mysql/lib/protocol/Protocol.js:135:48)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Connection.query (/app/node_modules/mysql/lib/Connection.js:201:25)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at /app/node_modules/knex/lib/dialects/mysql/index.js:92:18
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at tryCatcher (/app/node_modules/bluebird/js/main/util.js:26:23)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Promise._resolveFromResolver (/app/node_modules/bluebird/js/main/promise.js:480:31)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at new Promise (/app/node_modules/bluebird/js/main/promise.js:70:37)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Client._query (/app/node_modules/knex/lib/dialects/mysql/index.js:88:12)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Client.query (/app/node_modules/knex/lib/client.js:127:24)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Runner.<anonymous> (/app/node_modules/knex/lib/runner.js:116:24)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Runner.tryCatcher (/app/node_modules/bluebird/js/main/util.js:26:23)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Runner.query (/app/node_modules/bluebird/js/main/method.js:15:34)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at /app/node_modules/knex/lib/runner.js:44:21
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at /app/node_modules/bluebird/js/main/using.js:176:36
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at tryCatcher (/app/node_modules/bluebird/js/main/util.js:26:23)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Promise._settlePromiseFromHandler (/app/node_modules/bluebird/js/main/promise.js:507:31)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Promise._settlePromiseAt (/app/node_modules/bluebird/js/main/promise.js:581:18)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Promise._settlePromises (/app/node_modules/bluebird/js/main/promise.js:697:14)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Async._drainQueue (/app/node_modules/bluebird/js/main/async.js:123:16)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Async._drainQueues (/app/node_modules/bluebird/js/main/async.js:133:10)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at Immediate.Async.drainQueues [as _onImmediate] (/app/node_modules/bluebird/js/main/async.js:15:14)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 at processImmediate [as _immediateCallback] (timers.js:367:17)
A gae-default-20150912t112337-iirf 2015-09-12 17:47:16.000 Sun, 13 Sep 2015 00:47:16 GMT pool2 Successfully allocated new resource (cur=5, ac=0, id=23)
...

myndzi commented 9 years ago

To summarize from our Skype conversation:

The crash was because the knex query was being returned to some promise-handling code in the middleware which passed it along as an error, and there was no error handler middleware. The error message was logged simply because it was the result of the query. I gave you some options to deal with the underlying network problem that's causing you grief, which included reducing the pool minimum to 0 and/or specifying a resource ping handler.

acgourley commented 9 years ago

For anyone curious about my issue, and looking for the fix to knex, I made a gist with the full post mortem of the bug: https://gist.github.com/acgourley/9a11ffedd44c414fb4b8

albertovasquez commented 9 years ago

I appreciate the work to follow this through. I have a great level of confidence with this community. @acgourley thank you for your gist.

reggi commented 8 years ago

I tried setting the pool properties that @acgourley recommends in his doc but I'm still getting errors trying to connection on my Amazon ElasticBeanstalk instance.

Knex:Error Pool2 - Error: Timed out acquiring resource
Knex:Error Pool2 - Error: Timed out acquiring resource
Knex:Error Pool2 - Error: connect ETIMEDOUT {MYIP}:5432
Knex:Error Pool2 - Error: connect ETIMEDOUT {MYIP}:5432

I would really appreciate a solution to this issue.

Here's some of my output with DEBUG=pool2 enabled:

Wed, 17 Feb 2016 22:38:51 GMT pool2 Attempting to acquire minimum resources (cur=0, min=2)
Wed, 17 Feb 2016 22:38:51 GMT pool2 Attempting to acquire resource (cur=0, ac=0)
Wed, 17 Feb 2016 22:38:51 GMT pool2 Attempting to acquire resource (cur=0, ac=1)
Wed, 17 Feb 2016 22:39:01 GMT pool2 reap (cur=0, av=0)
Wed, 17 Feb 2016 22:39:11 GMT pool2 reap (cur=0, av=0)
Wed, 17 Feb 2016 22:39:21 GMT pool2 reap (cur=0, av=0)
Wed, 17 Feb 2016 22:39:21 GMT pool2 Timed out acquiring resource
Wed, 17 Feb 2016 22:39:21 GMT pool2 Couldn't allocate new resource: Timed out acquiring resource
Wed, 17 Feb 2016 22:39:21 GMT pool2 Destroying pool: unable to aquire a resource within 0s
Knex:Error Pool2 - Error: Timed out acquiring resource
Wed, 17 Feb 2016 22:39:21 GMT pool2 Timed out acquiring resource
Wed, 17 Feb 2016 22:39:21 GMT pool2 Couldn't allocate new resource: Timed out acquiring resource
Wed, 17 Feb 2016 22:39:21 GMT pool2 Destroying pool: unable to aquire a resource within 0s
Knex:Error Pool2 - Error: Timed out acquiring resource

With pool: {min: 0}:

Wed, 17 Feb 2016 22:42:00 GMT pool2 New ResourceRequest (id=0, timeout=Infinity)
Wed, 17 Feb 2016 22:42:00 GMT pool2 Growing pool: no resource to serve request (p=1, tba=0, tbt=0, max=10)
Wed, 17 Feb 2016 22:42:00 GMT pool2 Attempting to acquire resource (cur=0, ac=0)
Wed, 17 Feb 2016 22:42:05 GMT pool2 reap (cur=0, av=0)
Wed, 17 Feb 2016 22:42:05 GMT pool2 Not growing pool: pending=1, to be available=1
Wed, 17 Feb 2016 22:42:15 GMT pool2 reap (cur=0, av=0)
Wed, 17 Feb 2016 22:42:15 GMT pool2 Not growing pool: pending=1, to be available=1
Wed, 17 Feb 2016 22:42:25 GMT pool2 reap (cur=0, av=0)
Wed, 17 Feb 2016 22:42:25 GMT pool2 Not growing pool: pending=1, to be available=1
Wed, 17 Feb 2016 22:42:30 GMT pool2 Timed out acquiring resource
Wed, 17 Feb 2016 22:42:30 GMT pool2 Couldn't allocate new resource: Timed out acquiring resource
Wed, 17 Feb 2016 22:42:30 GMT pool2 Destroying pool: unable to aquire a resource within 0s
Wed, 17 Feb 2016 22:42:30 GMT pool2 ResourceRequest: reject (id=0)
Wed, 17 Feb 2016 22:42:30 GMT pool2 ResourceRequest: fulfilling with error: Pool was destroyed (id=0)
Knex:warning - Pool2 - Error: Pool was destroyed
Wed, 17 Feb 2016 22:42:30 GMT pool2 ResourceRequest: clearing timeout (id=0)
Knex:Error Pool2 - Error: Timed out acquiring resource

reggi commented 8 years ago

Been trying everything and i'm running out of ideas. Any thoughts?

myndzi commented 8 years ago

To be honest this error is really generic, and since the pool is central to connection management in Knex, any connection-related errors wind up with "pool2" tacked on to them, though they aren't usually because of pool2. In your error, "Knex:Error Pool2 - Error: connect ETIMEDOUT {MYIP}:5432" seems to indicate you're trying to open a connection to yourself, not Amazon, so that would be a problem. I recommend trying to connect to the correct host :)

P.S. min 0 means that it won't open a connection until you try to use it, not that it will open new connections every time.

myndzi commented 8 years ago

(Either way, it's almost certainly a network error... either your settings are incorrect or you can't connect due to firewall restrictions or something like that)

reggi commented 8 years ago

Thanks! @myndzi.

It is totally generic. I was having AWS security group issues. I tested pg-promise and that wouldn't connect on the server as well. Both pgp and knex worked locally. Just didn't enable access for EB instance to talk to db.

acgourley commented 8 years ago

It would be great if we could mark this issue as resolved. Do you know if the latest knex still uses pool2 incorrectly? I checked the source to see if they merged in https://github.com/myndzi/knex/commit/92f63334caa3640399914171f2d84bb9bb633b2e however the source seems to have changed so much that those lines are not relevant so I can't tell.

In addition I will move to the 0.10 version of knex and report back if I see bad behavior with my database.

grinnellian commented 8 years ago

Also curious if the merge has taken place?

AleRed commented 8 years ago

I need help with this issue too. If I close de DB connection while the server is executing a query or a migration, the process hangs and never ends. The output of executing with DEBUG=pool2 is:

pool2 reap (cur=0, av=0) +9s pool2 Growing pool: no resource to serve request (p=1, tba=0, tbt=0, max=10) +0ms pool2 Attempting to acquire resource (cur=0, ac=0) +1ms pool2 Couldn't allocate new resource: connect ECONNREFUSED +1s pool2 reap (cur=0, av=0) +9s pool2 Growing pool: no resource to serve request (p=1, tba=0, tbt=0, max=10) +1ms pool2 Attempting to acquire resource (cur=0, ac=0) +1ms pool2 Couldn't allocate new resource: connect ECONNREFUSED +1s pool2 reap (cur=0, av=0) +9s pool2 Growing pool: no resource to serve request (p=1, tba=0, tbt=0, max=10) +1ms pool2 Attempting to acquire resource (cur=0, ac=0) +1ms pool2 Couldn't allocate new resource: connect ECONNREFUSED +1s

I'm using Bookshelf 0.9.4, Knex 0.10.0 and pool2 1.3.4.

herlon214 commented 8 years ago

No answer yet? I'm with the same issue. Once my pool was destroyed it never try to reconnect

AleRed commented 8 years ago

Here goes the solution that worked for me: if one of my DB operations fails because of a lost of connection, I capture the error on the catch() function at the end of the promise.Then, I check the error code received to know what is the reason of the error. I've seen that the error code value is 'PROTOCOL_ENQUEUE_AFTER_FATAL_ERROR' when it happens because of a lost of connection, so if that's the value of the error catched, I destroy de Bookshelf pool connection. This is the catch function code:

.catch(function(err) {
    if(err.code == 'PROTOCOL_ENQUEUE_AFTER_FATAL_ERROR'){
        Bookshelf.knex.client.destroy();  //Close connection
    } else if(Bookshelf.knex.client.pool != null) {
       // Handle normal error
    }
});

olalonde commented 8 years ago

I'm seeing similar errors in my production logs but not sure how to reproduce:

Error: Pool is destroyed
    at Pool.acquire (/app/node_modules/pool2/lib/pool.js:166:12)
    at /app/node_modules/knex/lib/client.js:220:19
    at tryCatcher (/app/node_modules/knex/node_modules/bluebird/js/main/util.js:26:23)
    at Promise._resolveFromResolver (/app/node_modules/knex/node_modules/bluebird/js/main/promise.js:480:31)
    at new Promise (/app/node_modules/knex/node_modules/bluebird/js/main/promise.js:70:37)
    at Client.acquireConnection (/app/node_modules/knex/lib/client.js:216:12)
    at /app/node_modules/knex/lib/runner.js:137:23
    at tryCatcher (/app/node_modules/knex/node_modules/bluebird/js/main/util.js:26:23)
    at Promise._resolveFromResolver (/app/node_modules/knex/node_modules/bluebird/js/main/promise.js:480:31)
    at new Promise (/app/node_modules/knex/node_modules/bluebird/js/main/promise.js:70:37)
    at /app/node_modules/knex/lib/runner.js:136:35
    at tryCatcher (/app/node_modules/knex/node_modules/bluebird/js/main/util.js:26:23)
    at Function.Promise.attempt.Promise.try (/app/node_modules/knex/node_modules/bluebird/js/main/method.js:31:24)
    at Runner.ensureConnection (/app/node_modules/knex/lib/runner.js:135:26)
    at Runner.run (/app/node_modules/knex/lib/runner.js:29:31)
    at QueryBuilder.Target.then (/app/node_modules/knex/lib/interface.js:26:43)
    at QueryBuilder.tryCatcher (/app/node_modules/bookshelf/node_modules/bluebird/js/main/util.js:26:23)
    at doThenable (/app/node_modules/bookshelf/node_modules/bluebird/js/main/thenables.js:52:38)
    at tryConvertToPromise (/app/node_modules/bookshelf/node_modules/bluebird/js/main/thenables.js:30:20)
    at Promise._resolveCallback (/app/node_modules/bookshelf/node_modules/bluebird/js/main/promise.js:442:24)
    at Promise._settlePromiseFromHandler (/app/node_modules/bookshelf/node_modules/bluebird/js/main/promise.js:515:17)
    at Promise._settlePromiseAt (/app/node_modules/bookshelf/node_modules/bluebird/js/main/promise.js:581:18)
    at Async._drainQueue (/app/node_modules/bookshelf/node_modules/bluebird/js/main/async.js:128:12)
    at Async._drainQueues (/app/node_modules/bookshelf/node_modules/bluebird/js/main/async.js:133:10)
    at Immediate.Async.drainQueues (/app/node_modules/bookshelf/node_modules/bluebird/js/main/async.js:15:14)
    at runCallback (timers.js:570:20)
    at tryOnImmediate (timers.js:550:5)
    at processImmediate [as _immediateCallback] (timers.js:529:5)

myndzi commented 8 years ago

I'm sorry for not being responsive, I don't seem to be receiving notifications for this project/thread in my e-mail? Bizarre.

This error is generally "can't connect to database"; I leave this issue open because closing it would probably just wind up with new ones getting opened for the same thing. It's not generally an actual problem with the module itself.

@olalonde : Your error indicates that knex at some point destroyed the connection, perhaps in response to an error event emitted by the underlying database driver. Once the pool has been destroyed, it cannot be reused; a new one would need to be instantiated. The correct handling of the problem, if it's not a fatal error, would be to not destroy the pool in the first place (a temporary or one-off connection error should simply drop the specific resource from the pool and re-queue the request for serving from a new connection)

@AleRed : what do you mean specifically by "I close the connection"? A pool has an end method that attempts to gracefully shut down the pool. It stops receiving new requests (rejects them with errors), and concludes the remaining queued requests; then it emits a 'drain' event, which causes the resources themselves to be terminated. The error emitted in that case if you try to acquire a resource should say 'pool is ending'. The only time the pool should ever be destroyed is if it was NEVER able to acquire a resource and a configurable retry timeout has expired.

An option was added somewhat recently, "requestTimeout", which should prevent your infinite retry scenario; you should be able to configure this via your knex setup, but the underlying problem may be elsewhere (in the means you are using to close the connection? some bug?) -- the section of log you pasted shows that the pool is unable to reconnect, and your comments seem to indicate that it was once connected, so I'm assuming by 'close the connection' you mean actually stop the database (so that it becomes unconnectable) -- not sure why you would do this while trying to execute a query, but the requestTimeout option may serve your immediate case.

@herlon214 : Your problem does not seem to be the same if you say "once it is destroyed it never tries to reconnect", assuming you were responding to AleRed -- his problem appears to be "it tries to reconnect forever". As mentioned in my response to olalonde, the 'destroy' case should only occur normally in the case where you never succeeded in acquiring a resource. This is to surface configuration problems and the like. The only reference in the source is here:

https://github.com/myndzi/pool2/blob/master/lib/pool.js#L465

That is the only function that sets 'this.destroyed':

https://github.com/myndzi/pool2/blob/master/lib/pool.js#L538

It's possible that something is calling _destroyPool, though a quick Github search through Knex's source does not indicate this is the case. Note that 'pool.destroy()' is not the same thing; this is for destroying a resource (uncleanly removing it from the pool)

I'm sorry at the lack of responsiveness to some of the previous issues; if any of you catch this and respond, I'll be happy to look into it further with you -- but I suspect that it will come down to a connectivity problem at some point, so I'd start looking there.

herlon214 commented 8 years ago

Thanks Kris!

Em sexta-feira, 5 de agosto de 2016, Kris Reeves notifications@github.com escreveu:

I'm sorry for not being responsive, I don't seem to be receiving notifications for this project/thread in my e-mail? Bizarre.

This error is generally "can't connect to database"; I leave this issue open because closing it would probably just wind up with new ones getting opened for the same thing. It's not generally an actual problem with the module itself.

@olalonde https://github.com/olalonde : Your error indicates that knex at some point destroyed the connection, perhaps in response to an error event emitted by the underlying database driver. Once the pool has been destroyed, it cannot be reused; a new one would need to be instantiated. The correct handling of the problem, if it's not a fatal error, would be to not destroy the pool in the first place (a temporary or one-off connection error should simply drop the specific resource from the pool and re-queue the request for serving from a new connection)

@AleRed https://github.com/AleRed : what do you mean specifically by "I close the connection"? A pool has an end method that attempts to gracefully shut down the pool. It stops receiving new requests (rejects them with errors), and concludes the remaining queued requests; then it emits a 'drain' event, which causes the resources themselves to be terminated. The error emitted in that case if you try to acquire a resource should say 'pool is ending'. The only time the pool should ever be destroyed is if it was NEVER able to acquire a resource and a configurable retry timeout has expired.

An option was added somewhat recently, "requestTimeout", which should prevent your infinite retry scenario; you should be able to configure this via your knex setup, but the underlying problem may be elsewhere (in the means you are using to close the connection? some bug?) -- the section of log you pasted shows that the pool is unable to reconnect, and your comments seem to indicate that it was once connected, so I'm assuming by 'close the connection' you mean actually stop the database (so that it becomes unconnectable) -- not sure why you would do this while trying to execute a query, but the requestTimeout option may serve your immediate case.

@herlon214 https://github.com/herlon214 : Your problem does not seem to be the same if you say "once it is destroyed it never tries to reconnect", assuming you were responding to AleRed -- his problem appears to be "it tries to reconnect forever". As mentioned in my response to olalonde, the 'destroy' case should only occur normally in the case where you never succeeded in acquiring a resource. This is to surface configuration problems and the like. The only reference in the source is here:

https://github.com/myndzi/pool2/blob/master/lib/pool.js#L465

That is the only function that sets 'this.destroyed':

https://github.com/myndzi/pool2/blob/master/lib/pool.js#L538

It's possible that something is calling _destroyPool, though a quick Github search through Knex's source does not indicate this is the case. Note that 'pool.destroy()' is not the same thing; this is for destroying a resource (uncleanly removing it from the pool)

I'm sorry at the lack of responsiveness to some of the previous issues; if any of you catch this and respond, I'll be happy to look into it further with you -- but I suspect that it will come down to a connectivity problem at some point, so I'd start looking there.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/myndzi/pool2/issues/12#issuecomment-237767718, or mute the thread https://github.com/notifications/unsubscribe-auth/ADQtMYwOUkMv2Ln7J5vljY8tc0Gqy3fEks5qctkqgaJpZM4FS2ae .

Att, Herlon Aguiar

myndzi commented 8 years ago

...seems like the change in Github subject lines borked my e-mail filters and these didn't go straight to my inbox anymore :( Hopefully fixed now.

daem0ndev commented 8 years ago

I ran into this issue myself, and oddly enough it was running on a shared environment where someone upgraded node to latest v6.3.1 and downgrading it back to v4.4.7 resolved it.

myndzi commented 8 years ago

Does that mean you're able to reproduce it readily?

natew commented 8 years ago

Getting the same issue here, hard to repro as I'm not sure what triggers it yet but will update if I find out.

myndzi / pool2

Seeing the pool randomly being destroyed after timeout #12