Tried with 1000 users. Unable to reproduce.
Still able to reproduce it.
Here are some of the logs I get, using the exact snippet I posted in the initial post:
2017-07-24T07:57:02+00:00 [LOG:ERROR] (500) [es_rejected_execution_exception] rejected execution of org.elasticsearch.transport.TransportService$7@d4266ce on EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6d5792d7[Running, pool size = 4, active threads = 4, queued tasks = 200, completed tasks = 46]]
InternalError: [es_rejected_execution_exception] rejected execution of org.elasticsearch.transport.TransportService$7@d4266ce on EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6d5792d7[Running, pool size = 4, active threads = 4, queued tasks = 200, completed tasks = 46]]
at respond (/var/app/node_modules/elasticsearch/src/lib/transport.js:295:15)
at checkRespForFailure (/var/app/node_modules/elasticsearch/src/lib/transport.js:254:7)
at HttpConnector.<anonymous> (/var/app/node_modules/elasticsearch/src/lib/connectors/http.js:159:7)
at IncomingMessage.bound (/var/app/node_modules/elasticsearch/node_modules/lodash/dist/lodash.js:729:21)
at emitNone (events.js:91:20)
at IncomingMessage.emit (events.js:185:7)
at endReadableNT (_stream_readable.js:974:12)
at _combinedTickCallback (internal/process/next_tick.js:80:11)
at process._tickDomainCallback (internal/process/next_tick.js:128:9)
And we also get a lot of these messages:
(node:78) Warning: a promise was created in a handler at /var/app/lib/api/controllers/funnelController.js:213:30 but was not returned from it, see http://goo.gl/rRqMUw
at Function.Promise.cast (/var/app/node_modules/bluebird/js/release/promise.js:196:13)
I do not have the same trace, but I get errors like:
ERROR: unhandledRejection: not opened Error: not opened
at WebSocket.send (/var/app/node_modules/ws/lib/WebSocket.js:358:18)
at emit (/var/app/lib/services/broker/wsBrokerClient.js:335:31)
at WSBrokerClient.send (/var/app/lib/services/broker/wsBrokerClient.js:310:12)
at kuzzle.funnel.execute (/var/app/lib/api/core/entrypoints/proxy/index.js:113:18)
at FunnelController._executeError (/var/app/lib/api/controllers/funnelController.js:491:7)
at checkRights.then.then.catch.err (/var/app/lib/api/controllers/funnelController.js:214:26)
at runCallback (timers.js:637:20)
at tryOnImmediate (timers.js:610:5)
From previous event:
at FunnelController.execute (/var/app/lib/api/controllers/funnelController.js:214:13)
at ProxyEntryPoint.onRequest (/var/app/lib/api/core/entrypoints/proxy/index.js:95:24)
at proxy.listen.data (/var/app/lib/api/core/entrypoints/proxy/index.js:56:51)
at handlers.(anonymous function).forEach.cb (/var/app/lib/services/broker/wsBrokerClient.js:190:47)
at Array.forEach (native)
at WebSocket.client.socket.on.raw (/var/app/lib/services/broker/wsBrokerClient.js:190:33)
at emitOne (events.js:96:13)
at WebSocket.emit (events.js:188:7)
at Receiver._receiver.onmessage (/var/app/node_modules/ws/lib/WebSocket.js:146:47)
at Receiver.dataMessage (/var/app/node_modules/ws/lib/Receiver.js:389:14)
at Receiver.getData (/var/app/node_modules/ws/lib/Receiver.js:330:12)
at Receiver.startLoop (/var/app/node_modules/ws/lib/Receiver.js:165:16)
at Receiver.add (/var/app/node_modules/ws/lib/Receiver.js:139:10)
at Socket._ultron.on (/var/app/node_modules/ws/lib/WebSocket.js:142:22)
at emitOne (events.js:96:13)
at Socket.emit (events.js:188:7)
at readableAddChunk (_stream_readable.js:176:18)
at Socket.Readable.push (_stream_readable.js:134:10)
OK, my traces were due to a bad configuration.
Now, with the latest Docker images and the 1.x branch updated, I cannot reproduce the bug, even with 1000 documents in a row.
In one test I did get some failed connection errors between Kuzzle and Elasticsearch (not reproduced later either), but Kuzzle did not crash.
Waiting to retest the scenario on Seb's instance when we have the time to do it.
OK, reproduced when I decrease the queue_size of the thread_pool in the Elasticsearch settings (for search queries):
thread_pool:
  search:
    queue_size: 20
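As a side note, the rejection counters can be watched while load-testing. A minimal sketch, assuming the same legacy elasticsearch-js client Kuzzle uses and the standard _cat thread pool columns (host and interval are placeholders):

```js
// Sketch: poll the _cat thread pool API and print active/queued/rejected counts.
const elasticsearch = require('elasticsearch');

const client = new elasticsearch.Client({ host: 'localhost:9200' });

setInterval(() => {
  client.cat.threadPool({ v: true, h: 'name,active,queue,rejected' })
    .then(table => console.log(table))
    .catch(err => console.error(err.message));
}, 1000);
```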
That's normal, since Kuzzle's default configuration allows it to handle up to 50 concurrent requests. It would also be nice to fix the diagtools trigger when that happens.
But what I don't get is why I can reproduce this with 100% reliability while neither you nor @xbill82 can, with a 200-request queue configured in ES.
I've enabled debug logs in Elasticsearch, and though the error message states that the bulk thread pool is saturated, the rejected requests really are user document creations.
This would suggest that Kuzzle doesn't chain requests correctly and can perform n requests to ES for each user request, but I'm still looking for the code responsible for this.
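For what it's worth, one way to see how many ES requests each API request fans out into is to wrap the client's transport layer. A minimal sketch, assuming direct access to the elasticsearch-js client instance:

```js
// Sketch: count and log every request actually sent to Elasticsearch by
// wrapping transport.request (the low-level entry point of elasticsearch-js).
function traceEsRequests (client) {
  const originalRequest = client.transport.request.bind(client.transport);
  let counter = 0;

  client.transport.request = (params, cb) => {
    counter += 1;
    console.log(`[ES request #${counter}] ${params.method || 'GET'} ${params.path}`);
    return originalRequest(params, cb);
  };
}
```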
I'm reproducing the issue too.
My first assumption is that, when using the wait_for option, create calls will actually trigger the bulk thread pool. I will check in the ES source code.
I also traced requests sent to ES. Each user creation generates 4 requests:
One of the 2 searches can probably be skipped (they use the same parameters), but this does not explain this particular issue, as _search and _write use different thread pools.
I did not drill far enough into the ES code to confirm or refute the above assumption.
I've come to the conclusion that the exception is normal when the wait_for option is used.
I just tested a regular document:create loop with and without the wait_for option, and even though I had to increase the loop count, the results are pretty clear: we do get exceptions when using wait_for, while we don't without it.
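For reference, a sketch of an equivalent comparison run against ES directly (not through Kuzzle's document:create action); index/type names and loop sizes are placeholders:

```js
// Sketch: the same create loop run twice, without and with refresh: 'wait_for'.
const elasticsearch = require('elasticsearch');

const client = new elasticsearch.Client({ host: 'localhost:9200' });

function createLoop (count, refresh) {
  const promises = [];

  for (let i = 0; i < count; i++) {
    const params = {
      index: 'test-index',
      type: 'test-type',
      id: `doc-${refresh || 'plain'}-${i}`,
      body: { value: i }
    };

    if (refresh) {
      params.refresh = refresh;
    }

    promises.push(client.create(params));
  }

  return Promise.all(promises);
}

// Without wait_for the loop completes; with it, es_rejected_execution_exception
// shows up once the bulk queue saturates.
createLoop(500)
  .then(() => createLoop(500, 'wait_for'))
  .catch(err => console.error(err.message));
```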
The only real problem I see here is that the error message is sent to the end user and does not appear in Kuzzle's error logs. This is fine for most ES errors, but this particular case may deserve to be more visible to admins/developers, as I suppose there could be cases where modifying the thread pool queue size would not be a problem.
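One possible mitigation, as a sketch only (the helper name and logger are assumptions, not Kuzzle internals): detect this specific ES error and log it server-side before forwarding it to the client:

```js
// Hypothetical helper: log thread pool rejections on the server side so that
// admins see them, instead of only the end user receiving the 500 response.
function logEsRejection (error, logger = console) {
  if (error && /es_rejected_execution_exception/.test(error.message)) {
    logger.error(`[ES overload] ${error.message}`);
  }

  return error; // the error is still forwarded to the caller unchanged
}
```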
How to reproduce: send 200+ user creation requests in rapid fire, using the Kuzzle JS SDK.
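A minimal sketch of such a loop (the createUser call shape and the *Promise suffix are assumptions from the SDK docs as I recall them; adjust to your SDK version):

```js
// Sketch: fire 250 user creations in parallel through the Kuzzle JS SDK.
// Host, profile, and credentials are placeholders.
const Kuzzle = require('kuzzle-sdk');

const kuzzle = new Kuzzle('localhost');
const promises = [];

for (let i = 0; i < 250; i++) {
  promises.push(kuzzle.security.createUserPromise(`user-${i}`, {
    content: { profileIds: ['default'] },
    credentials: { local: { username: `user-${i}`, password: 'password' } }
  }));
}

Promise.all(promises)
  .then(() => console.log('all users created'))
  .catch(err => console.error(err.message)); // es_rejected_execution_exception ends up here
```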
Error thrown by Kuzzle: