powersjcb opened this issue 7 months ago
Yes, async hooks are expensive. The more promises per request, the worse it gets. This tool is not fit for running in prod.
I think I could improve its performance a lot, though. Care to share a minimal reproduction case?
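If you want to see the baseline cost yourself: even hooks with empty callbacks are measurable on a promise-heavy loop, since every promise fires init (and registering a destroy hook turns on destroy tracking for promises too). A minimal sketch; the iteration count and labels are arbitrary:

const { createHook } = require('node:async_hooks')
const { performance } = require('node:perf_hooks')

async function spin (n) {
  // each await creates at least one promise; with a hook enabled,
  // every one of them fires the init callback
  for (let i = 0; i < n; i++) await null
}

async function time (label, n) {
  const start = performance.now()
  await spin(n)
  console.log(label, (performance.now() - start).toFixed(1), 'ms')
}

async function main () {
  await time('no hooks   ', 1e6)
  const hook = createHook({ init () {}, destroy () {} }).enable()
  await time('empty hooks', 1e6)
  hook.disable()
}

main()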
Sorry, I don't have any public repos I'm testing with.
The configuration I tested looks similar to this:
const blocked = require('blocked-at')
const express = require('express')
const datadogLib = require('./datadog')
const logger = require('./logger') // app-specific logger (assumed)

const app = express()
const port = 3000

blocked((delay, stack, resource) => {
  datadogLib.histogram('node.eventloop.blocked', delay)
  logger.warn('blocked event loop', { delay, stack, resource })
}, { threshold: 100 })

app.get('/', (req, res) => {
  // todo: make API requests
  res.send('Hello World!')
})

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`)
})
I was seeing baseline CPU utilization increase from 3 millicores to 400 millicores (memory remained about the same).
Datadog might be key here. They do a great deal of monkey patching and use async hooks too, so the two might be clashing. If you could test without Datadog for me, that would be helpful.
Yeah, that seems sensible. I will give that a shot.
I'm seeing that many DD lines are present in our stack traces so that wouldn't surprise me.
"parsed": {
"_timestamp": "2024-03-25T19:34:29.029Z",
"blockedMs": 102.66803601074218,
"level": "error",
"message": "node event loop blocked",
"resource": {
"type": "HTTPINCOMINGMESSAGE"
},
"stack": [
" at AsyncHook.init (/app/node_modules/blocked-at/index.js:31:11)",
" at HTTPServerAsyncResource.emitInitNative (node:internal/async_hooks:202:43)",
" at connectionListenerInternal (node:_http_server:645:10)",
" at defaultTriggerAsyncIdScope (node:internal/async_hooks:463:18)",
" at Server.connectionListener (node:_http_server:618:3)",
" at Server.emit (node:events:529:35)",
" at Server.emit (/app/node_modules/dd-trace/packages/datadog-instrumentations/src/http/server.js:74:17)",
" at TCP.onconnection (node:net:2104:8)",
" at TCP.callbackTrampoline (node:internal/async_hooks:130:17)"
]
"level": "error",
"message": "node event loop blocked",
"resource": {
"type": "PROMISE"
},
"stack": [
" at AsyncHook.init (/app/node_modules/blocked-at/index.js:31:11)",
" at emitInitNative (node:internal/async_hooks:202:43)",
" at emitInitScript (node:internal/async_hooks:504:3)",
" at promiseInitHook (node:internal/async_hooks:324:3)",
" at promiseInitHookWithDestroyTracking (node:internal/async_hooks:328:3)",
... business logic redacted
" at /app/node_modules/dd-trace/packages/datadog-instrumentations/src/router.js:50:25",
" at /app/node_modules/dd-trace/packages/datadog-shimmer/src/shimmer.js:26:21",
" at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)",
" at next (/app/node_modules/express/lib/router/route.js:137:13)",
" at /app/node_modules/dd-trace/packages/datadog-instrumentations/src/router.js:100:12",
" at /app/node_modules/express-validation/lib/index.js:31:18",
" at processTicksAndRejections (node:internal/process/task_queues:95:5)"
]
My intuition is that we are probably seeing the performance issues because of this line:
https://github.com/naugtur/blocked-at/blob/master/index.js#L36C1-L37C1
where we invoke a fairly expensive function to capture and store a stack trace every time an async resource is created (this agrees with what the article linked below mentions).
Do we know if there is a way to capture stack trace information only if the processing is too slow?
(testing a fork now to see if the stack collection is the problem)
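As far as I can tell, the creation stack only exists at init time, so capturing it lazily once a callback has already proven slow isn't possible; one compromise would be to sample the capture. A rough sketch of that idea (not blocked-at's code; the rate and names are illustrative):

const { createHook } = require('node:async_hooks')

const SAMPLE_RATE = 0.05 // capture stacks on ~5% of inits; tune as needed
const stacks = new Map() // asyncId -> creation stack, when sampled

createHook({
  init (asyncId) {
    // the creation stack is only available here; once `after` shows the
    // callback was slow it is gone, so sample the capture instead
    if (Math.random() < SAMPLE_RATE) stacks.set(asyncId, new Error().stack)
  },
  after (asyncId) {
    stacks.delete(asyncId) // drop entries once the callback has run
  }
}).enable()

The trade-off is that a slow resource that wasn't sampled reports no stack.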
The most likely problem is async hooks; they slow things down. The destroy hook is the most expensive, and I know I can probably get rid of it by making things clean up after themselves on report, or by using a WeakMap. But looking forward to your findings.
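Roughly the shape I have in mind for "clean up on report" (a sketch only; the threshold and names are illustrative): delete tracking state in `after` instead of registering a destroy hook at all.

const { createHook } = require('node:async_hooks')

const info = new Map()   // asyncId -> { type, stack } captured at init
const starts = new Map() // asyncId -> hrtime taken in `before`
const THRESHOLD_MS = 100

createHook({
  init (asyncId, type) {
    info.set(asyncId, { type, stack: new Error().stack })
  },
  before (asyncId) {
    starts.set(asyncId, process.hrtime.bigint())
  },
  after (asyncId) {
    const start = starts.get(asyncId)
    if (start !== undefined) {
      const ms = Number(process.hrtime.bigint() - start) / 1e6
      if (ms > THRESHOLD_MS) {
        const i = info.get(asyncId)
        console.warn(`blocked for ${ms.toFixed(1)}ms in`, i && i.type)
      }
    }
    // clean up after ourselves; with no destroy hook registered,
    // promises also avoid the extra destroy-tracking cost
    starts.delete(asyncId)
    info.delete(asyncId)
  }
}).enable()

Entries for resources whose callbacks never run would still linger, which is where keying on the resource object in a WeakMap could help.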
I did some independent testing and confirmed we are having the same performance issues as described in this post. Our CPU utilization spiked to 100% and the node fell over when I tried to use this to find blocking stack traces in a dev environment.
https://www.ashbyhq.com/blog/engineering/detecting-event-loop-blockers
I was running node:18.18.1-alpine in k8s with express 4.17.3