tulios / kafkajs

A modern Apache Kafka client for node.js
https://kafka.js.org
MIT License

190% CPU hike on v2.2.4 #1556

Open Hyjaz opened 1 year ago

Hyjaz commented 1 year ago

Describe the bug: When upgrading to v2.2.4 we saw a 190% increase in our CPU usage.

To Reproduce: Not sure how you can reproduce it. We do, however, send thousands of messages per second.

If none of the above are possible to provide, please write down the exact steps to reproduce the behavior:

  1. Run a producer that continuously produces messages to a topic
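A rough sketch of such a load loop, in case it helps anyone trying to reproduce this. `produceContinuously` is a hypothetical helper, not part of KafkaJS; it only assumes an object with an async `send({ topic, messages })` method, which is the shape of the KafkaJS producer. The broker address, client ID, and topic name in the usage comment are placeholders.

```javascript
// Hypothetical helper: sends `total` messages in batches through any object
// exposing an async send({ topic, messages }) method, such as a KafkaJS producer.
async function produceContinuously(producer, topic, total, batchSize = 100) {
  let sent = 0
  while (sent < total) {
    const size = Math.min(batchSize, total - sent)
    // Build one batch of small string payloads.
    const messages = Array.from({ length: size }, (_, i) => ({
      value: `message-${sent + i}`,
    }))
    await producer.send({ topic, messages })
    sent += size
  }
  return sent
}

// Usage with KafkaJS (broker, client ID and topic are placeholders):
//   const { Kafka } = require('kafkajs')
//   const kafka = new Kafka({ clientId: 'cpu-repro', brokers: ['localhost:9092'] })
//   const producer = kafka.producer()
//   await producer.connect()
//   await produceContinuously(producer, 'cpu-repro-topic', 1000000)
//   await producer.disconnect()
```

Running the same script against 2.2.3 and 2.2.4 while watching CPU usage should give a like-for-like comparison, assuming the broker and message sizes stay the same.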

Expected behavior: No CPU usage hike between v2.2.3 and v2.2.4.


Nevon commented 1 year ago

Do you have a CPU profile that could show where CPU time is being spent? You can use something like 0x to generate a flame graph if you don't instrument your application with an APM solution. Ideally, include a comparison with 2.2.3.
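If installing a separate tool is not an option, Node's built-in `inspector` module can also capture a CPU profile from inside the process. This is only a sketch: `captureCpuProfile` and the output file name are made up for illustration, and the resulting `.cpuprofile` file is meant to be opened in Chrome DevTools rather than rendered as a 0x flame graph.

```javascript
const inspector = require('inspector')
const fs = require('fs')

// Hypothetical helper: profiles the current process for `durationMs` and
// writes the result to `outFile` as JSON in the .cpuprofile format.
function captureCpuProfile(durationMs, outFile) {
  return new Promise((resolve, reject) => {
    const session = new inspector.Session()
    session.connect()
    session.post('Profiler.enable', enableErr => {
      if (enableErr) return reject(enableErr)
      session.post('Profiler.start', startErr => {
        if (startErr) return reject(startErr)
        // Let the application run for the sampling window, then stop.
        setTimeout(() => {
          session.post('Profiler.stop', (stopErr, result) => {
            session.disconnect()
            if (stopErr) return reject(stopErr)
            fs.writeFileSync(outFile, JSON.stringify(result.profile))
            resolve(outFile)
          })
        }, durationMs)
      })
    })
  })
}

// Usage (file name is a placeholder):
//   captureCpuProfile(30000, './kafkajs-2.2.4.cpuprofile')
```

Capturing one profile per KafkaJS version under the same load makes the two easy to compare side by side.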

Hyjaz commented 1 year ago

Hello @Nevon, on the latest kafkajs version the CPU time seems to be coming from scheduleCheckPendingRequest. I noticed that there was a change related to this in requestQueue/index.js in the latest release. Let me know if you need more details.

Screenshot 2023-03-21 at 08 41 31

This is v2.2.3. You can see there is a huge jump in CPU usage between this version and v2.2.4.

Screenshot 2023-03-21 at 09 06 00

Nevon commented 1 year ago

Thank you, that's what I suspected, but it's great to have some data to back it up. For reference, the change was introduced in #1532.

JoseGoncalves commented 1 year ago

I also noticed this CPU increase in my app. When idle, v2.2.3 uses almost no CPU, while v2.2.4 uses around 8%.
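For anyone who wants to put a number on idle usage from inside the process, here is a minimal sketch built on `process.cpuUsage()`. `sampleCpuPercent` is a hypothetical helper name, and the result is a percentage of a single core, so it will not necessarily match what `top` or a container dashboard reports.

```javascript
// Hypothetical helper: samples this process's CPU time over `windowMs` and
// resolves with (user + system CPU time) / wall time, as a percentage of one core.
function sampleCpuPercent(windowMs) {
  return new Promise(resolve => {
    const startUsage = process.cpuUsage()
    const startTime = process.hrtime.bigint()
    setTimeout(() => {
      // Passing the earlier reading returns the delta in microseconds.
      const usage = process.cpuUsage(startUsage)
      const elapsedMicros = Number(process.hrtime.bigint() - startTime) / 1000
      resolve((100 * (usage.user + usage.system)) / elapsedMicros)
    }, windowMs)
  })
}

// Usage: log idle CPU once the client is connected and no messages are flowing.
//   sampleCpuPercent(10000).then(pct => console.log(`CPU: ${pct.toFixed(1)}%`))
```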

MDSLKTR commented 1 year ago

I kinda want to have the best of 2.2.3 and 2.2.4 but the CPU spike is way too much to upgrade currently, which is why we pinned it to 2.2.3

@Nevon I took a stab at it in the linked PR: https://github.com/tulios/kafkajs/pull/1572

siimsams commented 1 year ago

We also have this issue after upgrading from kafkajs 1 to the latest version. All services that have upgraded consume far more CPU, and event loop iterations per second have increased 100x. After applying @MDSLKTR's fix as a patch, the issue goes away. We would like to see this merged as soon as possible.
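For those without an APM agent, a related signal can be read from Node's `perf_hooks` module. Note the assumption: this sketch measures event loop utilization (the share of time the loop spends active rather than idle), not iterations per second, so it is a neighbouring metric rather than the one quoted above. `sampleEventLoopUtilization` is a hypothetical helper name.

```javascript
const { performance } = require('perf_hooks')

// Hypothetical helper: resolves with the event loop's idle time, active time
// (both in milliseconds) and utilization (0 to 1) over a `windowMs` window.
function sampleEventLoopUtilization(windowMs) {
  return new Promise(resolve => {
    const start = performance.eventLoopUtilization()
    setTimeout(() => {
      // Passing the earlier reading returns the delta for the window.
      const { idle, active, utilization } = performance.eventLoopUtilization(start)
      resolve({ idle, active, utilization })
    }, windowMs)
  })
}

// Usage: compare the reading before and after applying the patch.
//   sampleEventLoopUtilization(10000).then(elu => console.log(elu))
```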

Thank you, @MDSLKTR!

1solation commented 12 months ago

We also noticed this CPU increase in our app. It uses roughly 1.5x to 2x more CPU than before.

atiquefiroz commented 11 months ago

We used kafka-node for a very long time and decided to move to kafkajs, starting with publishing. We have a very high-throughput logging system (~0.5 million RPM) for a single microservice. Once we switched to kafkajs 2.2.4, the CPU spike described above was too much to handle on the resource side, so I can confirm that @MDSLKTR's findings affect the system as expected. We switched back to 2.2.3 and resource utilisation returned to normal. We should think about patching this in the next version. Attaching some system metrics from our production environment. (The first spike is when we switched from kafka-node to kafkajs 2.2.4; the drop afterwards is when we deployed kafkajs 2.2.3.)

Screenshot 2023-11-11 at 9 31 17 AM Screenshot 2023-11-11 at 9 31 58 AM Screenshot 2023-11-11 at 9 33 34 AM Screenshot 2023-11-11 at 9 24 42 AM