The scheduler utilization doesn't seem insane in the pictures you posted, it looks pretty low. Note that Erlang schedulers will do a busy wait when waiting for work so it's possible that's part of what you're seeing.
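If you want to rule out scheduler busy waiting as the source of the CPU usage, the VM's busy-wait threshold can be lowered or disabled with the `+sbwt` family of erl flags (a diagnostic sketch, not a recommended production setting; `+sbwtdcpu`/`+sbwtdio` cover the dirty schedulers on OTP 20+):

```shell
# Disable scheduler busy waiting so idle schedulers sleep instead of spinning.
# For an Elixir app, pass the flags through --erl; for a release, put them in vm.args.
elixir --erl "+sbwt none +sbwtdcpu none +sbwtdio none" -S mix run --no-halt
```

If CPU usage drops sharply with these flags while throughput stays the same, the "insane" CPU you were seeing was mostly busy waiting rather than real work.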
OK, here's an interesting turn of events.
I started trying to reproduce everything locally. Petrohi recommended I set the timeout to 10_000 ms to see what happens. That's when I noticed this:
If I use machine_gun/gun to request the endpoint 10 times, it hangs for ~5-8 seconds once every few refreshes. These are the response times (in ms):
4636, 97, 459, 101, 102, 4690, 4538, 102
If I take the exact URL, put it in Chrome, and refresh, I get these times:
244, 102, 207, 243, 208, 211, 255, 149, 202, 276, 102
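For reference, this is roughly how I'm collecting the timings (a sketch, assuming `MachineGun.get/1` and a placeholder URL; check the machine_gun README for the exact call signature):

```elixir
# Placeholder endpoint -- substitute the real URL being tested.
url = "http://example.com/endpoint"

# :timer.tc/1 returns {elapsed_microseconds, result}.
for _ <- 1..10 do
  {micros, _result} = :timer.tc(fn -> MachineGun.get(url) end)
  IO.puts("#{div(micros, 1000)} ms")
end
```

Same endpoint, same machine; only the client differs, which is why the occasional ~4,500+ ms outliers point at the library side rather than the server.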
So something about the library itself is causing the responses to be horribly delayed.
I didn't notice it with the other two endpoints, but I think the problem is still there; it just requires more load (like in production) before it presents itself.
Can you show the code?
I just moved on; this issue is just crazy.
I originally posted this on machine_gun because that's the wrapper I'm using, but the underlying lib it uses is gun, obviously. I was thinking about reposting it here, but for simplicity it might be easier for me to just link it.
Basically I'm trying to get gun to scale to 25,000 requests/second, and even at only 200 RPS on a single machine, the CPU usage is going insane.
https://github.com/petrohi/machine_gun/issues/10
Do you have any ideas on this? Is it normal for gun (or HTTP request handling in general) to use so much CPU? I'm not sure what to do; it's making it very difficult to scale vertically to handle our load.