Closed: cprice404 closed this 4 years ago
Thanks for your report. Can you provide a minimal example that reproduces the issue?
A very reasonable request :) I will try to find time.
For now, what I ended up doing is changing my code to call recordSuccess instead of recordFailure, and differentiating my errors from my successes by using the String fields that recordSuccess accepts. The metrics reporting seems to work much better in this case.
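To make the workaround concrete, here is a rough sketch of the idea rather than my actual code. The SuccessRecorder interface is a hypothetical stand-in so the example runs without locust4j. Its four parameters mirror what I understand recordSuccess to accept (request type, name, response time, content length), which is an assumption you should check against your locust4j version. The statName helper and the " [error: ...]" suffix are also made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class OutcomeAsName {

    /** Hypothetical stand-in for the recorder; assumed to match recordSuccess's parameters. */
    public interface SuccessRecorder {
        void recordSuccess(String requestType, String name, long responseTimeMs, long contentLength);
    }

    /** Returns the base name for a success, or the base name tagged with the error class for a failure. */
    public static String statName(String baseName, Throwable error) {
        if (error == null) {
            return baseName;
        }
        return baseName + " [error: " + error.getClass().getSimpleName() + "]";
    }

    /** Reports every outcome through recordSuccess and encodes failures in the name field. */
    public static void report(SuccessRecorder recorder, String requestType, String baseName,
                              long elapsedMs, long contentLength, Throwable error) {
        recorder.recordSuccess(requestType, statName(baseName, error), elapsedMs, contentLength);
    }

    public static void main(String[] args) {
        // In-memory recorder used only for this demo; a real test would pass the locust4j recorder.
        List<String> seen = new ArrayList<>();
        SuccessRecorder inMemory = (type, name, ms, len) -> seen.add(type + " " + name);

        report(inMemory, "http", "GET /items", 12, 512, null);
        report(inMemory, "http", "GET /items", 30, 0, new IllegalStateException("boom"));
        System.out.println(seen);
    }
}
```

The trade-off is that errors show up as separate rows on the main statistics screen instead of on the failures tab, so the failure totals in the UI no longer mean anything.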
Basically, any time I've called recordFailure, it seems like the failure count metrics in the UI just keep increasing forever, even after I've stopped the test.
I may have found the answer: https://github.com/myzhan/locust4j/blob/master/src/main/java/com/github/myzhan/locust4j/Queues.java#L15
The stats thread keeps consuming the queue after the test is stopped. recordFailure is slow because it has to calculate the MD5 hash of each error; recordSuccess is much faster.
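A small simulation of that explanation, as a sketch only and not the locust4j implementation: failures are queued faster than a consumer that hashes each one can process them, so a backlog remains when the producers stop and the consumer keeps reporting until it is drained. The class name, the tick-based model, and the rates are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayDeque;
import java.util.Queue;

public class BacklogSketch {

    /** Hex MD5 of a string, standing in for the per-failure hashing cost. */
    public static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(text.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Counts how many ticks the consumer still needs after producers stop. */
    public static int ticksToDrainAfterStop(int produceTicks, int producedPerTick, int consumedPerTick) {
        if (consumedPerTick <= 0) {
            throw new IllegalArgumentException("consumedPerTick must be positive");
        }
        Queue<String> failures = new ArrayDeque<>();
        // While the test runs, producers enqueue and the consumer processes what it can.
        for (int tick = 0; tick < produceTicks; tick++) {
            for (int i = 0; i < producedPerTick; i++) {
                failures.add("error " + i);
            }
            consume(failures, consumedPerTick);
        }
        // After "stop", nothing new is enqueued, but the backlog is still being consumed.
        int ticksAfterStop = 0;
        while (!failures.isEmpty()) {
            consume(failures, consumedPerTick);
            ticksAfterStop++;
        }
        return ticksAfterStop;
    }

    /** Processes up to `budget` queued failures, hashing each one. */
    private static void consume(Queue<String> queue, int budget) {
        for (int i = 0; i < budget && !queue.isEmpty(); i++) {
            md5Hex(queue.poll());
        }
    }

    public static void main(String[] args) {
        System.out.println(ticksToDrainAfterStop(10, 5, 2));
    }
}
```

With 5 failures queued and 2 consumed per tick for 10 ticks, 30 failures are left over at stop time, and the consumer needs 15 more ticks to report them. That matches the symptom of counts rising after traffic has stopped.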
However the failure counts are being reported to the locust master, they seem to be over-reported.
I'm frequently seeing runs where the main statistics screen in the UI shows over 3 million successes compared to ~1000 failures, but when I click on the failures tab it shows 1.5 million failures.
Furthermore, if I click the "stop" button in the UI and monitor the logs on my locust4j worker nodes, I can see that they've stopped sending traffic (my server metrics also confirm this), but the failure count in the locust UI continues to rise until I kill the worker process.
It seems like there is some thread that is reporting the failure metrics in the background, and it's just not clearing out the failure data after it reports it, so it accumulates over time and just keeps inflating the failure stats.