Failures appear to be over-reported

cprice404 commented 6 years ago

It seems that, however the failure counts are being reported to the Locust master, they're being over-reported.

I'm frequently seeing runs where the main statistics screen in the UI shows over 3 million successes compared to ~1000 failures, but when I click on the Failures tab, it shows 1.5 million failures.

Furthermore, if I click the "stop" button in the UI and monitor the logs on my locust4j worker nodes, I can see that they've stopped sending traffic (my server metrics confirm this too), yet the failure count in the Locust UI keeps rising until I kill the worker process, at which point it stops.

It seems like some thread is reporting the failure metrics in the background without clearing out the failure data after each report, so the data accumulates over time and keeps inflating the failure stats.
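The reporter's hypothesis can be illustrated with a small, self-contained Java sketch. It assumes that the master adds each worker report to its running total (treating reports as deltas); the class ReportResetSketch and its methods are hypothetical and are not locust4j code. If the worker snapshots its failure counts but never clears them, every reporting interval re-sends the same failures, so the master's total keeps growing after traffic has stopped:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical worker-side failure counter (not locust4j code), illustrating
// a reporter that snapshots counts but may or may not clear them afterwards.
public class ReportResetSketch {

    private final Map<String, Long> failures = new HashMap<>();
    private final boolean resetAfterReport;

    ReportResetSketch(boolean resetAfterReport) {
        this.resetAfterReport = resetAfterReport;
    }

    void recordFailure(String name) {
        failures.merge(name, 1L, Long::sum);
    }

    // Called once per reporting interval; returns the counts sent to the master.
    Map<String, Long> report() {
        Map<String, Long> snapshot = new HashMap<>(failures);
        if (resetAfterReport) {
            failures.clear(); // the next report only carries new failures
        }
        return snapshot;
    }

    // Record some failures, stop traffic, then let the reporter run for a number
    // of intervals while the master adds every report to its running total.
    static long masterTotal(ReportResetSketch worker, int failuresBeforeStop, int intervals) {
        for (int i = 0; i < failuresBeforeStop; i++) {
            worker.recordFailure("GET /api");
        }
        long total = 0;
        for (int i = 0; i < intervals; i++) {
            total += worker.report().values().stream().mapToLong(Long::longValue).sum();
        }
        return total;
    }

    public static void main(String[] args) {
        // 1000 failures happen, then traffic stops; reporting continues for 5 intervals.
        System.out.println("without reset: " + masterTotal(new ReportResetSketch(false), 1000, 5)); // 5000
        System.out.println("with reset:    " + masterTotal(new ReportResetSketch(true), 1000, 5));  // 1000
    }
}
```

Without the reset, the master's failure total grows by the full count on every interval even though no new failures occur, which matches the "keeps rising after stop" symptom.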

myzhan commented 6 years ago

Thanks for your report. Can you provide a minimal example to reproduce the issue?

cprice404 commented 6 years ago

A very reasonable request :) I will try to find time.

For now, what I ended up doing was changing my code to call recordSuccess instead of recordFailure, and distinguishing my errors from my successes via the String fields that recordSuccess accepts. The metrics reporting seems to work much better this way.
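One way to apply this workaround, sketched in Java: route every outcome through a success call and encode the error in the name field. The SuccessRecorder interface is a hypothetical stand-in assumed to mirror the (requestType, name, responseTime, contentLength) shape of a recordSuccess call; recordOutcome and the " [error: ...]" tag format are likewise assumptions for illustration, not part of locust4j.

```java
import java.util.ArrayList;
import java.util.List;

public class SuccessTaggingSketch {

    // Hypothetical stand-in with the assumed parameter shape of recordSuccess.
    interface SuccessRecorder {
        void recordSuccess(String requestType, String name, long responseTime, long contentLength);
    }

    // Workaround: report every request as a "success", but tag failed requests
    // in the name so they still show up as separate rows in the stats table.
    static void recordOutcome(SuccessRecorder recorder, String requestType, String name,
                              long responseTime, long contentLength, String errorOrNull) {
        String taggedName = (errorOrNull == null) ? name : name + " [error: " + errorOrNull + "]";
        recorder.recordSuccess(requestType, taggedName, responseTime, contentLength);
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        // Capturing recorder: just remembers which row each call would land in.
        SuccessRecorder recorder = (type, name, rt, len) -> rows.add(type + " " + name);

        recordOutcome(recorder, "http", "GET /api", 120, 512, null);
        recordOutcome(recorder, "http", "GET /api", 5000, 0, "timeout");

        rows.forEach(System.out::println); // "http GET /api", then "http GET /api [error: timeout]"
    }
}
```

One trade-off of this approach: if error messages contain variable details (IDs, timestamps), each distinct message becomes its own stats row, so it helps to keep the tagged error text to a small set of fixed categories.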

Basically, any time I've called recordFailure, the failure count in the UI seems to keep increasing forever, even after I've stopped the test.

myzhan commented 6 years ago

I may have found the answer: https://github.com/myzhan/locust4j/blob/master/src/main/java/com/github/myzhan/locust4j/Queues.java#L15

myzhan commented 6 years ago

The stats thread keeps consuming the queue after the test is stopped. "recordFailure" is slow because it has to calculate the MD5 hash of each error; "recordSuccess" is much faster.
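A minimal Java sketch of this mechanism, assuming an unbounded queue between the workers and a single stats thread that hashes every error with MD5 before counting it. FailureBacklogSketch, Failure, consumeOne, and runDemo are hypothetical names, not locust4j's real classes; the sketch only shows why failure counts can keep arriving after traffic has stopped.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FailureBacklogSketch {

    // Hypothetical failure record; not a locust4j class.
    static final class Failure {
        final String name;
        final String error;

        Failure(String name, String error) {
            this.name = name;
            this.error = error;
        }
    }

    // MD5 hex digest of an error message: the per-item cost the stats thread pays.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Stats-thread step: take one failure and count it under "name:md5(error)".
    static void consumeOne(BlockingQueue<Failure> queue, Map<String, Long> errors)
            throws InterruptedException {
        Failure f = queue.take(); // blocks until a failure is available
        errors.merge(f.name + ":" + md5Hex(f.error), 1L, Long::sum);
    }

    // Enqueue n failures as fast as possible, "stop traffic", then wait for the
    // stats thread to drain the backlog. Returns the number of failures counted.
    static long runDemo(int n) {
        BlockingQueue<Failure> queue = new LinkedBlockingQueue<>(); // unbounded: add() never blocks
        Map<String, Long> errors = new HashMap<>(); // written only by the stats thread

        Thread stats = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    consumeOne(queue, errors);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        stats.start();

        for (int i = 0; i < n; i++) {
            queue.add(new Failure("GET /api", "timeout after 5000 ms"));
        }
        // Anything still queued here is counted after traffic has stopped.
        System.out.println("traffic stopped, backlog still queued: " + queue.size());

        try {
            stats.join(); // join() makes the stats thread's map writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return errors.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        System.out.println("failures counted: " + runDemo(200_000));
    }
}
```

Because enqueueing never blocks, workers can produce failures faster than the stats thread can hash them, and the backlog printed after the producer loop is what keeps trickling into the stats once the test is stopped. Cheaper per-item work (as with a success record) or a bounded queue would likely shrink that gap.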

myzhan / locust4j, issue #2