Here are the results that I got after a quick test:
**All test results**

| Testname | Runtime | Requests | Errors | RPS | RTTMIN (ms) | RTTMAX (ms) | RTTAVG (ms) | RTT50 (ms) | RTT75 (ms) | RTT90 (ms) | RTT95 (ms) | RTT99 (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| apachebench_static | 10.51s | 100000 | 0 | 10946 | 0.37 | - | 1.82 | 1.81 | 1.88 | 1.94 | 1.97 | 2.70 |
| hey_static | 10.11s | 100000 | 0 | 9884 | 0.2 | 12.6 | 2 | 2 | 2.1 | 2.1 | 2.3 | 4 |
| wrk_static | 31.35s | 341006 | - | 11329 | - | 12.71 | 1.76 | 1.73 | 1.79 | 1.88 | - | 2.37 |
| artillery_static | 77.60s | 100000 | 0 | 1392 | 0.38 | 51.92 | 6.60 | 4.48 | 7.81 | 8.22 | 11.28 | 25.46 |
| vegeta_static | 31.36s | 99990 | 0 | 3333 | 0.23 | 2.78 | 0.38 | 0.38 | 0.39 | 0.41 | 0.43 | 0.59 |
| siege_static | 1.10s | | | | | | | | | | | |
| tsung_static | 14.74s | 100000 | 0 | 9665 | 0.16 | 1019.57 | 1.75 | 1.81 | 1.95 | 2.07 | 2.16 | 3.63 |
| jmeter_static | 14.81s | 100000 | 0 | 8954 | 0 | 95 | 2.04 | 2 | 2 | 3 | 3 | 5 |
| gatling_static | 52.48s | 36654 | 0 | 1219 | 0 | 69 | 0.74 | 1 | 1 | 1 | 2 | 2 |
| locust_scripting | 36.06s | 237737 | 0 | 7922 | 0 | 15 | 1 | 1 | 2 | 4 | 5 | 10 |
| grinder_scripting | 34.16s | 264307 | 1 | 8810 | 0 | 788 | 2.12 | 2 | 2 | 2 | 2 | 4 |
| wrk_scripting | 31.28s | 334562 | - | 11147 | - | 11.74 | 1.78 | 1.76 | 1.80 | 1.86 | - | 2.68 |
| k6_scripting | 33.51s | 283457 | - | 9448 | | 23.3 | 2.03 | 1.96 | - | 2.07 | 2.14 | - |
I ran it on a machine running nothing else except for an nginx Docker container, which was what I was running the load tests against. Machine specs:
Intel Xeon W3520 (4c/8t) 2.66GHz, 16GB RAM (1333 MHz)
The siege test didn't work for some reason.
@heyman Hi there and thanks for the PR. Sorry also for taking so long to give you any feedback on it - I am working on a new load testing tool benchmarking & comparison article now and just downloaded the latest Locust (0.13.5) and fed it with your updated script that uses FastHttpLocust, but did not get results anywhere near those you got above. Running Locust in distributed mode, with one Locust slave per CPU core (4 cores) I managed to get Locust to push through close to 3000 RPS on my test setup here, while e.g. Apachebench manages ~15000 RPS on a single core. Wrk does over 50,000 RPS using four threads.
I see you ran both the target system and the load generators on the same physical machine. This isn't something I'd recommend, as the load generator and the target system will be competing for CPU and other resources. It's important to make sure the target system is able to serve a lot more than you can throw at it, and with one machine you often don't know which program gets what resources. Hmm, if you started a vanilla Nginx it will only have one worker process... That means it will use at most 1/8 of the machine's CPU resources to serve requests. If this is the case, I suspect you may have been CPU bound on the target side when you ran your test, meaning Nginx could perhaps not serve more than ~10k RPS regardless of which load testing tool you used. Locust ending up at ~8k RPS may have been due to it running out of CPU on the load generation side. But this is just a wild guess of course. It would have been interesting to see CPU usage logs during the tests.
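For reference, this kind of CPU logging doesn't have to be complicated. Here's a minimal Python sketch of what I mean - it assumes the third-party psutil package, and the process names are just placeholders, so treat it as an illustration rather than part of the test suite:

```python
# Rough illustration only: sample total CPU and the CPU used by the target
# and load generator processes while a test runs. Assumes `pip install psutil`;
# the process names below are placeholders.
import time
from collections import defaultdict

import psutil

def log_cpu(duration_s=30, interval_s=1.0, names=("nginx", "locust")):
    procs = [p for p in psutil.process_iter(["name"]) if p.info["name"] in names]
    for p in procs:
        p.cpu_percent(None)  # prime the counters so the first sample isn't 0.0
    end = time.time() + duration_s
    while time.time() < end:
        total = psutil.cpu_percent(interval=interval_s)  # blocks for interval_s
        per_name = defaultdict(float)
        for p in procs:
            try:
                per_name[p.info["name"]] += p.cpu_percent(None)
            except psutil.NoSuchProcess:
                continue  # process exited mid-test
        summary = "  ".join(f"{n}={v:.0f}%" for n, v in sorted(per_name.items()))
        print(f"total={total:5.1f}%  {summary}")

if __name__ == "__main__":
    log_cpu()
```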
Anyway, on to the PR :) At first I didn't like this solution of using a custom Docker image to run Locust in distributed mode. It's very hard to fairly compare all these tools when they have such different sets of functionality, and just running one instance of each tool on the command line seems like some kind of "simplest possible test". I could add code to distribute Apachebench load generation too, and have AB wipe the floor with most tools in terms of RPS numbers in this test suite. It would require much more scripting than in the case of Locust, of course, so that wouldn't really be fair to Locust, but I'm not sure it is fair to have a test suite where some tools use custom provisioning solutions to get the best test results. I'd like the tools in this test suite and how you run them to be as simple and "vanilla" as possible, at least by default.
After thinking some about this, I think the best solution may be to offer two ways of running Locust from the test suite: distributed and non-distributed. We keep the old Locust execution mode as it was, but add yours as the "Locust distributed" mode. What do you think about that? That would show off Locust's nice, built-in support for load distribution while at the same time being transparent about the fact that if you just launch a single Locust instance on the command line, you're not going to get stellar RPS numbers.
I may be doing something similar for k6 because we've found it to use up quite a lot of memory and built a "compatibility mode" that excludes newer Javascript functionality in order to offer a low-memory version for those who need to scale up to higher VU numbers. It seems relevant there too, to let people know they might not be able to simulate many thousands of VUs with "vanilla" k6 but have to enable this special mode that results in a different UX (the scripting API will be a little different).
So to recap - my suggestion is to add a second execution mode for Locust, accessible from the menu runtest.sh displays to the user. Call it "Locust distributed", or something like that, and include that test in the "Run all tests" option also, of course.
Might not belong in this forum, but I can also add that when writing the new load testing tool comparison article I'll try to show Locust performance in both distributed and single-core mode, and make it clear that running Locust in distributed mode is really simple. Locust is a great tool, with very developer friendly UX, and it's nice to see development seems to have accelerated a lot the past two years! Keep up the good work!
> I see you ran both the target system and the load generators on the same physical machine. This isn't something I'd recommend, as the load generator and the target system will be competing for CPU and other resources.
I agree. The main reason I wanted to run the tests myself was because the Locust results in the blog seemed way lower than I'd expect. At the time I only had a single idle server sitting around, so I took that. I'd definitely take the results from my quick test above with a huge grain of salt (especially for the tools that go above 10k RPS). However, if you just look at the results for the tools in the lower performing quarter (where the nginx process shouldn't be the bottleneck), it still shows that one can achieve a much higher RPS count than the blog posts depicted.
> I'd like the tools in this test suite and how you run them to be as simple and "vanilla" as possible, at least by default.
I'd definitely say that running Locust distributed is the vanilla way to run it. At least if you're performing any somewhat large-scale load tests (which should be the only case where you'd actually care about benchmarks).
It would be trivial for us to have the official Locust Docker image support spawning multiple slave node processes. The reason we haven't done that is that in a Docker environment we'd recommend users to run multiple containers instead (e.g. through Kubernetes, Docker Compose, or some other orchestration technique). That's the reason I - in this PR - made a separate Docker image (inheriting from the official one) with a small wrapper script that spawns multiple slave nodes. I guess one could (maybe?) make the loadgentest script spawn multiple Docker containers instead, but I doubt it would affect the results.
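Just to illustrate the idea, a wrapper like that boils down to something along these lines (a hypothetical Python sketch, not the actual script in this PR, and it assumes the 0.13-era CLI flags like `--master`, `--slave`, `--master-host` and `--no-web`):

```python
# Hypothetical sketch: start one Locust master plus one slave per CPU core
# on the same host. Not the wrapper script shipped in this PR; flag names
# assume the Locust 0.13-era command line interface.
import multiprocessing
import subprocess

def run_distributed(locustfile="locust.py", host="http://127.0.0.1",
                    clients=60, run_time="30s"):
    workers = multiprocessing.cpu_count()
    procs = [subprocess.Popen([
        "locust", "-f", locustfile, "--master", "--no-web",
        "--host", host, "--expect-slaves", str(workers),
        "-c", str(clients), "-r", str(clients), "-t", run_time,
    ])]
    for _ in range(workers):
        procs.append(subprocess.Popen([
            "locust", "-f", locustfile, "--slave", "--master-host", "127.0.0.1",
        ]))
    for p in procs:
        p.wait()

if __name__ == "__main__":
    run_distributed()
```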
> I'm not sure it is fair to have a test suite where some tools use custom provisioning solutions to get the best test results.
If the benchmark has the defined limit that the tool has to be run with a single command (which can't be a shell script), I can understand that point. However, I think the benchmark would have the most value if it ran the tests in a way that resembles a real-world scenario as closely as possible.
> After thinking some about this, I think the best solution may be to offer two ways of running Locust from the test suite: distributed and non-distributed. We keep the old Locust execution mode as it was, but add yours as the "Locust distributed" mode.
I guess that could work, though I still think that running Locust distributed is a more "default" mode for running locust (except for when developing the test scripts).
> when writing the new load testing tool comparison article I'll try to show Locust performance in both distributed and single-core mode and make it clear that running Locust in distributed mode is really simple.
Thanks :)! The focus for Locust has never been to maximize the RPS count (then we'd never have selected python as the language). Instead (since hardware is cheap compared to developer time) we've optimized for developer time (and happiness) when developing realistic non-trivial load tests. Therefore I'd never expect Locust to perform anywhere near the tools that have been made with the goal to cram out as high RPS as possible.
Also, I really think that this repository is exemplary for a benchmarking blog post, since it was very easy for me to clone and reproduce the tests for all the tools :+1:!
> However, if you just look at the results for the tools in the lower performing quarter (where the nginx process shouldn't be the bottleneck), it still shows that one can achieve a much higher RPS count than the blog posts depicted.
Yeah, in my new tests now it looks as if Apachebench is 15-20 times more efficient than Locust in terms of traffic generation. In the old test that multiplier was around 50, so it absolutely seems Locust is a lot better at generating traffic now since it got the FastHttpLocust class (there hasn't been any new AB release since then, so that comparison should be fairly valid). 15 vs 50 is about 3 times better than before, just like the Locust docs say. And again, I'm sorry if it doesn't come through in the articles that you can actually get more out of Locust through distributing load generation - I'll try to do better this time.
> I'd definitely say that running Locust distributed is the vanilla way to run it. At least if you're performing any somewhat large-scale load tests (which should be the only case where you'd actually care about benchmarks).
I think many are interested in load generation efficiency and that that is an important benchmark metric to measure. Complexity always comes at a cost, so any tool I don't have to run in a distributed fashion is a plus, and I think there are quite a few people looking at adding load tests to their automated test suites now, who often prefer to run the tests from a single machine, sometimes the CI server. That machine may be just powerful enough to generate the required amount of traffic onto e.g. a staging system using a single core with a high-performing tool, or multiple cores with a lower-performing tool. This means that the UX they get with Locust differs from the UX they would get with a multithreaded tool as they might have to run multiple instances. Also, if I require really large scale tests, scaling a high-performing tool is both simpler and cheaper because I'll need to run fewer instances of it.
But, like I wrote earlier, I think the fair thing to do here, both for the test suite but also for the blog articles, is to test Locust in both single-instance and distributed mode. If any other single-threaded tools had support for load distribution I'd test with those also in distributed mode. Come to think of it, it could be interesting to test distribution anyway for multi-threaded tools like e.g. Tsung, because there is always an overhead when distributing an application like this, and that overhead might be different for different tools.
Fair enough! Thanks for explaining your point of view!
In your new test, how does the locust test file look? Have you addressed the issue I described in the first bullet point in #2?
@heyman I've run some new tests of 0.13.5 and this time I used the locust.py code you used in this PR, so I think it should be OK:
```python
from locust import TaskSet, task, constant
from locust.contrib.fasthttp import FastHttpLocust

class UserBehavior(TaskSet):
    @task
    def bench_task(self):
        while True:
            self.client.get("/")

class WebsiteUser(FastHttpLocust):
    task_set = UserBehavior
    wait_time = constant(0)
```
Cool!
Please note that for this test script, there shouldn't be any gain from simulating more locust users than 1 per slave node (or simply more than one when running locust non-distributed).
You're right - in this particular lab setup it doesn't matter how many VUs each Locust process runs, but without knowing the network delay between load generator and target system this isn't a given!
I might be mistaken, but I think Locust just uses one TCP connection per VU, right? That means the theoretical max RPS per VU will be 1/(network RTT). I don't want the max RPS tests to be limited by the network RTT here, as I want to test the tools and not the speed of my LAN. That's why it's important to vary the concurrency level when testing max RPS - to make sure no tool is limited by network delay.
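To make the arithmetic concrete (the RTT values below are made up purely for illustration): with one connection per VU, each VU can complete at most roughly one request per round trip, so the concurrency needed to reach a given RPS grows with network delay.

```python
# Back-of-the-envelope illustration with invented RTT values: one TCP
# connection per VU caps that VU at roughly 1/RTT requests per second.
def max_rps(vus, rtt_seconds):
    return vus / rtt_seconds

for rtt_ms in (0.2, 1.0, 5.0):
    rtt = rtt_ms / 1000.0
    print(f"RTT {rtt_ms} ms: 1 VU -> ~{max_rps(1, rtt):,.0f} RPS, "
          f"60 VUs -> ~{max_rps(60, rtt):,.0f} RPS")
```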
Funnily enough, I got the highest RPS rate when testing Locust using 60 VUs (4 slaves, so each slave simulated 15 VUs). Maybe just a coincidence - all test results, regardless of VU level, were very close, as I remember.
@ragnarlonn - are your current results available anywhere?
I saw Locust performing poorly in the blog post and went to the Locust Slack to get more info; I was pointed to this PR as the explanation why.
Making any technical decision based on the published results would be misleading and not really fair to Locust. At the same time, it's one of very few good articles doing such a comparison (really good work).
Ideally you should note somewhere in the old blog post that the results were not obtained using multiple agents, which is the typical usage.
Looking forward to the new article.
@domik82 I've been working on an update of the article - actually it will be a huge article that contains both a subjective review of tool usage/functionality and a benchmark comparison. The old articles will remain, but contain text that tells readers to go read the latest one instead. In the latest review I'm running Locust primarily in distributed mode. I think you're both right that it is unfair not to do that, as distributed execution is so simple to set up with Locust. I do still, however, see it as a minus point that I am forced to run it in distributed mode, even on a single host, to get good performance out of it. I can leak here that I'm much happier with Locust performance this time round, and that Locust is the only tool that seems to have become faster since the last review. The new article should be published by the end of February.
@ragnarlonn - one more thing you should consider when comparing tools. I learned that Locust uses a shared connection pool, while a tool like Gatling uses a separate connection per user.
Both approaches have pros and cons. More TCP connections mean we need more resources on the load generator side - and it's the same on the server side, where we can spot bottlenecks.
Fewer connections would probably be useful for testing an application rather than a web server, as it might give too optimistic a view of system resource usage (@heyman, you might have some thoughts about this).
Anyway, I don't know if Locust can be configured so that a single user = a new TCP connection. I heard that Gatling can be configured to use a shared pool, which is not the default config. This might affect the results you are getting.
To be honest, I never really thought about it till now (or about how JMeter does it).
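Just to illustrate the general difference I mean, in generic terms (a Python sketch using the requests library - not how Locust or Gatling actually implement it):

```python
# Generic illustration only (not Locust or Gatling internals): a shared pool
# reuses a handful of TCP connections across all simulated users, while
# per-user sessions open (and pay for) their own sockets.
import requests

shared = requests.Session()  # connections are pooled and reused across calls

def user_iteration_shared_pool(url):
    # Every simulated user sends through the same Session, so the OS sees
    # only a few TCP connections no matter how many users are simulated.
    return shared.get(url)

def user_iteration_own_connection(url):
    # A fresh Session per simulated user means separate TCP connections,
    # which costs more sockets and memory on both generator and server side.
    with requests.Session() as own:
        return own.get(url)
```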
I have merged this PR and #4 as well.
It's worth mentioning that @ragnarlonn used this project for the 2017 benchmark comparison but not for his latest comparison.
@ppcano Cool! Do you know if there was a particular reason for why this project wasn't used for the latest comparison?
@heyman, to reduce the time spent writing the latest comparison. Ragnar replied to you at https://github.com/loadimpact/loadgentest/issues/2#issuecomment-595429858.
I asked him the same question via email; here is his reply:
> I didn't use the loadgentest setup for the tests this time, because I felt it probably needed some updating and I wanted to reduce the amount of work involved in writing the article. And also because my load generator machine didn't have a lot of disk space, so I wasn't sure if running a lot of Dockerized things on it would work.
Regarding this project, I upgraded some tools at #4 but #5 is an existing bug. Unfortunately, I don't have much experience with the different tools and this project.
But I think this project only needs a little work and it could still be valid.
Could #2 be closed?
Ah, I see, thanks!
> Could #2 be closed?
Yep, I think so!
Pull request for #2.
The Docker image for Locust is set to `heyman/locust-bench` in this PR, but the Dockerfile used to build `heyman/locust-bench` has also been added to the dockerfiles folder.