grafana / loadgentest

Lab environment for benchmarking different load testing tools

Misleading results for Locust #2

Closed: heyman closed this issue 4 years ago

heyman commented 4 years ago

Hi!

I read your blog post benchmarking different load testing tools. Even though the goal of Locust has never been to be the tool that can cram out the most RPS per core, I still found the results suspiciously bad. I've looked at the code in this repo and believe I've found some major issues in how it configures Locust:

I'd be happy to do a PR improving how Locust is configured in this benchmark; however, it seems like the Docker image no longer builds (npm install fails due to some TLS cert error). Any idea how to fix it (I haven't really dug into it yet)?

On my home machine I was able to push 6500 requests/s when running Locust non-distributed on a single core, compared to Apache Bench, where I reached 12500 requests/s. When running Locust with one master process and 4 worker processes, I got ~21000 requests/s.

I do believe that with the above changes, Locust would land much closer to the other tools in this benchmark, RPS-wise. When it comes to round-trip time, the numbers will always start getting way off with Locust if the machine(s) max out on CPU usage. In a normal Locust test you'd have a lot of simulated users that are idling/waiting; as you increase the number of simulated users, the CPU usage goes up, and you'd add more slave nodes before it reached 100%.
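(For concreteness, a rough sketch of how a distributed run like that could be launched. The exact flags depend on the Locust version: newer releases use `--worker`/`--headless` where older ones used `--slave`/`--no-web`, and the target URL here is just a placeholder.)

```sh
# One master process that coordinates the test (placeholder host URL):
locust -f locustfile.py --master --headless -u 1000 -r 100 --host http://target:8080 &

# One worker process per CPU core (4 here), connecting to the master:
for i in 1 2 3 4; do
  locust -f locustfile.py --worker --master-host 127.0.0.1 &
done
```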

heyman commented 4 years ago

I realised that the Dockerfile for the whole project isn't used anymore, and that it now starts individual docker containers for each tool.

I've submitted a PR that fixes the issues I described: #3

ppcano commented 4 years ago

Hi @heyman,

Thank you for participating, and our apologies for the late reply here.

I still found the results suspiciously bad

First, I'd like to clarify that there have never been any bad intentions behind the comparison. The benchmarks and this repo are public, and from the beginning we wanted other authors and users to contribute to this project or share their views.

We ran the first version of the benchmarks in 2016 as part of our research to build k6. One of our goals was to create a tool optimized for minimal resource consumption; these benchmarks were a crucial part of guiding our goals and expectations for k6. We then thought the benchmarks could make an interesting blog post.

We sincerely apologize if there were errors. There are 13 tools (including k6), and we may not have configured every tool in the most efficient way, or the tools could have added new features later on.

We know the project is outdated, and a few months ago we decided to update it with the latest version of each tool, re-run the benchmarks, and update the content of the blog post.

There was some progress in this branch, but other priorities blocked it. Your comment was a good reminder to return to this matter, and we are taking your feedback into consideration. Hopefully the update will be coming soon.

I will ping you when we've completed it, and the Locust team is welcome to participate.

heyman commented 4 years ago

First, I'd like to clarify that there have never been any bad intentions behind the comparison. The benchmarks and this repo are public, and from the beginning we wanted other authors and users to contribute to this project or share their views.

I totally get that. Sorry if my issue/PR sounded bitter/snarky :). I think this project is awesome. It was quite simple for me to change the Locust config and re-run the whole test suite, which is pretty great, and far too uncommon for benchmark articles.

heyman commented 4 years ago

Do you want me to update the PR (#3) to merge into the 2019-update branch instead of master?
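(In case it helps, a sketch of one way to retarget the branch locally; the PR branch name here is hypothetical, and switching the PR's base to 2019-update would be done in the GitHub UI.)

```sh
# Replay the PR's commits on top of 2019-update instead of master:
git fetch origin
git checkout fix-locust-config   # hypothetical PR branch name
git rebase --onto origin/2019-update origin/master
git push --force-with-lease
```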

ppcano commented 4 years ago

cc @ragnarlonn 👆?

ragnarlonn commented 4 years ago

@heyman I just happened to re-read your old issue description here and the figures made me jump: are you sure about them? In my recent testing I've seen that Locust has improved a lot in speed when using the new FastHttpLocust library, but I'm definitely not seeing it beat ApacheBench, even when Locust is using four cores and ApacheBench uses one.
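(For anyone following along, a minimal locustfile sketch using FastHttpLocust, assuming the pre-1.0 Locust API; in Locust 1.0+ the equivalent class is FastHttpUser, with tasks defined directly on the user class.)

```python
from locust import TaskSet, task, constant
from locust.contrib.fasthttp import FastHttpLocust

class UserBehavior(TaskSet):
    @task
    def index(self):
        # Hit the root path of the target under test
        self.client.get("/")

class WebsiteUser(FastHttpLocust):
    task_set = UserBehavior
    wait_time = constant(0)  # no think time, to maximize RPS
```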

What parameters did you use when you ran Locust and Apachebench? What was the network delay between the load generator and target system?

One immediate guess would be that you weren't using the -k flag (to turn on HTTP keep-alive) when running ApacheBench. Locust seems to support keep-alive by default.
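(For example, something along these lines; the `-c` and `-n` values are arbitrary and the URL is a placeholder.)

```sh
# Without -k, ab opens a new TCP connection for every request;
# -k enables HTTP keep-alive so connections are reused.
# -c: concurrent clients, -n: total number of requests.
ab -k -c 100 -n 100000 http://target:8080/
```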

heyman commented 4 years ago

@ragnarlonn I think my quick test with Apache Bench was pretty flawed, as I now suspect that it might have been nginx (running on the same machine) that was the bottleneck. Also, I did not use the -k parameter. So the Apache Bench number in my original post should be disregarded, or at least taken with a huge grain of salt.

ragnarlonn commented 4 years ago

Phew, I was beginning to think I had missed something vital in my testing :) And just to be clear: it's not surprising to me that a tool which executes sophisticated script code is slower than a tool that does nothing but make HTTP requests. I'm trying to get that point across in the new article as well.

ppcano commented 4 years ago

Here is the link to the new article @ragnarlonn mentioned: https://k6.io/blog/comparing-best-open-source-load-testing-tools

heyman commented 4 years ago

Cool! I've just skimmed through it, but I'll give it a proper read later.

A quick comment about the UI bug in the Locust screenshot: I believe it's caused by taking the screenshot with a browser width of less than roughly 1050 pixels. I think it's reasonable to assume that most developers are on resolutions higher than that (though one could definitely argue that we should degrade more gracefully).

heyman commented 4 years ago

EDIT: Disregard this post as I had clearly misinterpreted the blog post.

I really think it would be much fairer to include an entry in the graphs for Locust running in distributed mode with one process per core. I thought that was your intention, since @ragnarlonn wrote this in my PR:

I think the fair thing to do here, both for the test suite but also for the blog articles, is to test Locust in both single-instance and distributed mode.

I like that it's mentioned in the text, but I think the sad truth about benchmark articles is that a significant share of readers mostly just look at the graphs. And I think those graphs are currently misleading when it comes to the results for Locust, since no one who cared about maximizing RPS would run Locust as a single process on a multi-core machine.

ragnarlonn commented 4 years ago

@heyman Uhm, well, I did run Locust in distributed mode, which the article text mentions a number of times, and there are also some numbers for non-distributed mode (~900 RPS, as opposed to ~3k RPS in distributed mode). Maybe you should read it before commenting? :)

About the UI bug: I was on a 5K screen, but I rarely run apps in full-screen mode on this (desktop) computer. I think it's common for people to keep multiple apps on screen at the same time when they have large screens, and perhaps to work more in full-screen mode on a small laptop screen. But I should probably have figured out it was a window-size issue and tried resizing... Feel free to comment on the article!

heyman commented 4 years ago

Oh, that's embarrassing :flushed:. I did read the article, but obviously not well enough. Maybe I expected a separate entry for multiprocess Locust. I'm very sorry about that!

ragnarlonn commented 4 years ago

Well, I'm sorry for not being clear that I wasn't using the loadgentest repo this time. I simply did not have time to update it before the testing, so I just ran all the tests manually instead. This was probably my last big load testing tool review, but I hope someone else on the k6 team takes over and updates (or maybe rewrites :) the loadgentest repo so it's kept up to date and includes new PRs like yours.

ppcano commented 4 years ago

Closed as commented at https://github.com/loadimpact/loadgentest/pull/3#issuecomment-609480099