the-benchmarker / web-frameworks

Which is the fastest web framework?

Skeptical about controversial results #421

Closed: doanguyen closed this issue 5 years ago

doanguyen commented 5 years ago

Hi, I had some time to check the benchmark of Laravel and Flask on my machine (i7-7600U, 2.8 GHz x 4 cores), and here is the result: https://gist.github.com/doanguyen/a81cccb4e884c63422e2c5ae3e846d18

It shows totally different outcomes: Laravel serves only 87 req/s while Flask at best reaches 8506 req/s. I did some research and this looks similar to other benchmarks, for instance https://medium.com/@mihaigeorge.c/web-rest-api-benchmark-on-a-real-life-application-ebb743a5d7a3

Do you have any clue what's happening?

waghanza commented 5 years ago

Hi @doanguyen,

I have not run this benchmark before, but honestly I doubt we would see such differences. In my PHP adventures I often have to do configuration that I do NOT have to do in the Python world.

I am currently working on cloudification (on AWS) for Ruby-based frameworks. If you want to do the same for PHP-based frameworks, I'll be glad to push my code and start collaborating with you :heart:

The only way, for me, to be sure of performance is to have access to the source code and details about the technical environment (which is what I plan to deliver here) :stuck_out_tongue:

waghanza commented 5 years ago

PS: Back when I was setting up PHP here, I saw a difference like that when using TCP instead of a unix socket for PHP-FPM.
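
To illustrate, the difference lies in how Nginx reaches PHP-FPM. A minimal sketch of the two fastcgi_pass variants (the paths and ports are placeholders, not this project's configuration):

location ~ \.php$ {
    # TCP: goes through the network stack, typically slower on the same host
    fastcgi_pass 127.0.0.1:9000;
    # unix socket: local IPC, usually the faster option locally
    # fastcgi_pass unix:/run/php/php-fpm.sock;
}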

doanguyen commented 5 years ago

Hi @waghanza,

Thanks for the fast reply. I'm not sure if you mean TCP sockets versus unix sockets; if so, I don't think it makes that much of a difference. From what I know, Laravel is rather slow compared with others, so maybe I missed something in my configuration that makes the results so different.

waghanza commented 5 years ago

Do you want to help us run PHP in the cloud?

I'm sure the results won't be as imagined 😊

doanguyen commented 5 years ago

Sure, I'm happy to help, but how?

waghanza commented 5 years ago

Do you know cloud-init?

The system I use for Ruby is a set of definitions (global, language-level and framework-level) that describe how to set up the host. I'll push my code soon and explain here how to use it.
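
If it helps, cloud-init user-data is plain YAML; here is a minimal, purely illustrative #cloud-config sketch (the package is just an example, not one of my actual definitions):

#cloud-config
# Install and start a web server on first boot (example only).
packages:
  - nginx
runcmd:
  - systemctl enable --now nginx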

doanguyen commented 5 years ago

I've never used it before. The setup I made for my benchmark was to create two docker instances and benchmark manually with Apache ab, with a command along the lines shown below.
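
For reference, that kind of ab run would look something like this (the address and counts are placeholders, not the exact values from the gist):

$ ab -n 100000 -c 100 http://172.17.0.2:3000/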

waghanza commented 5 years ago

I understand, but using docker is NOT representative of a production-ready setup (something that produces reliable results from which conclusions can be drawn).

I mean using docker (on a local workstation, I think) is NOT the same as running on an isolated, production-like host.

Please be patient :stuck_out_tongue:, I'll keep this issue open until I push my cloudify branch, so we can work on both Ruby and PHP together :heart:

doanguyen commented 5 years ago

Hi again,

Really bad news.

I kept asking myself why the results differ so much, and I finally found out.

I've never written any Crystal code, so please correct me if I'm wrong. By default, the file tools/src/benchmarker.cr sends 100k requests with no_cpu+1 threads to the target and collects the result by averaging the 3 endpoints. The problem is that the script does NOT check the response status: Nginx accepts all inbound connections and forwards them to the PHP process manager without knowing whether PHP is ready or not. As a result, every incoming request gets counted, whether it returns a 2xx/3xx response code OR a 502. So I tested with wrk, and no surprise:

$ wrk -c 1000 -d 10s -t 3 http://172.17.0.3:3000/user/0 -v           
wrk 4.1.0 [epoll] Copyright (C) 2012 Will Glozer
Running 10s test @ http://172.17.0.3:3000/user/0
  3 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    60.58ms  205.00ms   2.00s    93.94%
    Req/Sec    16.58k     7.47k   43.94k    67.68%
  490950 requests in 10.03s, 154.55MB read
  Socket errors: connect 0, read 224, write 0, timeout 233
  Non-2xx or 3xx responses: 490720
Requests/sec:  48958.45
Transfer/sec:     15.41MB

Basically, with 3 threads and 1000 connections, PHP rejected >99% of the requests and Nginx just returned non-2xx/3xx responses. Reducing the number of connections, I get:

$ wrk -c 10 -d 5s -t 3 http://172.17.0.3:3000/user/0 -v
wrk 4.1.0 [epoll] Copyright (C) 2012 Will Glozer
Running 5s test @ http://172.17.0.3:3000/user/0
  3 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   133.51ms   32.33ms 239.26ms   76.28%
    Req/Sec    22.51      7.53    40.00     82.64%
  333 requests in 5.01s, 201.97KB read
Requests/sec:     66.44
Transfer/sec:     40.30KB

Now it makes perfect sense and aligns very well with other benchmarks.

I know you guys are working hard on this benchmark, so please fix this before the results get referenced elsewhere.
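
As a minimal Crystal sketch of the kind of check that is missing (illustrative only, not the project's actual code; the URL and request count are placeholders):

require "http/client"

counted = 0
rejected = 0

1_000.times do
  response = HTTP::Client.get("http://172.17.0.3:3000/user/0")
  if response.status_code < 400
    counted += 1   # 2xx/3xx: a real framework response
  else
    rejected += 1  # e.g. a 502 from Nginx when PHP-FPM cannot keep up
  end
end

puts "counted: #{counted}, rejected: #{rejected}"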

waghanza commented 5 years ago

@doanguyen As I understand it, your point is that non-HTTP-200 responses are not checked.

I know the code needs refactoring, but let's check the (JSON) output of bin/client -u http://google.fr

Errors (even socket ones) are taken into account.

waghanza commented 5 years ago

@doanguyen However, the errors are not shown in the results; we have to choose the right way to display them.

doanguyen commented 5 years ago

It's not about displaying the errors. If the requests error out, people still think you are benchmarking PHP while, in effect, you are benchmarking Nginx. I'm certain nowhere on earth gets >30k req/s from Laravel on a single instance; it's just impossible.

waghanza commented 5 years ago

@doanguyen Your concern is about PHP then.

As I have already said, I have not found anything like gunicorn or puma in the PHP world.

If you know of one, feel free to contribute :heart:

Nginx + PHP-FPM was chosen only because I do not know of any application server for PHP.

doanguyen commented 5 years ago

It's not about the server stack; it's about the way you benchmark and collect results. I think that needs to be fixed first.

waghanza commented 5 years ago

Fix what?

As the README says, the results are not production-ready (so you cannot draw any conclusions from them).

I hope that by the end of the year results will be computed in the cloud (so production-ready), and then we can draw conclusions.

Honestly, results from local docker and from an isolated machine are NOT the same.

proyb6 commented 5 years ago

Yes, we need to fix the issues; please open a pull request if you can contribute.

waghanza commented 5 years ago

Sure, nobody has global knowledge of everything; just name the problem and contribute a solution :+1:

(ideas, PR ...)

doanguyen commented 5 years ago

@waghanza With that response, I really want to ask you: what does this benchmark stand for?

From there we get to the point of how to do it properly.

For now, my only recommendation is to use wrk instead of a custom library to perform the benchmark; but the custom library could be your selling point, so either stick with your solution or move to wrk.

waghanza commented 5 years ago

@doanguyen Ah, understood. There is no need to raise issues based on assumptions for that (because the results are not production-ready) ^^

wrk has been in use since June: https://github.com/the-benchmarker/web-frameworks/pull/223

I know nothing is perfect, but we are trying to get there :heart:

And when we reach production level, we will start documentation with more details about the implementation. Nothing is stable as of now, so documentation would not be helpful yet.

doanguyen commented 5 years ago

I followed the README, which is not quite correct. I'll close this issue.

waghanza commented 5 years ago

@doanguyen So the README was not clear enough for you?

doanguyen commented 5 years ago

Isn't it wrong? It still refers to the old Crystal benchmarker script.

waghanza commented 5 years ago

no, it is NOT wrong

This script is in fact a custom wrapper around wrk.

However, it is fair to consider it unclear.

If you want to clarify the situation, would you mind listing the questions you are asking yourself? I'll try to maintain a wiki.

doanguyen commented 5 years ago

Whatever it is, it's fine by me now.

As an aside, https://github.com/giltene/wrk2 provides additional features and DOESN'T have the problem of counting non-2xx responses in the result.

Is it worth making the change?
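
For reference, wrk2 drives a fixed request rate through its mandatory -R flag (its binary is also named wrk); a typical invocation, with placeholder rate, duration and target, would be:

$ wrk -t3 -c100 -d30s -R2000 --latency http://172.17.0.3:3000/user/0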

waghanza commented 5 years ago

Yeah, I've created an issue for this: https://github.com/the-benchmarker/web-frameworks/pull/183. There are a lot of alternatives to wrk.

We decided to go with wrk, but that COULD change.

Feel free to open a separate discussion to propose changing the sieger tool.

doanguyen commented 5 years ago

Thanks, @waghanza, things are clear now.

waghanza commented 5 years ago

@doanguyen I have successfully run it on AWS instances by creating a wrapper (I'll push it to my fork soon).

For sinatra, I have the following (times are in ms):

| Instance | Average | 50th percentile | 90th percentile | 99th percentile | 99.9th percentile | Standard deviation |
| --- | --- | --- | --- | --- | --- | --- |
| t3.nano (2 cores) | 82714.0 | 82047.0 | 84024.0 | 88330.0 | 135806.0 | 2177.0 |
| c5.2xlarge (8 cores) | 85871.0 | 85187.0 | 88529.0 | 93479.0 | 111095.0 | 2052.0 |

:warning: This is an example: once we have proper results like these, we can draw conclusions, e.g. that running on c5.2xlarge is useless ...

PS: The above results are for 1 endpoint only -> /

doanguyen commented 5 years ago

It's a bit weird to me. Any chance the wrk instance is the bottleneck? :laughing:

waghanza commented 5 years ago

yeah :stuck_out_tongue_closed_eyes:

this is phase 1

I mean wrk is on my local computer, so with a non-stable connection (even without wireless).

In phase 2, I'll run on a fully stable connection :stuck_out_tongue: cloud -> cloud instead of home -> ... -> cloud

OvermindDL1 commented 5 years ago

> I mean wrk is on my local computer, so with a non-stable connection (even without wireless).

Lol, yeah, the remote connection means that past a certain point you are measuring your own network capacity as the ceiling rather than the server's. ^.^;

waghanza commented 5 years ago

@doanguyen @OvermindDL1 I'm closing this issue; stay tuned on gitter (or here) for news on cloudification :stuck_out_tongue: