locustio / locust

Inaccurate response time? #663

Closed · keithmork closed this 6 years ago

keithmork commented 7 years ago

I ran some benchmarks with locust, wrk, and ab, and locust always reports an average response time of around 300 ms, while the other two report 4 ms. Same server, same settings, same bandwidth.

I assume the blocking-I/O nature of the requests library has no effect on the response time, right?

I know locust isn't a benchmarking tool, but how can we trust its results if it can't even get (remotely) correct stats for a 100-byte static page? For most APIs at most .com companies, an extra 300 ms is unacceptable.

wrk --latency -t 8 -c 200 -d 120s --timeout 10s http://192.168.3.221/nginx_status
Running 2m test @ http://192.168.3.221/nginx_status
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    53.94ms  178.14ms   3.28s    94.62%
    Req/Sec     4.37k   531.52     6.98k    69.51%
  Latency Distribution
     50%    4.05ms
     75%    4.86ms
     90%  161.98ms
     99%  741.21ms
  4175822 requests in 2.00m, 1.02GB read
Requests/sec:  34792.40
Transfer/sec:      8.74MB

ab -k -r -n 4000000 -c 200 -s 10 "http://192.168.3.221/nginx_status"
This is ApacheBench, Version 2.3 <$Revision: 1796539 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.3.221 (be patient)
Completed 400000 requests
Completed 800000 requests
Completed 1200000 requests
Completed 1600000 requests
Completed 2000000 requests
Completed 2400000 requests
Completed 2800000 requests
Completed 3200000 requests
Completed 3600000 requests
Completed 4000000 requests
Finished 4000000 requests

Server Software:        nginx/1.10.3
Server Hostname:        192.168.3.221
Server Port:            80

Document Path:          /nginx_status
Document Length:        115 bytes

Concurrency Level:      200
Time taken for tests:   117.530 seconds
Complete requests:      4000000
Failed requests:        9568
   (Connect: 0, Receive: 0, Length: 9568, Exceptions: 0)
Keep-Alive requests:    3960103
Total transferred:      1055810083 bytes
HTML transferred:       460009568 bytes
Requests per second:    34033.92 [#/sec] (mean)
Time per request:       5.876 [ms] (mean)
Time per request:       0.029 [ms] (mean, across all concurrent requests)
Transfer rate:          8772.79 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.9      0     216
Processing:     0    6  27.0      4    6547
Waiting:        0    6  27.0      4    6547
Total:          0    6  27.1      4    6547

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      4
  80%      4
  90%      5
  95%      6
  98%      9
  99%     12
 100%   6547 (longest request)

PYTHONPATH=. locust -f tests/test_server.py --no-reset-stats
"Method","Name","# requests","# failures","Median response time","Average response time","Min response time","Max response time","Average Content Size","Requests/s"
"GET","/nginx_status",60595,0,290,288,2,1735,114,424.08
"None","Total",60595,0,290,288,2,1735,114,424.08

"Name","# requests","50%","66%","75%","80%","90%","95%","98%","99%","100%"
"GET /nginx_status",60595,290,320,360,390,470,540,650,750,1735
"None Total",60595,290,320,360,390,470,540,650,750,1735

locustfile:

from locust import TaskSet, HttpLocust, task

class HelloWorld(TaskSet):
    @task
    def foo(self):
        self.client.get('/nginx_status')

class Run(HttpLocust):
    host = 'http://192.168.3.221'
    min_wait = 0   # both 0: no wait between tasks
    max_wait = 0
    stop_timeout = 120

    task_set = HelloWorld
heyman commented 7 years ago

That definitely sounds strange. You should see response times very similar to ab and wrk.

I assume the blocking-I/O nature of the requests library has no effect on the response time, right?

Nope, gevent makes it non-blocking.

Is this regardless of how many Locust users you simulate? Do you still see response times of ~300 ms if you only simulate <100 users?
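
(A minimal sketch of what "gevent makes it non-blocking" means in practice, assuming gevent is installed; the URL and greenlet count are illustrative. After monkey.patch_all(), the blocking socket calls inside requests yield to other greenlets instead of stalling the process:)

from gevent import monkey
monkey.patch_all()  # patch blocking stdlib primitives; do this before importing requests

import gevent
import requests

def fetch(url):
    # Blocks only this greenlet; other greenlets keep running.
    return requests.get(url).elapsed.total_seconds()

# Spawn 10 concurrent fetches inside a single OS thread and wait for all.
jobs = [gevent.spawn(fetch, 'http://192.168.3.221/nginx_status') for _ in range(10)]
gevent.joinall(jobs)
print([round(job.value * 1000, 1) for job in jobs])  # per-request times in ms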

keithmork commented 7 years ago

@heyman I ran the same script on a Linux server today; here are the results:

users    RPS      avg. RT
    1    ~275     3 ms
   20    ~620     20+ ms
  100    ~610     100+ ms
Seems totally unusable.

users    RPS      avg. RT
    1    ~275     3 ms
    2    ~550     3 ms
    3    ~830     3 ms
   20    ~3900    4 ms
   50    ~4100    9 ms
  ...    ...      ...
 1400    ~3700    ~170 ms

Seems meaningless to use more than 20 coroutines.

ab -k -r -n 4000000 -c 200 "http://192.168.3.221/nginx_status"

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.3.221 (be patient)
Completed 400000 requests
Completed 800000 requests
Completed 1200000 requests
Completed 1600000 requests
Completed 2000000 requests
Completed 2400000 requests
Completed 2800000 requests
Completed 3200000 requests
Completed 3600000 requests
Completed 4000000 requests
Finished 4000000 requests

Server Software:        nginx/1.10.3
Server Hostname:        192.168.3.221
Server Port:            80

Document Path:          /nginx_status
Document Length:        114 bytes

Concurrency Level:      200
Time taken for tests:   28.073 seconds
Complete requests:      4000000
Failed requests:        4000005
   (Connect: 0, Receive: 0, Length: 4000005, Exceptions: 0)
Write errors:           0
Keep-Alive requests:    3960098
Total transferred:      1067920454 bytes
HTML transferred:       472118782 bytes
Requests per second:    142487.10 [#/sec] (mean)
Time per request:       1.404 [ms] (mean)
Time per request:       0.007 [ms] (mean, across all concurrent requests)
Transfer rate:          37149.63 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.1      0    1001
Processing:     0    1   5.2      1     577
Waiting:        0    1   5.2      1     577
Total:          0    1   5.3      1    1004

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      2
  90%      2
  95%      2
  98%      3
  99%      5
 100%   1004 (longest request)

And I also tested some real APIs:

API1: (screenshot omitted)

API2: (screenshot omitted)

Settings (both client & target server): (screenshot omitted)

keithmork commented 7 years ago

With the --no-web option:

PYTHONPATH=. locust -f tests/test_server.py --no-web --csv=test -c 10 -r 1 -t 120s

 Name                                                          # reqs      # fails     Avg     Min     Max  |  Median   req/s
--------------------------------------------------------------------------------------------------------------------------------------------
 GET /nginx_status                                              67273     0(0.00%)      11       4     296  |      12  621.40
--------------------------------------------------------------------------------------------------------------------------------------------
 Total                                                          67273     0(0.00%)                                     621.40

[2017-10-12 20:02:12,491] xg-build/INFO/locust.main: Time limit reached. Stopping Locust.
[2017-10-12 20:02:12,499] xg-build/INFO/locust.main: Shutting down (exit code 0), bye.
 Name                                                          # reqs      # fails     Avg     Min     Max  |  Median   req/s
--------------------------------------------------------------------------------------------------------------------------------------------
 GET /nginx_status                                              68187     0(0.00%)      11       4     296  |      12  621.30
--------------------------------------------------------------------------------------------------------------------------------------------
 Total                                                          68187     0(0.00%)                                     621.30

Percentage of the requests completed within given times
 Name                                                           # reqs    50%    66%    75%    80%    90%    95%    98%    99%   100%
--------------------------------------------------------------------------------------------------------------------------------------------
 GET /nginx_status                                               68187     12     13     13     14     14     15     16     29    300
--------------------------------------------------------------------------------------------------------------------------------------------
 Total                                                           68187     12     13     13     14     14     15     16     29    300

So gevent is buggy not just on Windows but on Mac and Linux now? (Maybe it's time to embrace asyncio and py35+ :P)
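
(For reference, the asyncio approach alluded to above might look like this sketch; aiohttp, the URL, and the request count are assumptions for illustration, not part of Locust:)

import asyncio
import time

import aiohttp  # assumed installed; not used by Locust itself

async def fetch(session, url):
    start = time.perf_counter()
    async with session.get(url) as resp:
        await resp.read()
    return time.perf_counter() - start

async def main():
    url = 'http://192.168.3.221/nginx_status'  # illustrative target
    async with aiohttp.ClientSession() as session:
        # 100 concurrent requests on one event loop
        times = await asyncio.gather(*(fetch(session, url) for _ in range(100)))
    print(f'avg {sum(times) / len(times) * 1000:.1f} ms over {len(times)} requests')

asyncio.run(main())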

keithmork commented 7 years ago
...

app.get('/random_sleep', (req, res) => {
    const min = Number(req.query.min) || 10;
    const max = Number(req.query.max) || 300;
    // Pick a random delay in [min, max] milliseconds.
    const sleep = Math.floor(Math.random() * (max - min + 1)) + min;

    const respond = () => {
        res.send(`Min: ${min}, Max: ${max}, Sleep: ${sleep}`);
    };
    // Reply only after the chosen delay has elapsed.
    setTimeout(respond, sleep);
});

...

locust (with min=100&max=100 the endpoint sleeps exactly 100 ms per request, so the ideal average response time is ~100 ms plus overhead):

PYTHONPATH=. locust -f tests/test_server.py --no-web -c 200 -r 10 -t 120s

[2017-10-12 21:11:49,586] xg-build/INFO/locust.main: Time limit reached. Stopping Locust.
[2017-10-12 21:11:49,746] xg-build/INFO/locust.main: Shutting down (exit code 0), bye.
 Name                                                          # reqs      # fails     Avg     Min     Max  |  Median   req/s
--------------------------------------------------------------------------------------------------------------------------------------------
 GET /random_sleep?min=100&max=100                              54186     0(0.00%)     272     125     529  |     270  576.20
--------------------------------------------------------------------------------------------------------------------------------------------
 Total                                                          54186     0(0.00%)                                     576.20

Percentage of the requests completed within given times
 Name                                                           # reqs    50%    66%    75%    80%    90%    95%    98%    99%   100%
--------------------------------------------------------------------------------------------------------------------------------------------
 GET /random_sleep?min=100&max=100                               54186    270    300    310    310    330    350    380    400    530
--------------------------------------------------------------------------------------------------------------------------------------------
 Total                                                           54186    270    300    310    310    330    350    380    400    530

JMeter with 200 threads (avg 131 ms, 1508 RPS):

Label,# Samples,Average,Median,90% Line,95% Line,99% Line,Min,Max,Error %,Throughput,Received KB/sec,Sent KB/sec
HTTP Request,181222,131,130,152,158,189,101,282,0.000%,1508.96359,393.45,434.71
TOTAL,181222,131,130,152,158,189,101,282,0.000%,1508.96359,393.45,434.71
cgoldberg commented 6 years ago

There is nothing actionable in this ticket... closing.

lgrabowski commented 6 years ago

Hello, I have the same issue. I additionally used vegeta as another HTTP benchmarking tool and got the same results as @keithmork. All of the tools (I used the same ones as @keithmork) report much lower times for a very simple app (tornado + hello world) and for the same implementation in Go. Locust reports something like 1.5 s for a simple hello world (?!?). Can you take a look at this?

cgoldberg commented 6 years ago

Locust returns like 1.5 sec for Simple hello world

Are you setting min_wait and max_wait?

min_wait and max_wait default to 1000, so a locust will always wait 1 second between tasks if they are not declared.
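
(For illustration, using the same old-style API as the locustfile earlier in this thread: declaring both as 0 removes the per-task wait entirely. The host and path are illustrative:)

from locust import HttpLocust, TaskSet, task

class Ping(TaskSet):
    @task
    def ping(self):
        self.client.get('/ping')

class Run(HttpLocust):
    host = 'http://127.0.0.1:8000'  # illustrative
    # Both default to 1000 ms; setting them to 0 removes the wait
    # between tasks so measured RPS reflects the server, not the pauses.
    min_wait = 0
    max_wait = 0
    task_set = Ping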

lgrabowski commented 6 years ago

Yes, I know. I have set them to different values, mostly between 1 and 5.

vegeta (similar results as with wrk and ab):

Bucket           #     %       Histogram
[0s,     20ms]   1091  24.24%  ##################
[20ms,   30ms]   2880  64.00%  ################################################
[30ms,   60ms]   489   10.87%  ########
[60ms,   90ms]   18    0.40%
[90ms,   120ms]  14    0.31%
[120ms,  150ms]  5     0.11%
[150ms,  180ms]  0     0.00%
[180ms,  250ms]  3     0.07%
[250ms,  300ms]  0     0.00%
[300ms,  500ms]  0     0.00%
[500ms,  600ms]  0     0.00%
[600ms,  700ms]  0     0.00%
locust:

Percentage of the requests completed within given times
 Name                          # reqs    50%    66%    75%    80%    90%    95%    98%    99%   100%
-----------------------------------------------------------------------------------------------------
 GET /ping                          7    630    880   1400   1400   1500   1500   1500   1500   1538
-----------------------------------------------------------------------------------------------------
kishoregorapalli commented 8 months ago

I tried benchmarking Locust's response times against enterprise tools. Strangely, for the same requests the response times are huge in Locust, and I could only achieve 180 RPS for a test where the enterprise tool reaches around 264 RPS. I tried FastHttpUser too.

If anyone sees the same behavior, please comment, and if anyone has resolved this type of issue, let me know. Thanks in advance.

cyberw commented 8 months ago

Hi @kishoregorapalli! There could be a number of reasons for this.

Assuming you didn't run into CPU limits (that shouldn't be an issue at such low load, but check your logs for CPU utilization warnings just in case), one thing that may differ between tools is how connections are handled: Locust users typically reuse connections, and load balancers/firewalls sometimes cause issues with this.
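
(One way to test whether connection reuse is the difference, as an experiment rather than a documented fix, is to force a fresh connection per request, the way non-reusing tools behave. A sketch using the modern API; the path is illustrative:)

from locust import HttpUser, task

class NoReuseUser(HttpUser):
    @task
    def ping(self):
        # 'Connection: close' makes the server tear down the TCP connection
        # after each response, so every request pays the handshake cost,
        # mimicking tools that do not reuse connections.
        self.client.get('/ping', headers={'Connection': 'close'})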

But the first thing you should do is ensure you have enough processes. The easiest thing you can do (assuming you are on a recent locust version) is just add --processes -1 to launch one process per core on your machine.

Read more here: https://github.com/locustio/locust/wiki/FAQ#increase-my-request-raterps
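
(A minimal sketch putting both suggestions together, assuming Locust 2.15+ where --processes is available; the host, user count, and spawn rate are illustrative:)

from locust import FastHttpUser, task

class StatusUser(FastHttpUser):
    @task
    def status(self):
        self.client.get('/nginx_status')

Run headless with one worker process per CPU core:

locust -f locustfile.py --headless --processes -1 -u 200 -r 20 -t 2m --host http://192.168.3.221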

If you still have issues, please file a new ticket with complete logs, locustfile, locust version etc.