locustio / locust


Pinging and benchmarking utilities #387

Closed: tkrajca closed this issue 8 years ago

tkrajca commented 8 years ago

Hi guys, a while ago I wrote two utilities - ping.py and benchmark.py - on top of locustio==0.7.1.

Example:

$ ping.py example.com
PING https://example.com
11921 bytes from `GET /beta/login': seq=0 time=1016ms
2693 bytes from `GET /beta/static/index.html': seq=1 time=275ms
42 bytes from `GET /beta/userkeys': seq=2 time=290ms
16544 bytes from `GET /beta/p/api/v2/deployments/thisoneworks/battery/historical/soc': seq=3 time=931ms
14069 bytes from `GET /beta/p/api/v2/deployments/thisoneworks/generation/historical/p': seq=4 time=776ms
19417 bytes from `GET /beta/p/api/v2/deployments/thisoneworks/house/historical': seq=5 time=1211ms
15927 bytes from `GET /beta/p/api/v2/deployments/thisoneworks/battery/historical/soc': seq=6 time=1003ms
^C
------- Statistics -------
7 requests sent, 7 received, 0% requests failed
$
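
For reference, those per-request lines just come from hooking locust's request_success / request_failure events and running a single HttpLocust in-process. A condensed sketch of that pattern (the PingTasks task set, class names and target URL here are placeholders, not part of the real script; ping.py itself is further down):

#!/usr/bin/env python
# Minimal sketch of the event-hook pattern ping.py relies on (locust 0.7.x API).
# PingTasks and the example.com URL are illustrative placeholders.
from locust import HttpLocust, TaskSet, task, events
from locust.log import setup_logging, console_logger

def on_success(request_type, name, response_time, response_length):
    # one ping-style line per successful request
    console_logger.info("{0} bytes from `{1} {2}': time={3}ms".format(
        response_length, request_type, name, response_time))

events.request_success += on_success

class PingTasks(TaskSet):
    @task
    def index(self):
        self.client.get("/")

class Pinger(HttpLocust):
    host = "https://example.com"
    task_set = PingTasks
    min_wait = 1000
    max_wait = 1000

if __name__ == "__main__":
    setup_logging('INFO', None)
    Pinger().run()   # loops until interrupted, roughly one request per second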

Example:

$ benchmark.py example.com 7
STARTING BENCHMARK AGAINST https://example.com

Low: 7, Num_clients: 7, Num_requests: 662
339 INFO locust.runners: Hatching and swarming 7 clients at the rate 2 clients/s...
3846 INFO locust.runners: All locusts hatched: BenchmarkUILocust: 7
3846 INFO locust.runners: Resetting stats

Request failure: (), {'request_type': 'GET', 'exception': HTTPError('500 Server Error: INTERNAL SERVER ERROR',), 'response_time': 1322, 'name': '/beta/p/api/v2/deployments/thisoneworks/cost/historical'}
Request failure: (), {'request_type': 'GET', 'exception': HTTPError('500 Server Error: INTERNAL SERVER ERROR',), 'response_time': 807, 'name': '/beta/p/api/v2/deployments/thisoneworks/cost/historical'}
...

 Aggregated 80% Median                                             563                        1300
 Aggregated avg response time                                     1048
 Aggregated # Failures                                              99
 (sub)Benchmark ran for 415s
====== FINISHED Benchmark against https://example.com =======
Stats: 3 iterations, total time 418s, low threshold 18 clients
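
Under the hood the benchmark runs fixed-length sub-benchmarks, checks the aggregated 80th-percentile response time and the failure count after each one, and bumps the client count until either crosses a threshold; the last passing client count is the "low threshold" reported above. Roughly, the core loop looks like this (run_sub_benchmark() is a hypothetical stand-in for the LocalLocustRunner plumbing in benchmark.py below):

# Stripped-down outline of the ramp-up loop in benchmark.py.
# run_sub_benchmark(n) is a hypothetical helper assumed to run one
# sub-benchmark with n clients and return (80th-percentile response
# time in ms, failure count).
THRESHOLD = 5000   # ms
INCREMENT = 4      # clients added per iteration

def find_low_threshold(initial_clients, run_sub_benchmark):
    num_clients = initial_clients
    low = initial_clients
    while True:
        p80, failures = run_sub_benchmark(num_clients)
        if p80 is None or failures > 0 or p80 > THRESHOLD:
            # first breach ends the benchmark; "low" is the last
            # client count that stayed under the threshold
            return low
        low = num_clients
        num_clients += INCREMENT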

We use these scripts internally on top of locust.io and are happy to work with the community on incorporating them into locust.io. Is this something that you would be interested in doing/having part of locust.io?

Cheers, Tomas

ping.py:

#!/usr/bin/env python

from stress.opentsdb_locust import ReadAndWriteTSDB
from stress.stsdb_locust import ReadAndWriteSTSDB
from stress.BetaUI_locust import UI_data
from locust import HttpLocust, events
from locust.log import setup_logging, console_logger

import sys
import signal

def print_stats(signum, sigframe):
    #3 packets transmitted, 0 received, 100% packet loss, time 2000ms
    global count
    global failures
    global pinger
    pinger.stop_timeout = 0     # this should stop the pinger locust :)
    console_logger.info("")
    console_logger.info("------- Statistics -------")
    console_logger.info(
        "{0} requests sent, {1} received, {2}% requests failed".format(
            count, count - failures,
            int(failures * 100 / count) if count else 0)
    )

signal.signal(signal.SIGINT, print_stats)
signal.signal(signal.SIGABRT, print_stats)
signal.signal(signal.SIGTERM, print_stats)
signal.signal(signal.SIGQUIT, print_stats)

count = 0
failures = 0
pinger = None

def ping_success(request_type, name, response_time, response_length):
    global count
    console_logger.info("{0} bytes from `{1} {2}': seq={3} time={4}ms".format(
        response_length, request_type, name, count, response_time))
    count += 1

def ping_failure(request_type, name, response_time, exception):
    global count
    global failures
    console_logger.error("ERROR From `{0} {1}' seq={2} time={3}ms {4}".format(
        request_type, name, count, response_time, exception))
    console_logger.error((request_type, name, response_time, exception))
    count += 1
    failures += 1

events.request_success += ping_success
events.request_failure += ping_failure

class ReadAndWriteTSDBPinger(ReadAndWriteTSDB):
    timeout = 3

class ReadAndWriteSTSDBPinger(ReadAndWriteSTSDB):
    timeout = 3

class OpentsdbPinger(HttpLocust):
    min_wait = 1000
    max_wait = 1000

    def __init__(self, host, *args, **kwargs):
        if "todo1" in host:
            self.host = host
            if ":" not in self.host:
                self.host = "{0}:4242".format(self.host)
            self.task_set = ReadAndWriteSTSDBPinger
        elif "todo2" in host:
            self.host = "http://{0}".format(host)
            self.task_set = ReadAndWriteTSDBPinger
        elif "todo3" in host:
            self.host = "https://{0}".format(host)
            self.task_set = UI_data
        else:
            raise Exception("Invalid host: {0}".format(host))

        console_logger.info("PING {0}".format(self.host))
        super(OpentsdbPinger, self).__init__(*args, **kwargs)

if __name__ == "__main__":
    setup_logging('INFO', None)
    if len(sys.argv) != 2:
        sys.stderr.write("Usage: {0} <hostname>\n".format(sys.argv[0]))
        quit(1)
    else:
        host = sys.argv[1]

    pinger = OpentsdbPinger(host)
    pinger.run()

benchmark.py:

#!/usr/bin/env python

from locust import runners, events
from locust.log import setup_logging, console_logger
from locust.stats import print_stats, print_percentile_stats, STATS_NAME_WIDTH

from stress.opentsdb_locust import OpentsdbLocust
from stress.stsdb_locust import STSDBLocust
from stress.BetaUI_locust import UILocust

import time
import sys

THRESHOLD = 5000    # 5 seconds
BENCH_PERIOD = 123
INCREMENT = 4

def shutdown(*args, **kwargs):
    events.quitting.fire()

def log_failure(*args, **kwargs):
    console_logger.error("Request failure: {0}, {1}".format(args, kwargs))

def set_time(user_count):
    global t0
    t0 = time.time()

t0 = time.time()

# shut this benchmark down on its first failure - we don't tolerate failures :)
events.request_failure += log_failure
events.request_failure += shutdown
events.locust_error += log_failure
events.locust_error += shutdown

events.hatch_complete += set_time

class BenchmarkOpentsdbLocust(OpentsdbLocust):
    min_wait = 900
    max_wait = 1000

class BenchmarkSTSDBLocust(STSDBLocust):
    min_wait = 900
    max_wait = 1000

class BenchmarkUILocust(UILocust):
    min_wait = 1234
    max_wait = 5432

class Options(object):
    def __init__(self, num_clients=None, num_requests=None, host=None,
                 hatch_rate=None):
        self.num_clients = num_clients
        self.num_requests = num_requests
        self.host = host
        self.hatch_rate = hatch_rate

if __name__ == "__main__":
    setup_logging('WARN', None)
    tt = time.time()
    tpl = " %-" + str(STATS_NAME_WIDTH) + "s %8d                      %6d"
    tpl2 = " %-" + str(STATS_NAME_WIDTH) + "s %8d"

    # initial benchmark values (bogus :)
    num_clients = 37

    # command line arguments
    if len(sys.argv) < 2:
        sys.stderr.write(
            "Usage: {0} <hostname> [<init_num_clients:{1}\n".format(
                sys.argv[0], num_clients))
        quit(1)
    else:
        host = sys.argv[1]
        if len(sys.argv) == 3:
            num_clients = int(sys.argv[2])

    num_requests = int(BENCH_PERIOD * num_clients / ((950 + 350) / 1000.0))
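    # e.g. for the run above: int(123 * 7 / ((950 + 350) / 1000.0)) == 662,
    # matching "Num_requests: 662"; 950 is presumably the average wait in ms
    # and 350 an assumed initial average response time in ms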

    # determine which locust to run
    if "todo" in host:
        if ":" not in host:
            host = "{0}:4242".format(host)
        BenchmarkLocust = BenchmarkSTSDBLocust
    elif "todo1" in host:
        host = "http://{0}".format(host)
        BenchmarkLocust = BenchmarkOpentsdbLocust
    elif "todo2" in host:
        host = "https://{0}".format(host)
        BenchmarkLocust = BenchmarkUILocust
    else:
        raise Exception("Invalid host: {0}".format(host))

    console_logger.warn("STARTING BENCHMARK AGAINST {0}".format(host))
    console_logger.warn("")

    low = num_clients
    i = 0

    # run :)
    while True:
        i += 1
        console_logger.warn(
            "Low: {0}, Num_clients: {1}, Num_requests: {2}".format(
                low, num_clients, num_requests)
        )
        hatch_rate = int(num_clients / 3) < 50 and int(num_clients / 3) or 50
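        # effectively min(int(num_clients / 3), 50), written in the old
        # and/or idiom (falls back to 50 if the first term is 0)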
        options = Options(num_clients=num_clients, num_requests=num_requests,
                          host=host, hatch_rate=hatch_rate)
        runners.locust_runner = runners.LocalLocustRunner(
            [BenchmarkLocust], options)
        # start the (sub)benchmark
        runners.locust_runner.start_hatching(wait=True)
        main_greenlet = runners.locust_runner.greenlet
        main_greenlet.join()

        # print stats
        print_stats(runners.locust_runner.request_stats)
        print_percentile_stats(runners.locust_runner.request_stats)
        agg_stats = runners.locust_runner.stats.aggregated_stats(
            full_request_history=True)
        agg_resp_time_80_median = agg_stats.get_response_time_percentile(0.80)
        agg_avg_time = agg_stats.avg_response_time
        if agg_resp_time_80_median is not None:
            console_logger.warn(tpl % (
                "Aggregated 80% Median", agg_stats.num_requests,
                agg_resp_time_80_median)
            )

        console_logger.warn(tpl2 % (
            "Aggregated avg response time", agg_avg_time)
        )
        console_logger.warn(tpl2 % (
            "Aggregated # Failures", agg_stats.num_failures)
        )
        console_logger.warn(" (sub)Benchmark ran for {0}s".format(
            int(time.time() - t0)))

        # set up for a new run - increment
        if (agg_resp_time_80_median is None or agg_stats.num_failures > 0 or
                agg_resp_time_80_median > THRESHOLD):
            break
        else:
            low = num_clients
            avg_time_fix = 2.3  # :)

        num_clients += INCREMENT
        # BENCH_PERIOD ~ (wait + avg_req_time_in_s)*num_requests/num_clients
        num_requests = int(BENCH_PERIOD * num_clients /
                           ((950 + (agg_avg_time * avg_time_fix)) / 1000.0))
        console_logger.warn("=================")
        console_logger.warn("")
        time.sleep(5.678)

    # print final stats
    console_logger.warn("====== FINISHED Benchmark against {0} =======".format(
        host)
    )
    console_logger.warn(
        "Stats: {0} iterations, total time {1}s, low threshold {2} clients".format(
            i, int(time.time() - tt), low)
    )
    console_logger.warn("===================")
cgoldberg commented 8 years ago

@tkrajca ,

Maintenance and development of Locust are currently in a bit of flux, so new code has not been merged recently. However, the situation should hopefully get fixed up soon.

> We use these scripts internally on top of locust.io and are happy to work with the community on incorporating them into locust.io. Is this something that you would be interested in doing/having part of locust.io?

So, in the meantime, I would just go ahead and create the Pull Request with your proposed changes (pasting code here in the issue tracker isn't really helpful). Once your changes are pushed to your repo and things are in PR form, the code can be discussed and reviewed. Once contributions get flowing again, maintainers can triage all the open PRs and decide what to merge. You can link to your PR from this issue and remove the code snippets you pasted to make things clearer. (If you do decide to craft a PR, feel free to tag me on it for code review.)

thanks!

cgoldberg commented 8 years ago

I'm -1 on this.

> Is this something that you would be interested in doing/having part of locust.io?

I don't think either utility is appropriate to include with locust.

I am closing this issue. If you want to re-open it, please submit a pull request for review... other reviewers might feel differently.