locustio / locust

Write scalable load tests in plain Python 🚗💨
https://locust.cloud
MIT License

incompatibility with the tenacity retry library #1652

Closed Apteryks closed 1 week ago

Apteryks commented 3 years ago

Hello!

Since this problem concerns the interaction between Locust and an external library (tenacity), I am not sure whether it is a bug or a feature request. Tenacity is a maintained fork of the classic "retry" library that provides decorators and helpers for retrying operations. It is useful in conjunction with Locust, for example to conveniently retry requests with exponential backoff (see: https://tenacity.readthedocs.io/en/latest/#retrying-code-block).

Describe the bug

When using a tenacity retry decorator on any method in a user class (HttpUser or FastHttpUser), Locust hangs with 100% CPU usage. strace says it's looping on epoll_wait system calls.

Expected behavior

Locust should not hang and the retry decorator should allow procedures to be retried.

Actual behavior

Locust hangs and fully uses a CPU core.

Steps to reproduce

I'm using Docker to run Locust.

file: Dockerfile

FROM locustio/locust
RUN pip3 install tenacity

file: repro.py

import logging

from locust import HttpUser, task
from tenacity import retry, stop_after_attempt, wait_exponential

class User(HttpUser):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.retry_count = 0

    @retry(stop_after_attempt(5),
           wait=wait_exponential(multiplier=1, min=4, max=10))
    def on_start(self):
        if self.retry_count < 3:
            self.retry_count += 1
            raise Exception('oops!')
        logging.info('on_start succeeded!')

    @task
    def main(self):
        logging.info('Running main task...')

Build a derived Locust image which has tenacity:

docker build . -t locust/with-tenacity

Then run the reproducer like:

docker run -v "$PWD:/mnt/locust" -w "/mnt/locust" locust/with-tenacity -f repro.py --headless --host http://www.example.com

Observe the hang with high CPU usage.
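
One detail in the reproducer may be worth checking: `stop_after_attempt(5)` is passed positionally, while `wait=` is passed by keyword. Decorators that accept both `@retry` and `@retry(...)` commonly treat a single callable positional argument as the function to wrap. Whether tenacity does this in the version installed here is an assumption that should be verified against its source. The toy decorator below only illustrates how such a dispatch could swallow a stop condition and drop the keyword options; `toy_retry` and `StopAfter` are hypothetical names, not tenacity's API.

```python
import functools


def toy_retry(*dargs, **dkw):
    """Toy decorator usable as @toy_retry or @toy_retry(...). Not tenacity's code."""
    # Common trick: a lone callable positional argument is assumed to be the
    # function being decorated (the bare @toy_retry form).
    if len(dargs) == 1 and callable(dargs[0]):
        return toy_retry()(dargs[0])

    def wrap(f):
        @functools.wraps(f)
        def wrapped(*args, **kwargs):
            return f(*args, **kwargs)

        wrapped.options = dkw  # the configuration that was actually kept
        wrapped.target = f     # the object that was actually wrapped
        return wrapped

    return wrap


class StopAfter:
    """Hypothetical stand-in for a callable stop-condition object."""

    def __init__(self, attempts):
        self.attempts = attempts

    def __call__(self, state):
        return state.attempt_number >= self.attempts


# Positional form: the stop object is mistaken for the function to wrap,
# and the wait=2 option is silently dropped.
positional = toy_retry(StopAfter(5), wait=2)

# Keyword form: nothing is mistaken for the function; a real decorator is returned.
keyword = toy_retry(stop=StopAfter(5), wait=2)
```

If that applies here, writing `stop=stop_after_attempt(5)` would avoid the ambiguity. The thread does not confirm whether this resolves the hang.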

Environment

Dockerized. Everything needed to reproduce is described above.

mboutet commented 3 years ago

Perhaps using monkey.patch_all(aggressive=True) instead of monkey.patch_all() would do the trick. See http://www.gevent.org/api/gevent.monkey.html#gevent.monkey.patch_select

Apteryks commented 3 years ago

As a workaround, suggested by Maxence here: https://locustio.slack.com/archives/C3NUJ61DJ/p1606774902427200, code blocks can be retried instead of methods or procedures (see: https://tenacity.readthedocs.io/en/latest/#retrying-code-block).

Here's an example:

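import logging

from locust.exception import RescheduleTask
from tenacity import RetryError
from tenacity import Retrying, stop_after_attempt, wait_exponential

# Assumption: `config` is not defined in the original snippet; placeholder value.
config = {'retry_attempts': 5}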

def default_retry():
    """Return a default exponential back-off Tenacity retry object."""
    return Retrying(stop=stop_after_attempt(config['retry_attempts']),
                    wait=wait_exponential(multiplier=1, min=4, max=10))

def login(user):
    try:
        for attempt in default_retry():
            with attempt:
                response = user.client.get("/index.php")
                # Do something with the response.
    except RetryError:
        logging.error('Failed to process the home page response.')
        raise RescheduleTask

It's not as elegant, though.
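
If the decorator form is preferred, a small helper that does not depend on tenacity is one option. The sketch below is a stdlib-only stand-in, not tenacity's implementation; `retry_with_backoff` and its parameters are hypothetical names chosen to loosely mirror the `stop_after_attempt` and `wait_exponential` settings above. It has not been tested under Locust in this thread.

```python
import functools
import time


def retry_with_backoff(attempts=5, multiplier=1, min_wait=4, max_wait=10, sleep=None):
    """Retry the wrapped callable with exponential backoff (hypothetical helper)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of attempts: propagate the last error
                    # multiplier * 2**(attempt - 1) seconds, clamped to
                    # the range [min_wait, max_wait].
                    delay = max(min_wait, min(max_wait, multiplier * 2 ** (attempt - 1)))
                    # Look up time.sleep at call time so that a monkey-patched
                    # sleep is used if one was installed after import.
                    (sleep or time.sleep)(delay)

        return wrapper

    return decorator


class FlakyStart:
    """Stand-in for a user class: on_start fails three times, then succeeds."""

    def __init__(self):
        self.retry_count = 0

    @retry_with_backoff(attempts=5, min_wait=0, max_wait=0)
    def on_start(self):
        if self.retry_count < 3:
            self.retry_count += 1
            raise Exception('oops!')
        return 'on_start succeeded!'
```

The `sleep` parameter exists mainly so the waits can be replaced in tests. That a cooperative `time.sleep`, such as the one gevent installs, keeps this well-behaved under Locust is an assumption.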

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] commented 3 years ago

This issue was closed because it has been stalled for 10 days with no activity.

Lef-F commented 2 weeks ago

This is still an issue, but only when spinning up Locust with the Web UI. Headless Locust works with tenacity.

Is there any chance it will be followed up? 🙏

cyberw commented 2 weeks ago

I don't have time to look at it myself, but would welcome a PR.

github-actions[bot] commented 1 week ago

This issue was closed because it has been stalled for 10 days with no activity. This does not necessarily mean that the issue is bad, but it most likely means that nobody is willing to take the time to fix it. If you have found Locust useful, then consider contributing a fix yourself!