locustio / locust

Write scalable load tests in plain Python 🚗💨
https://locust.cloud
MIT License
25.06k stars 3k forks source link

GRPC compatibility : Locust load test throws greenlet.GreenletExit exception on reaching test time limit #1676

Closed pranavgupta1234 closed 3 years ago

pranavgupta1234 commented 3 years ago

Describe the bug

Locust load test throws greenlet.GreenletExit exception on reaching test time limit while testing GRPC based services. To monkey patch GRPC with gevent I am using grpc's:

import grpc.experimental.gevent as grpc_gevent
grpc_gevent.init_gevent()

but no luck.

Expected behavior

Should gracefully stop all greenlets and generate the necessary outputs.

Actual behavior

Test keeps on generating load, does not stop, has to be stopped via CTRL+C Keyboard Interrupt. Thrown exception:

[2021-01-18 13:40:47,715] C02CD0BNMD6N/INFO/locust.main: Run time limit set to 30 seconds
[2021-01-18 13:40:47,715] C02CD0BNMD6N/INFO/locust.main: Starting Locust 1.4.1
[2021-01-18 13:40:47,716] C02CD0BNMD6N/DEBUG/locust.runners: Updating state to 'spawning', old state was 'ready'
[2021-01-18 13:40:47,716] C02CD0BNMD6N/INFO/locust.runners: Spawning 5 users at the rate 3 users/s (0 users already running)...

 a bunch of related stats generated .......

[2021-01-18 13:41:17,717] C02CD0BNMD6N/INFO/locust.main: Time limit reached. Stopping Locust.
[2021-01-18 13:41:17,717] C02CD0BNMD6N/DEBUG/locust.runners: Stopping all users
[2021-01-18 13:41:17,718] C02CD0BNMD6N/DEBUG/locust.runners: Updating state to 'cleanup', old state was 'running'
[2021-01-18 13:41:17,718] C02CD0BNMD6N/INFO/locust.runners: Stopping 5 users
[2021-01-18 13:41:17,718] C02CD0BNMD6N/DEBUG/locust.runners: Stopping Greenlet-0
[2021-01-18 13:41:17,718] C02CD0BNMD6N/DEBUG/locust.runners: Stopping Greenlet-1
[2021-01-18 13:41:17,718] C02CD0BNMD6N/DEBUG/locust.runners: Stopping Greenlet-2
[2021-01-18 13:41:17,718] C02CD0BNMD6N/DEBUG/locust.runners: Stopping Greenlet-3
[2021-01-18 13:41:17,718] C02CD0BNMD6N/DEBUG/locust.runners: Stopping Greenlet-4

Traceback (most recent call last):
  File "src/gevent/event.py", line 159, in gevent._gevent_cevent.Event.wait
  File "src/gevent/_abstract_linkable.py", line 437, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
  File "src/gevent/_abstract_linkable.py", line 403, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 406, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 358, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
  File "src/gevent/_abstract_linkable.py", line 367, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
greenlet.GreenletExit
Exception ignored in: 'grpc._cython.cygrpc.run_loop'

Steps to reproduce

I am using locust to test bunch of GRPC services and tasks involve bunch of GRPC calls. ### Environment - OS: MacOS Catalina 10.15.7 - Python version: 3.8 - Locust version: 1.4.1, grpcio version : 1.30.0 - Locust command line that you ran: locust --host myhost -f locustfile --headless -u 5 -r 3 -t 30s MyLocustClass
cyberw commented 3 years ago

GRPCIO uses asyncio, which is incompatible with gevent / greenlets, on which locust is built. I dont think this can be fixed (with anything less than a major rewrite of locust)

cyberw commented 3 years ago

Actually, it may be possible using something like https://github.com/2mf/aiogevent But you are on your own... (and that repo doesnt look like it is actively maintained)

pranavgupta1234 commented 3 years ago

Actually grpc has been patched for compatibility with gevent https://github.com/grpc/grpc/issues/4629, on peeking into source of locust it might be due to async calls used for stopping greenlets. I have been using locust with grpc previously.

cyberw commented 3 years ago

Ok, aha... yea that was added pretty recently to locust.

cyberw commented 3 years ago

Reopening this as it may be possible to fix after all.

Dont expect a huge amount of attention on it though :) If you can then please have a look at fixing it yourself.

cyberw commented 3 years ago

The asyncronous stopping of locusts is only really necessary when --stop-timeout is being used, so if we reverted that part to use syncronous calls then it might work for "regular" runs. See runners.py:250

Tstepro commented 3 years ago

Also seeing this error on distributed runners when running a locust cluster. Please let me know if there's something I could look at to help unblock this.

cyberw commented 3 years ago

Making the asyncrounous stopping of locusts a regular ”direct” kill if stop-timeout is zero at runners.py:250 should provide a workaround.

cyberw commented 3 years ago

Should be fixed now! (not yet in a release though, but it should become part of 1.5.2 when I make it)

barrachri commented 3 years ago

Still happening


Traceback (most recent call last):
--
  | File "src/gevent/event.py", line 163, in gevent._gevent_cevent.Event.wait
  | File "src/gevent/_abstract_linkable.py", line 521, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
  | File "src/gevent/_abstract_linkable.py", line 487, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  | File "src/gevent/_abstract_linkable.py", line 490, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  | File "src/gevent/_abstract_linkable.py", line 442, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
  | File "src/gevent/_abstract_linkable.py", line 451, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
  | File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  | File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  | File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
  | greenlet.GreenletExit
  | Exception ignored in: 'grpc._cython.cygrpc.run_loop'
barrachri commented 3 years ago
locust==1.5.3
greenlet==1.1.0
grpcio==1.37.1
grpcio-tools==1.37.1
cyberw commented 3 years ago

@beandrad Any idea?

beandrad commented 3 years ago

You should close the grpc channel before killing the user (greenlet); the example in the documentation shows how to do this.

barrachri commented 3 years ago

this part? https://github.com/locustio/locust/blob/94e16dda0aa7bc5735d5d2db13031bfe416a7ab1/examples/grpc/locustfile.py#L67

Because I modified my runner based on yours (the example that is part of the documentation).

beandrad commented 3 years ago

Yep, that's it. If the runner now uses the instance stop() method instead of the base class one, you should not see the issue above, since the grpc channel is closed before the greenlet is killed.

barrachri commented 3 years ago

Thanks @beandrad

This is my class, should be comparable to yours.

But I am running locust as a library, could that be the reason?

class GRPCLoadTest(User):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        if "https" in self.host:
            target = self.host.replace("https://", "")
            self.channel = grpc.secure_channel(
                target,
                credentials=grpc.ssl_channel_credentials(root_certificates=certs),
                options=GRPC_CLIENT_CHANNEL_OPTS,
            )
        else:
            target = self.host.replace("http://", "")
            self.channel = grpc.insecure_channel(
                target, options=GRPC_CLIENT_CHANNEL_OPTS
            )
        self._channel_closed = False
        self.stub = service_grpc.PredictStub(self.channel)

    def on_stop(self, force=False):
        self._channel_closed = True
        time.sleep(1)
        self.channel.close()
        super().stop(force=True)

    @stopwatch
    def _get_prediction(self, *args, **kwargs):
        requests = [service_pb.Request(), service_pb.Request(), service_pb.Request()]
        prediction_request = service_pb.PredictionRequest(requests=requests)
        self.stub.Evaluate(prediction_request)

    @task
    def loadtest_runner(self, *args, **kwargs):
        if not self._channel_closed:
            self._get_prediction()
        time.sleep(1)
beandrad commented 3 years ago

def on_stop(self, force=False): should be def stop(self, force=False):

barrachri commented 3 years ago

ouch 🤦

thanks @beandrad 🙌

I'll test it again and let you know!

barrachri commented 3 years ago

Thanks @beandrad, it does work 👍