locustio / locust

Write scalable load tests in plain Python 🚗💨
MIT License

Inconsistent stats resetting #299

Closed devoto13 closed 7 years ago

devoto13 commented 9 years ago

If I have requests inside the on_start() method of my main task set, stats are reset before all of those requests have completed. As a result, some of the initialization requests end up in the report and clutter the data.

Relevant code: hatch(). I'm not closely familiar with gevent/greenlets, but the issue seems to be related to how greenlets work. When a request is sent from on_start(), execution switches back to hatch(), and since all locusts are already hatched, it emits the hatch_complete event (while the requests from on_start() are still running).
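The ordering problem can be reproduced without Locust. The following is only a minimal sketch: it uses asyncio from the standard library as a stand-in for gevent's cooperative scheduling, and `hatch`, `on_start`, and `events` here are illustrative names, not Locust's actual code.

```python
import asyncio

events = []  # records the order in which things happen


async def on_start(user_id):
    # Stand-in for an HTTP request made during on_start(). Awaiting yields
    # control to the scheduler, much like a blocking call under gevent.
    await asyncio.sleep(0.01)
    events.append(f"on_start done {user_id}")


async def hatch(num_users):
    # Spawn every simulated user; each one immediately begins its on_start().
    tasks = [asyncio.create_task(on_start(i)) for i in range(num_users)]
    # Yield once so the spawned tasks run up to their first await.
    await asyncio.sleep(0)
    # All users are spawned, so the "event" fires here, even though
    # no on_start() call has finished yet.
    events.append("hatch_complete")
    await asyncio.gather(*tasks)


asyncio.run(hatch(3))
print(events)
```

In this sketch "hatch_complete" is recorded before any of the "on_start done" entries, which mirrors the stats being reset while initialization requests are still in flight.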

As a solution, I see some kind of synchronization that lets the runner know that locust initialization is done. Any ideas how to deal with this?

billibala commented 9 years ago

I ran into a similar issue as well: the # requests counter resets. In my case, I haven't overridden on_start() at all.

devoto13 commented 9 years ago

Stats resetting is intended behavior. See #91 for motivation.

sverhagen commented 9 years ago

When you say "intended behavior", it sounds like everything described here is working as desired. Since you're the original poster, I assume that's not what you mean. I totally agree with your original observation that hatch() is where this is mainly caused. But since on_start isn't called until we're in the TaskSet, it might be worth introducing a new event and sending it through the events library. Then spawn_locusts/hatch can keep a list of started locusts and wait until it has received that event for each of them before proceeding to "All locusts hatched". That would be a great fix. Where can I vote? :wink:
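A rough sketch of that idea, assuming a hypothetical `locust_started` event. The `EventHook` class below is a minimal stand-in written for this example, not Locust's events library, and asyncio stands in for gevent so the snippet runs on its own.

```python
import asyncio


class EventHook:
    """Minimal stand-in for an event hook (hypothetical, not Locust's class)."""

    def __init__(self):
        self._handlers = []

    def add(self, handler):
        self._handlers.append(handler)

    def fire(self, **kwargs):
        for handler in self._handlers:
            handler(**kwargs)


locust_started = EventHook()  # hypothetical new event


async def hatch(user_ids, on_start, log):
    pending = set(user_ids)  # users whose on_start() has not finished yet
    all_started = asyncio.Event()
    if not pending:
        all_started.set()

    def handle_started(user_id):
        pending.discard(user_id)
        if not pending:
            all_started.set()

    locust_started.add(handle_started)

    async def run_user(user_id):
        await on_start(user_id)
        # Tell the runner that this user finished its initialization.
        locust_started.fire(user_id=user_id)

    tasks = [asyncio.create_task(run_user(uid)) for uid in user_ids]
    # Wait for the event from every user before announcing completion.
    await all_started.wait()
    log.append("hatch_complete")
    await asyncio.gather(*tasks)


log = []


async def slow_on_start(user_id):
    await asyncio.sleep(0.01)  # stand-in for setup requests
    log.append(f"on_start done {user_id}")


asyncio.run(hatch([1, 2, 3], slow_on_start, log))
print(log)
```

Here "hatch_complete" is only logged after every simulated user has reported in, so a stats reset tied to it would no longer race with the setup requests.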

devoto13 commented 9 years ago

Sorry if that was confusing. What I meant is that it works as desired if you don't have any HTTP requests inside on_start().

I thought about events initially, but after further investigation I discarded the idea (it is quite complex and conceptually a poor fit). I implemented a much simpler alternative for my current project.

If you think a bit about the requests a TaskSet.on_start() method usually contains (the ones you don't want to see in the final stats), you'll notice that they are mostly for creating a user, uploading an avatar, authorization, etc., i.e. requests related to user bootstrapping. Since one user in terms of this library is one locust, it makes sense to put such initialization inside the Locust class instead of the TaskSet class. So I extended the HttpLocust class to execute an on_start() method, if present, the same way TaskSet does:

from locust import HttpLocust


class UserLocust(HttpLocust):
    def __init__(self):
        super(UserLocust, self).__init__()

        # Run per-user setup once, mirroring TaskSet.on_start().
        if hasattr(self, "on_start"):
            self.on_start()

I also modified the hatch() method to fire the hatch_complete event after UserLocust.on_start() has completed (including all requests inside it).
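To show the ordering this gives, here is a simplified sketch that runs without Locust. `BaseLocust`, `SignupUser`, and this `hatch()` are hypothetical stand-ins, not the actual patch; the point is only that setup done in the constructor finishes before hatch_complete is announced.

```python
class BaseLocust:
    """Hypothetical stand-in for HttpLocust so the sketch runs without Locust."""

    def __init__(self):
        self.ready = False


class UserLocust(BaseLocust):
    def __init__(self):
        super().__init__()
        # Run per-user setup as part of construction, if the subclass defines it.
        if hasattr(self, "on_start"):
            self.on_start()


class SignupUser(UserLocust):
    def on_start(self):
        # Bootstrap requests (create user, log in, ...) would go here.
        self.ready = True


def hatch(locust_class, count, log):
    # Each constructor returns only after its on_start() has finished.
    locusts = [locust_class() for _ in range(count)]
    # Only now is completion announced, so a stats reset tied to it
    # would happen after all bootstrap work.
    log.append("hatch_complete")
    return locusts


log = []
locusts = hatch(SignupUser, 3, log)
print(log, all(user.ready for user in locusts))
```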

This approach is quite simple and doesn't break the existing contract. So if the maintainers are willing to merge such a change, I can provide a pull request.

sverhagen commented 9 years ago

People certainly have different use cases. In our system and its tests we're pre-providing users with a configuration file, and our on_start is more about getting seed data from our system (in an effort to keep the tests data-independent, no hard-coded identifiers that are pointing to stuff, so we try to get those identifiers during on_start). Since the needed seed data differs per task set, in our case the on_start is in fact better bound to the task set than it would be to Locust.

This is aggravated by each locust (potentially) running with different credentials, and the obtained seed data may be different accordingly. So our on_start is not specific to a task set class, but to a particular task set instance.
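To illustrate what I mean by per-instance setup, here is a stripped-down sketch. Everything in it is made up for the example (`FAKE_BACKEND`, `FakeLocust`, `OrdersTaskSet`, the usernames and ids); in our real tests the lookup would be requests against the system under test.

```python
# Hypothetical seed data keyed by credentials; a real test would
# fetch this from the system under test instead.
FAKE_BACKEND = {
    "alice": {"orders": [101, 102]},
    "bob": {"orders": [201]},
}


class FakeLocust:
    """Stand-in for a locust that runs with its own credentials."""

    def __init__(self, username):
        self.username = username


class OrdersTaskSet:
    """Stand-in for a task set that needs its own seed data."""

    def __init__(self, locust):
        self.locust = locust
        self.order_ids = []

    def on_start(self):
        # The seed data depends on which user this instance runs as,
        # so it has to be fetched per task set instance.
        self.order_ids = FAKE_BACKEND[self.locust.username]["orders"]


task_sets = [OrdersTaskSet(FakeLocust(name)) for name in ("alice", "bob")]
for task_set in task_sets:
    task_set.on_start()

print([ts.order_ids for ts in task_sets])
```

Two instances of the same task set class end up with different identifiers, which is why the setup is bound to the task set instance rather than to the Locust class.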