Rename and restructure Locust/TaskSet

cyberw commented 4 years ago

Some of locusts core concepts have a naming style that makes it unnecessarily confusing. Also, some of the code design doesn't make a lot of sense.

I'd like to:

Merge the Locust and TaskSet classes into one. Separating the two never made sense to me. The user's Locust class almost never contains a significant number of things, and the things that ARE there are mostly things that should actually be set via command line parameters/config file "orthogonally" to the user behaviour (like wait times & target host). Another benefit of this is that it will make "the most basic locust script you could ever write" a lot shorter & easier to understand, so onboarding new users will be easier.
Rename this Locust/TaskSet to User
Change some command line parameters to use industry-standard names (avoid stuff like "hatch rate". Maybe this is also the time to drop the master/slave terminology)

Of course these things should be done carefully and, if possible, leaving the old functionality in place but marked deprecated for a while. I'm not sure if I'll ever have time to implement any of this, but I'd like to talk about it :P

heyman commented 4 years ago

Merge the Locust and TaskSet classes into one

I'm not in favor of merging them. I think they are conceptually different. In the beginning of Locust's development TaskSet didn't exist. Instead tasks were declared directly on the Locust class, and one could nest SubLocust classes. I think the introduction of the TaskSet class made it much better.

The user's Locust class almost never contains a significant number of things

For most larger test codebases I've worked on the Locust classes has contained a bunch of stuff. Almost always wrapper methods for making HTTP requests, and often some state for the simulated users.

Rename this Locust/TaskSet to User

I wouldn't mind renaming Locust to User.

Change some command line parameters to use industry-standard names (avoid stuff like "hatch rate". Maybe this is also the time to drop the master/slave terminology)

Sounds like a good idea!

cyberw commented 4 years ago

Merge the Locust and TaskSet classes into one

I'm not in favor of merging them. I think they are conceptually different. In the beginning of Locust's development TaskSet didn't exist. Instead tasks were declared directly on the Locust class, and one could nest SubLocust classes. I think the introduction of the TaskSet class made it much better.

To me, a TaskSet represents a user behaviour (or "scenario" if you will). What does a Locust represent? (it isnt really a "user", it just contains a user/scenario)

The user's Locust class almost never contains a significant number of things

For most larger test codebases I've worked on the Locust classes has contained a bunch of stuff. Almost always wrapper methods for making HTTP requests, and often some state for the simulated users.

Oh. I just put helpers & state directly in the TaskSet. Hmm... I guess if you put helpers in the Locust class you can inherit them without restricting the type of TaskSet you can inherit from etc. Are there any other differences/benefit of having them in the Locust? It seems a bit weird to separate actual tasks from helper methods into different classes (because they are almost the same thing, and a helper might even want to call a task sometimes), but maybe that is just me.

Rename this Locust/TaskSet to User

I wouldn't mind renaming Locust to User.

Cool!

I guess another way of scratching my itch would be to make TaskSets optional somehow (but without having to declare an internal class, which I think you suggested once before). Like putting @task method declarations directly on a Locust or something (and having that be the "normal" use case, and separately defined TaskSets be the "advanced" case)

cyberw commented 4 years ago

I guess what I'm trying to say is that there are two main things that define a load test.

The behaviour of the users
The load profile (how fast users run, how many of each user type, for how long, rampup etc)

I think it makes a lot of sense to make these the primary things we work with in locust. Right now they dont have a clear object/class representation, and the ones we have (Locust and TaskSet) are not really orthogonal, especially if we put user behaviour (helper methods) in the Locust.

I think a Locust (in the current implementation) is more like a "thread" or "TaskSet runner". Something that can perform/trigger user tasks, but isnt a user itself. Renaming it "User" wouldnt really fix that.

thehme commented 4 years ago

+1 on using industry-standard names

heyman commented 4 years ago

To me, a TaskSet represents a user behaviour (or "scenario" if you will). What does a Locust represent? (it isnt really a "user", it just contains a user/scenario)

A Locust represents a simulated user while a TaskSet can represent the user behaviour, or part of the user behaviour when TaskSets are nested (e.g. a sub section of a web app).

I just put helpers & state directly in the TaskSet. Hmm... I guess if you put helpers in the Locust class you can inherit them without restricting the type of TaskSet you can inherit from etc. Are there any other differences/benefit of having them in the Locust?

If you have state or helpers that are global to the simulated user, and you use nested TaskSets, you can always access that state, or those helpers, using self.locust.my_state. (I guess this could be partly solved by introducing a root attribute on TaskSet instances).

Overall I'd say that the separation of Locust and TaskSet makes sense when you consider cases with nested TaskSets. Also, if we rename the Locust class to User, it'd feel very weird to nest User classes under other User classes.

I guess another way of scratching my itch would be to make TaskSets optional somehow

I do remember considering this when introducing TaskSet (previously tasks could be declared directly under Locust and SubLocust). The only benefit from this would be that the minimal locust example would be slightly smaller (for any real world scenario, having to declare one User class on a few lines shouldn't matter). I can see two drawbacks which I think exceeds the benefit:

It feels like a violation of the There should be one-- and preferably only one --obvious way to do it.
It would complicate the implementation a lot. I imagine that we'd construct a TaskSet dynamically under the hood (because the alternative - to bring TaskSet logic into User/Locust - seems like a worse idea) which would make the code more "magic", and the developer would know less about the environment that their code would be running in.

I think a Locust (in the current implementation) is more like a "thread" or "TaskSet runner". Something that can perform/trigger user tasks, but isnt a user itself. Renaming it "User" wouldnt really fix that.

I don't understand why you don't think Locust/User in the current implementation doesn't represent a user. To me it's exactly what it represents. It is run in a separate thread/greenlet, yes, but that's what I'd expect (how could it represent a simulated user without being run a separate thread?).

cyberw commented 4 years ago

To me, a TaskSet represents a user behaviour (or "scenario" if you will). What does a Locust represent? (it isnt really a "user", it just contains a user/scenario)

A Locust represents a simulated user while a TaskSet can represent the user behaviour, or part of the user behaviour when TaskSets are nested (e.g. a sub section of a web app).

Ok, I think I understand your thinking now. The Locust represents a "client", not really a "user" (because a user, in my parlance, has more than just state and utility functions, it also has an objective or task that it wants to perform).

A client could be dumb and just have an http client (plain HttpLocust) or contain utility functions specific to your site, to allow the TaskSet to be implemented on a higher level of abstraction. Do you agree?

However, I think there are some "code smells" that indicate that they are not really separate things in the current implementation (wait times are on both objects, the "client" is present in TaskSet as well as Locust, etc) - basically a lot of things indicating that TaskSet is really just a subtype of Locust.

I just put helpers & state directly in the TaskSet. Hmm... I guess if you put helpers in the Locust class you can inherit them without restricting the type of TaskSet you can inherit from etc. Are there any other differences/benefit of having them in the Locust?

If you have state or helpers that are global to the simulated user, and you use nested TaskSets, you can always access that state, or those helpers, using self.locust.my_state. (I guess this could be partly solved by introducing a root attribute on TaskSet instances).

Overall I'd say that the separation of Locust and TaskSet makes sense when you consider cases with nested TaskSets. Also, if we rename the Locust class to User, it'd feel very weird to nest User classes under other User classes.

Hmm... yes, nested TaskSets make it necessary to introduce some more complexity I guess. Tbh, I never felt the need to do that, because "it is just python" :) If a task needs to perform another task in sequence then I just define a function and then call it. I think this is solving a case that is more relevant for other load gens that dont have the inherent flexibility of running code in your load test.

My tests end up being a little less random than they would if I had used locusts built in task weighting but to me it never mattered.

Anyway, I think nested TaskSets do make sense in some cases, and I'm not against them.

I guess another way of scratching my itch would be to make TaskSets optional somehow

I do remember considering this when introducing TaskSet (previously tasks could be declared directly under Locust and SubLocust). The only benefit from this would be that the minimal locust example would be slightly smaller (for any real world scenario, having to declare one User class on a few lines shouldn't matter). I can see two drawbacks which I think exceeds the benefit:

It feels like a violation of the There should be one-- and preferably only one --obvious way to do it.

It would complicate the implementation a lot. I imagine that we'd construct a TaskSet dynamically under the hood (because the alternative - to bring TaskSet logic into User/Locust - seems like a worse idea) which would make the code more "magic", and the developer would know less about the environment that their code would be running in.

Agreed, that is not a good solution within the current framework.

I think a Locust (in the current implementation) is more like a "thread" or "TaskSet runner". Something that can perform/trigger user tasks, but isnt a user itself. Renaming it "User" wouldnt really fix that.

I don't understand why you don't think Locust/User in the current implementation doesn't represent a user. To me it's exactly what it represents. It is run in a separate thread/greenlet, yes, but that's what I'd expect (how could it represent a simulated user without being run a separate thread?).

What I mean is, a Locust is lots of things (a user agent, wait time definitions, a greenlet, etc) and it contains/is linked to a set of tasks (which is the most important thing for a simulated user). I guess we're kind of getting into semantics here, but a User shouldnt just reference a set of tasks, it primarily is a set of tasks (that is why I like Client better - which also matches what we already use in the command line :) ) But I guess a "fat" Locust (with lots of utility functions) could be described as a user, and a "thin" Locust is more of a client.

I have to go now, but will happily debate more at a later time :)

heyman commented 4 years ago

A client could be dumb and just have an http client (plain HttpLocust) or contain utility functions specific to your site, to allow the TaskSet to be implemented on a higher level of abstraction. Do you agree?

Yes! Though I actually like the User terminology better than Client, since Client can be confused with something like HTTP/websocket/thrift client. I think of User as in "virtual person" :).

I guess one argument for keeping the Locust terminology is that people might have preconceptions of what a User is, and it might feel weird to declare a User when you're simulating just some program speaking to another program over an API. Locust doesn't come with such preconceptions, and it can just be some entity that will be attacking your system. (Though I still think User is a better terminology).

yes, nested TaskSets make it necessary to introduce some more complexity I guess. Tbh, I never felt the need to do that, because "it is just python" :) If a task needs to perform another task in sequence then I just define a function and then call it.

A lot of apps are structured in a kind of hierarchical structure. Nested TaskSets can be very useful to simulate a person using such an app. Imagine a webapp such as GitHub. Then you could have a bunch of TaskSets and tasks that are structured in a way similar to the hierarchy of the app:

Repo
- Issues
- View issue
  - Preview comment
  - Add comment
  - Like comment
  - Follow issue
  - Add label
  - Close issue
- Create issue
- PR
- View PR
  - Add comment
  - Like comment
  - Merge PR
- Create PR
- Browse code
- Blame code
User page
- Follow user
- Repositories
Index page
- ...
Issues
- ...
Explore
- ...

cyberw commented 4 years ago

Ok, I think I kind of follow you now. I would still argue that the TaskSet is a User (because it mainly accesses things on the User/Locust object (like .client) to which it belongs.

But I guess defining the locust class with an in-line definition of the TaskSet seems a viable workaround (maybe we can work that into the documentation examples?)

I have tried a couple of times to put my thoughts into words but end up getting stuck :)

What if we did something radical, like make tasks functions (not object methods) that take a Locust as a parameter? So we hold no state at all on the TaskSet and all state on the Locust/User? That would get rid of the need to copy lots of properties from the Locust to the TaskSet.

cyberw commented 4 years ago

The more I think about it the more I like my idea :) TaskSets would then become basically just lists of weighted (and possibly nested) task methods. The TaskSet object would no longer contain things like wait_time and references to a Locust/User.

cyberw commented 4 years ago

The scheduling logic (execute_next_task() etc) would of course have to be moved to Locust/User.

heyman commented 4 years ago

What if we did something radical, like make tasks functions (not object methods) that take a Locust as a parameter?

That's possible in the current implementation. This works:

def task1(l):
    pass
def task2(l):
    pass

class User(Locust):
    wait_time = constant(1)
    class task_set(TaskSet):
        tasks = [task1, task2]

or this in order to give different wieights:

class User(Locust):
    wait_time = constant(1)
    class task_set(TaskSet):
        tasks = {task1:1, task2:5}

The TaskSet object would no longer contain things like wait_time

I see it as a feature to be able to specify different wait_times for different task sets. The average wait time for some parts of an application could be different from the other parts.

I also think it's useful to be able to store temporary state on the TaskSet instance, related to a certain part of the application.

heyman commented 4 years ago

because it mainly accesses things on the User/Locust object (like .client) to which it belongs.

The TaskSet.client is just a shortcut for TaskSet.locust.client. Though I guess one could argue that having that shortcut might diffuse the individual roles of TaskSet and Locust.

cyberw commented 4 years ago

What if we did something radical, like make tasks functions (not object methods) that take a Locust as a parameter?

That's possible in the current implementation. This works:
def task1(l):
    pass
def task2(l):
    pass

class User(Locust):
    wait_time = constant(1)
    class task_set(TaskSet):
        tasks = [task1, task2]
or this in order to give different wieights:
class User(Locust):
    wait_time = constant(1)
    class task_set(TaskSet):
        tasks = {task1:1, task2:5}
The TaskSet object would no longer contain things like wait_time

I see it as a feature to be able to specify different wait_times for different task sets. The average wait time for some parts of an application could be different from the other parts.

I also think it's useful to be able to store temporary state on the TaskSet instance, related to a certain part of the application.

Sure, you can do it, but I'm not trying to add a new feature, I'm trying to clean up the design and remove stuff that adds complexity internally and gives too many ways to do things for a user.

"There should be one-- and preferably only one --obvious way to do it", right? :)

By no longer instantiating TaskSets we make it so that there is only one place where the test plan stores user state (the Locust/User)

Being able to specify different wait times for different task sets is a niche feature imho, and you can always just use sleeps to achieve that. But if you think that is important, fine, keep it in the TaskSet (exactly like it is today). The main thing is that there should be no instances of TaskSet.

heyman commented 4 years ago

The main thing is that there should be no instances of TaskSet.

Hmm, I don't really see the big issue with having instances of TaskSets (though maybe it's Stockholm syndrome :)).

I think a class works pretty well to encapsulate the code for testing some part of an application, and I can't see a better solution at the moment. If you declare the tasks as module functions, then you have to add an extra reference to each of those functions from your TaskSet/User, which seems redundant. Unless I'm misunderstanding what you mean. Maybe you could give an example in that case?

cyberw commented 4 years ago

They can be static methods, just not object methods. Like this:

class WebTask(TaskSet):
    @task
    def my_task(user):
        user.client.post(...)

class WebUser(User):
    task_set = WebTask
    wait_time = constant_pacing(1)
    ...

I'm assuming decorators work the same way for class functions :)

Oh, and I'm also assuming we've renamed Locust to User..

heyman commented 4 years ago

They can be static methods, just not object methods

Hmm, yes, that could work. Though I suspect that it would complicate the implementation a bit.

I'm assuming decorators work the same way for class functions :)

Yep!

cyberw commented 4 years ago

Pretty much solved by #1304

locustio / locust

Rename and restructure Locust/TaskSet #1264