JordiPolo opened this issue 4 years ago
As an anecdote of the non-portability of thread count: I ran a Goose session from my laptop against a container in the cloud and everything was fine. I then ran it with the same parameters from another container in the same network, and the system under test died. I believe the more powerful container and faster network generated far more load than the system under test could handle.
Interesting idea.
It would require that statistics be enabled, as only then do individual GooseUser threads push the necessary information to the parent thread. The parent could therefore measure req/s and start additional GooseUser threads if the number is too low, and pause GooseUser threads if the number is too high.
However, it gets quite complicated when you consider load tests with multiple GooseTaskSets. Goose starts each GooseTaskSet in the order they are defined (this was required to ensure consistent load between multiple runs with the same startup options) -- auto-generating load could lead to some GooseTaskSets not running at all. But this is also true when running manually, so in itself it isn't a regression. I've been intending to add a warning log message when a load test doesn't invoke all task sets, or starts a number of task sets that doesn't match the configured weights.
When throttling the load test (i.e., if req/s is too high) I imagine it would be done in the reverse order that clients were started -- i.e., first throttle the highest thread number, measure for a while, then throttle the next highest thread number, etc.
Also, different task sets can impact req/s very differently. For example, a task set that only loads static assets will produce drastically more req/s than a task set loading dynamically generated pages. Automatically enabling/disabling a task like this could result in huge fluctuations.
Beyond that, wait_time could make req/s bursty. For example, with a very high wait_time(60, 65), each user idles for roughly a minute between tasks, so requests arrive in waves rather than at a steady rate.
I imagine there'd be a few options required: 1) the desired requests/sec, 2) the frequency at which we tune toward this value, and 3) the % of accuracy required -- this would allow you to tune things to work around the discrepancies discussed above.
(BTW: I'm going to be offline for a little more than a week, so I won't be able to look into implementing this for a while.)
As a first step toward this goal, we'll add a rate limiter, see #90 and #91.
You can basically implement a simple token bucket filter yourself in your task function. You need a thread for it (maybe we need to add a way to schedule the main goose task that spawns the Tokio threads as well?):

Writer thread:
- Check if it's time to fill the bucket (every second)
- Get the write lock on the writer end of the bounded queue
- Fill the bucket with up to 99 tokens
- Release the write lock

Task function:
- Consume one item from the queue
- Block if the queue is empty

That should automatically rate limit.
@jeremyandrews those new issues sound like a good next step to me
@JordiPolo #97 has landed. I'd welcome any feedback as to whether it proves helpful in solving your immediate problem. You'll need to compile the latest code from GitHub, as there's a bit more work before we're able to release 0.9.0 with this change. (Specifically, task function signatures will change, returning a Result -- hopefully this will come together quickly.)
Thanks for your reviews and feedback in that PR!
Thanks so much. I've just updated to master, will test it today.
The only noticeable change is that goose_send returns a Result, which sounds right to me, but I think eventually task! and register_task would need to also use Result; otherwise, as in the example, I need to ignore the error, which sounds weird in Rust, I guess.
Alternatively, if you do not think it provides value to users of these functions, you could ignore the error inside these functions.
In general it works well, thanks so much! A few comments:
- I was just doing goose_send().unwrap() because I thought that possible error would not occur, but it did in every one of the tests, which I think will be surprising to people.
- With low req/s and lots of tasks you get mostly zeros in the table. Also, I asked for 5 req/s and it prints as 4, but I suspect I am actually getting 4.98 or so and it is just truncated when printed.
- Asking for a max of 20 req/s, the tables printed while the process is running were giving me 19 req/s as per the above comment, but the final table with the results gave me 17 req/s. I was running the test for only 1 minute; maybe the time it takes to wait for all threads to finish is enough to make a dent in the total req/s. There were only 1300 total requests. There is probably not much that can be done about this, except maybe somehow expediting the cleanup.

Running the same test for 4 minutes gets me an average of 18. So yes, I think it is the time spent shutting down, which serves a lower number of requests, that drags down the average.
- I was just doing goose_send().unwrap() because I thought that possible error would not occur, but it does in normal operation; at least it did every time I've tried, which is surprising.
This is expected. In order to use Tokio we went with a leaky bucket queue implementation. To shut that down, we simply close the throttle channel when the load test finishes: this results in errors (which can safely be ignored, but unwrap won’t work). This means errors ALWAYS happen if you’re using the throttle.
The next step is to allow the ? symbol to unwrap the response in tasks, but that PR is not yet written, so for now look at the pattern in the drupal_loadtest.rs example for how to properly handle it.
- With low req/s and lots of tasks you get mostly zeros in the table. Also, I asked for 5 req/s and it prints as 4, but I suspect I am actually getting 4.98 or so and it is just truncated when printed.
Yes, the statistics need improvements to better handle rounding and truncation. You are correct as to what is happening. It’s less problematic with larger values. (#101)
- Asking for a max of 20 req/s, the tables printed while the process is running were giving me 19 req/s as per the above comment, but the final table with the results gave me 17 req/s. I was running the test for only 1 minute; maybe the time it takes to wait for all threads to finish is enough to make a dent in the total req/s. There were only 1300 total requests. There is probably not much that can be done about this, except maybe somehow expediting the cleanup.
- A longer test may help, as it averages out.
- More users may help (but not necessarily); with too few users you may not be able to generate the desired load.
- Resetting statistics after all users are started may help (a runtime option), as during startup you're not achieving maximum load.
- Currently the delay is naive -- it throttles for the same amount of time after each request; ideally it should subtract the time taken to process a request from the throttle delay to avoid a slow drift (see the sketch after this list). (If you enable the statistics logs you can see this in the timestamps.) (#102)
- And finally: the throttle imposes a maximum number of requests only, and the above are just some of the ways the actual requests can end up below this maximum. But less than or equal to the maximum is our goal, so this suggests it's working as intended.
The only noticeable change is that goose_send returns a Result, which sounds right to me, but I think eventually task! and register_task would need to also use Result; otherwise, as in the example, I need to ignore the error, which sounds weird in Rust, I guess.
Correct. Making task functions return a result is the last big thing to complete before we can tag a 0.9.0 release. The interim solution is what you'll find if you review examples/drupal_loadtest.rs. (See how we return early: the goal is instead to just use ?, which will simplify that logic.)
Thanks for the explanations. My only concern is that the plan seems to be that task functions will always return an Err when using throttling, but not when throttling is disabled.
Don't get me wrong, this does solve my use case and I don't mind ignoring the error, but there is nothing I can actually do with this error; maybe it's best if I don't know about it at all.
Yes, understood. Consider it a work in progress. It's going to get easier to simply ignore this error (and future ones) soon, and more error cases will be added.
If there is any testing I can do, please let me know.
It would be useful to be able to specify load in other ways, not just users. At least in the environment I am in, we always look at response times with respect to the req/s coming into the system: something like "the 99th percentile of my response time is no larger than X ms, given I get xx req/s or less".
Tweaking the users parameter can approximate the desired req/s, but it is manual tuning and not portable between machines. It would be fantastic if I could pass a parameter to Goose and have it try to more or less stay around the number given.
I am not sure how to deal with big numbers: if a user asks for 100,000 req/s, does Goose start gazillions of threads trying to accommodate that? So maybe this feature is limited to lowish numbers, 50 req/s or something like that, at least in a first iteration.