grafana / k6-cloud-feature-requests

The place to propose, discuss, and vote for k6 Cloud features and ideas.

[seeking comments] Dynamically controlled performance test #59

Closed markjmeier closed 9 months ago

markjmeier commented 2 years ago

Publicly posting this in case others can share interest or use cases around it.

The general idea would be to allow dynamic control of the number of VUs executing in a test while it is running. Currently, once the VUs are set, there is no way to increase or decrease them without aborting the test and restarting it with new options. For the vast majority of cases, setting VUs/Duration/Ramping patterns before the test starts is perfectly acceptable. In my view, the best practice would likely remain to define your ramping patterns up front and keep them constant as you iterate on your test.
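For context, this is what the static configuration described above looks like in a k6 script today: the whole load profile lives in `options.stages` and cannot change once the run starts (the target URL is just a placeholder).

```javascript
// Standard k6 script: the load profile is fixed in `options` before the run.
import http from 'k6/http';

export const options = {
  stages: [
    { duration: '2m', target: 50 }, // ramp up to 50 VUs
    { duration: '5m', target: 50 }, // hold steady
    { duration: '1m', target: 0 },  // ramp down
  ],
};

export default function () {
  http.get('https://example.com/'); // placeholder endpoint
}
```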

The use case I've come across most commonly involves users who must test a production system and have a relatively short window in which they are able to test it. For these users, if production started misbehaving, they would want to scale the test back to let the system recover while still collecting test data. Or, if things were healthy, they might want to push on it a little bit harder.

With that said, we're interested in hearing comments about this, good/bad/indifferent.

na-- commented 2 years ago

The use case I've come across most commonly tends to be users who must test a production system and have a relatively short test window in which they are able to test the system. For these users if production was misbehaving they would want to scale back a test to let it recover while still collecting test data. Or - if things were healthy they could desire the ability to push on it a little bit harder

If this is the most common use case, then it might be better addressed by something else. For example, a new executor or an enhancement of the arrival-rate executors that automatically scales the iteration rate up or down, depending on some rule or parameter :thinking: Say, something like "make up to X iters/s unless response times go over Y", or something similar that dynamically adjusts load based on observations from the SUT. I remember reading an article along those lines a long time ago, but I don't remember exactly where.

Manually scaling tests up and down with a button in our webapp is not great. It will have a higher latency, for sure, even if we don't have any delays in processing cloud metrics. By the time a user scales down or stops their load tests, they might have already brought down production... Or, by Murphy's law, their internet will cut out exactly when they were supposed to stop the test :sweat_smile:

Moreover, manual control doesn't allow us to have automated alerts or analysis on the data. With automated rate adjustment, users can confidently schedule tests, knowing that they won't bring down their system. For example, they can schedule a test daily that hammers their web service at 100 RPS, but with an upper limit of http_req_duration <= 500ms. And if a particular run doesn't reach 100 RPS, we'd be able to alert them with something like this:

Test run so-and-so exceeded the allowed limit of http_req_duration<=500ms at t=60s and iteration_rate=57 (out of a 100 iters/s goal)
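The alert above could be generated from the recorded rate samples of a run. A hypothetical sketch of that check, assuming a run object with per-interval samples (none of these names are a real k6 Cloud API):

```javascript
// Hypothetical helper: find the first sample breaching the latency limit
// and produce an alert message like the one quoted above.
function checkRun(run, rule) {
  // rule: { metric: 'http_req_duration', limitMs: 500, goalRate: 100 }
  const breach = run.samples.find((s) => s.durationMs > rule.limitMs);
  if (!breach) return null; // run stayed within the limit, no alert
  return (
    `Test run ${run.id} exceeded the allowed limit of ` +
    `${rule.metric}<=${rule.limitMs}ms at t=${breach.t}s and ` +
    `iteration_rate=${breach.rate} (out of a ${rule.goalRate} iters/s goal)`
  );
}

const run = {
  id: 'so-and-so',
  samples: [
    { t: 30, durationMs: 420, rate: 80 },
    { t: 60, durationMs: 730, rate: 57 }, // first sample over the limit
  ],
};
const msg = checkRun(run, { metric: 'http_req_duration', limitMs: 500, goalRate: 100 });
console.log(msg);
```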

Finally, a benefit of something like this is that it would probably work for both k6 OSS and k6 Cloud. A drawback is that it might be a bit more complicated to implement, especially in distributed execution... :thinking: WDYT?

markjmeier commented 2 years ago

I like what you propose, as it removes the human element: a machine can react more quickly and more consistently. I do wonder (and this is where I want comments from people interested) whether this could actually be a binary choice or a combination of choices. Are there situations where the decision to ramp down may not be as simple as "response time is greater than some set value" and instead needs to be something like:

http_req_duration(p99) <= 500 during the trailing 5 seconds and newOrders are not decreasing minute over minute

So someone might say: I'm okay letting response times stay higher as long as the number of new e-commerce orders is not decreasing.
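For the latency half of that condition, today's k6 already gets close: a threshold with `abortOnFail` stops the test (rather than ramping it down) when the limit is breached. The business-metric half (newOrders not decreasing) has no direct equivalent and would need a custom metric. A sketch of the existing mechanism, with a placeholder endpoint:

```javascript
import http from 'k6/http';

export const options = {
  thresholds: {
    // Abort (not ramp down) if p99 latency exceeds 500ms; delaying the
    // abort evaluation keeps a brief startup spike from killing the run.
    http_req_duration: [
      { threshold: 'p(99)<500', abortOnFail: true, delayAbortEval: '10s' },
    ],
  },
};

export default function () {
  http.get('https://example.com/orders'); // placeholder endpoint
}
```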

On the other hand, I think decisions to increase load are much more likely to be binary: if http_req_duration <= 50ms, then increase VUs by 25%, ramping over X minutes and then holding for at least Y minutes. Almost like injecting new stages into the configuration in real time (no idea if feasible, but just to express how it theoretically could work):

newVUs = currentVUs * 1.25

{ duration: 'Xm', target: newVUs },
{ duration: 'Ym', target: newVUs }

na-- commented 2 years ago

no idea if feasible, but to just express how it theoretically could work

If you express things in terms of VUs, then it's probably not easily feasible. Even if it were, it's probably not a good idea to allocate new VUs mid-test, since that can skew results. Besides, executors with looping VUs are not great when you want to simulate load precisely, because they already suffer from the coordinated omission problem: when the system under test slows down, the request rate also slows down. So, I'd suggest we focus on arrival-rate executors here, and adjust the rate dynamically, not the number of VUs.
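For readers less familiar with the distinction: arrival-rate executors drive iterations at a fixed rate independent of SUT response times, which is why they're the natural place to plug in dynamic adjustment. The current, static form of that configuration in k6 (with a placeholder endpoint) looks like this:

```javascript
import http from 'k6/http';

export const options = {
  scenarios: {
    open_model: {
      executor: 'constant-arrival-rate',
      rate: 100,           // iterations per timeUnit, regardless of SUT latency
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 50, // VUs allocated up front, avoiding mid-test allocation
      maxVUs: 200,         // hard ceiling if responses slow down
    },
  },
};

export default function () {
  http.get('https://example.com/'); // placeholder endpoint
}
```

A dynamic version would presumably adjust `rate` on the fly while keeping the pre-allocated VU pool fixed, sidestepping the result-skewing concern above.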

markjmeier commented 9 months ago

Closing this, as there has been little to no interest in or need for this.