grafana / k6

A modern load testing tool, using Go and JavaScript - https://k6.io
GNU Affero General Public License v3.0

Enhancement to arrival rate executors #1386

Open mstoykov opened 4 years ago

mstoykov commented 4 years ago

All of the below might also be relevant to other executors not mentioned here

Less goroutines starts

As suggested in this comment, instead of constantly starting a new goroutine for each iteration, it might be better to have each VU run in a "permanent" goroutine and send it iterations to start. This will probably be more performant, but has the downside that, unlike the current approach, it doesn't guarantee that the least-used VU will do the next iteration ... which might be something we care about a lot.

I propose we try it out and benchmark it ... if it really is a lot faster and still has some amount of balancing, I am for using it.

Bucketing iterations

While discussing how accurately iterations should be started, we came to the idea that having a way to be ... "less" accurate would be useful. People have often asked if they can start a number of iterations at the same time, and this is also probably how some people will understand the arrival-rate executors: "you start 10 iterations at the beginning of each second" instead of "you start 10 iterations spread over 1 second". The proposal is to add yet another argument that defines in what buckets iterations should be started (I guess we could also use it for variable looping VUs to start VUs in buckets :) ).

So:

execution:{
       "var1": {
            type: "variable-arrival-rate",
            startRate: 50,
            timeUnit: "1s",
            timeBucket: "100ms",
            preAllocatedVUs: 20,
            stages: [
                { target: 50, duration: "5s"},
                { target: 100, duration: "5s", timeBucket: "500ms"},
                { target: 100, duration: "5s" },
            ],
        },
}

Would mean that for the first 5 seconds, instead of starting one iteration every 1s/50 = 20ms, we will start 5 iterations every 100ms (the time bucket). Then, instead of ramping up linearly from 50 to 100, which usually would mean starting iterations every 20ms to 10ms depending on how close we are to the end of the ramp-up, we will start all the iterations for each 500ms window at the same time, every 500ms. Finally, instead of doing 1 iteration every 10ms (1s/100), we will start 10 iterations every 100ms (the timeBucket).

I wonder if this will not be easier with defining a number of iterations instead of time ... maybe both could be supported.

Maybe we should add another executor for this instead, as this will probably worsen the usual performance? Although it's important to note that the original proposal was to do this by default with a 1ms bucket, so we don't need to sleep so much ... and it would possibly be nearly as accurate ... so all of this needs to be tested :D

na-- commented 4 years ago

Not sure I like timeBucket as a name, but cadence is the only other thing I can think of, and it also doesn't seem very good :disappointed:

I wonder if this will not be easier with defining a number of iterations instead of time ... maybe both could be supported.

I don't think we need this right now, since timeBucket/cadence/whatever would likely be sufficient and better. And if it's not, users can easily wrap the iteration in a for loop themselves, even now. So, I think we shouldn't do it, for now. If it turns out it is needed, after all, we can easily add it as an option at any later point in time, without any breaking changes, since its default value will just be 1.

na-- commented 4 years ago

As mentioned in https://github.com/loadimpact/k6/pull/1285#discussion_r407352441, the initialization of unplanned VUs probably shouldn't happen in the for { select {} } loop that starts iterations, since it can significantly derail the iteration pace, for both arrival-rate executors.

On the other hand, we probably also don't want to initialize more than 1 unplanned VU at a time. So this feels like it should be a small "side-car" helper to these executors. And it will probably be easier to do with the "Less goroutines starts" implementation from https://github.com/loadimpact/k6/issues/1386#issue-593465462

Edit: somewhat connected issue to this: https://github.com/loadimpact/k6/issues/1407

na-- commented 4 years ago

I started refactoring both arrival-rate executors to use a common execution code, but quickly hit various issues and decided to just copy-paste the fixes for https://github.com/loadimpact/k6/issues/1386#issuecomment-612785180 (i.e. the newer commits in https://github.com/loadimpact/k6/pull/1500) for now... :disappointed:

We should still do it before we implement the features proposed above, but yeah, decided to leave it for 0.28.0 as well...

Another thing we should do when refactoring this to have a common run framework is to mock time (https://github.com/loadimpact/k6/issues/1357#issuecomment-633613930). This would finally allow us to test everything in a non-flaky way... And given that we'd need to plug different timers from the two executors, it should be pretty natural to do it in a way that's testable :crossed_fingers:

mstoykov commented 4 years ago

https://github.com/loadimpact/k6/commit/8c2504532ecafa1986e64826906907eab7783fa3 — here are my attempts at refactoring the ramping arrival rate into something more ... split up, with the idea of using it for the constant arrival rate as well.

mstoykov commented 3 years ago

Sparked by this comment in the community forum, I did some profiling with the following scripts:

exports.options = {
    scenarios: {
        test1: {
            executor: 'ramping-arrival-rate',
            preAllocatedVUs: 100,
            stages:[
                {target:100000, duration:'0'},
                {target:100000, duration:'6000s'},
            ],

        },
    },
};

exports.default =  function () {}

and

exports.options = {
    scenarios: {
        test1: {
            executor: 'constant-arrival-rate',
            preAllocatedVUs: 1000,
            rate: 100000,
            duration: "6000s"
        },
    },
};

exports.default =  function () {}

Both scripts do empty iterations with arrival-rate executors, and both hit a limit of around 8-9k iterations/s (sometimes lower). Running them with 100 instead of 1000 preAllocatedVUs makes this a lot better, with them now going as high as 70k. Regardless, all of them spend an obscene amount of time creating stacks, which is likely because we create a new goroutine for each iteration in the arrival-rate executors.

[profiling screenshot: large share of CPU time spent creating goroutine stacks]

There is still more investigation needed to figure out why having 100 VUs is so much better than having 1k, but the arrival-rate executors definitely need at least the "Less goroutines starts" part of this issue, as a simple -u 100 -d 10m script can do close to 450k (again, having 1000 VUs is worse, but in this case only by around 10k in the two tests I did).

na-- commented 3 years ago

I created a new issue for tracking the performance optimization: https://github.com/k6io/k6/issues/1944

So this issue will just be left for discussing time-bucketing and other improvements, like refactoring the long and complicated Run() method, and/or refactoring it even further so that the ramping implementation is re-used for the constant-arrival-rate executor as well.