grafana / k6

A modern load testing tool, using Go and JavaScript - https://k6.io
GNU Affero General Public License v3.0

Requests per second metric per step in a user flow #2040

Open Bessonov opened 3 years ago

Bessonov commented 3 years ago

Feature Description

I like the output from ab and I have a hard time understanding the metrics from k6. In particular, I miss an rps (requests per second) metric when testing a user flow. For example, my user flow consists of 5 steps:

  1. open register form
  2. register user
  3. get email with verification link (external service, maildev to be precise)
  4. open confirmation page
  5. delete confirmation mail (aka clean up)
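
Roughly, I tag each request with a step name and surface the tagged sub-metrics via thresholds. A sketch of my setup (the URLs and bodies are placeholders for my real services):

import http from 'k6/http';

export let options = {
    thresholds: {
        // one entry per step, to surface the tagged sub-metrics in the summary
        'http_req_duration{name:1. open register form}': ['max>=0'],
        'http_req_duration{name:2. register user}': ['max>=0'],
        // ... steps 3-5 analogous
    },
};

export default function () {
    http.get('https://app.example.com/register', { tags: { name: '1. open register form' } });
    http.post('https://app.example.com/register', null, { tags: { name: '2. register user' } });
    // ... steps 3-5 analogous
}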

With that, I managed to get more submetrics:

     data_received....................................: 11 MB  1.1 MB/s
     data_sent........................................: 741 kB 73 kB/s
     http_req_blocked.................................: min=0.00ms   avg=0.07ms   p(99)=0.80ms    p(99.9)=14.54ms   max=42.95ms   count=1800
     http_req_connecting..............................: min=0.00ms   avg=0.05ms   p(99)=0.59ms    p(99.9)=4.81ms    max=42.73ms   count=1800
     http_req_duration................................: min=0.55ms   avg=164.64ms p(99)=782.34ms  p(99.9)=923.64ms  max=945.70ms  count=1800
       { expected_response:true }.....................: min=0.55ms   avg=164.64ms p(99)=782.34ms  p(99.9)=923.64ms  max=945.70ms  count=1800
     ✓ { name:1. open register form }.................: min=1.93ms   avg=7.74ms   p(99)=63.97ms   p(99.9)=65.78ms   max=65.97ms   count=300 
     ✓ { name:2. register user }......................: min=108.21ms avg=420.22ms p(99)=857.30ms  p(99.9)=934.59ms  max=945.70ms  count=600 
     ✓ { name:3. get email with verification link }...: min=0.68ms   avg=2.07ms   p(99)=7.01ms    p(99.9)=11.25ms   max=11.52ms   count=300 
     ✓ { name:4. open confirmation page }.............: min=5.28ms   avg=136.00ms p(99)=385.23ms  p(99.9)=461.81ms  max=483.54ms  count=300 
     ✓ { name:5. delete confirmation mail }...........: min=0.55ms   avg=1.61ms   p(99)=12.79ms   p(99.9)=20.00ms   max=20.53ms   count=300 
     http_req_failed..................................: 0.00%  ✓ 0    ✗ 1800
       { name:1. open register form }.................: 0.00%  ✓ 0    ✗ 300 
       { name:2. register user }......................: 0.00%  ✓ 0    ✗ 600 
       { name:3. get email with verification link }...: 0.00%  ✓ 0    ✗ 300 
       { name:4. open confirmation page }.............: 0.00%  ✓ 0    ✗ 300 
       { name:5. delete confirmation mail }...........: 0.00%  ✓ 0    ✗ 300 
     http_req_receiving...............................: min=0.04ms   avg=0.39ms   p(99)=10.76ms   p(99.9)=33.66ms   max=47.22ms   count=1800
     http_req_sending.................................: min=0.01ms   avg=0.05ms   p(99)=0.25ms    p(99.9)=7.38ms    max=11.47ms   count=1800
     http_req_tls_handshaking.........................: min=0.00ms   avg=0.00ms   p(99)=0.00ms    p(99.9)=0.00ms    max=0.00ms    count=1800
     http_req_waiting.................................: min=0.48ms   avg=164.21ms p(99)=782.12ms  p(99.9)=923.43ms  max=945.47ms  count=1800
     http_reqs........................................: 1800   176.622083/s
     iteration_duration...............................: min=342.50ms avg=991.35ms p(99)=1488.04ms p(99.9)=1562.11ms max=1569.80ms count=300 
     iterations.......................................: 300    29.437014/s
     vus..............................................: 10     min=10 max=30
     vus_max..........................................: 30     min=30 max=30

which shows latencies for every step. Great! If I look at "iterations [...] 29.437014/s", then my assumption is that the system can handle 29.4 registrations per second, meaning all five steps. Presumably this would be the rps metric if there were only one step, right? If I use the same technique for iterations or http_reqs, it either prints nothing or prints data that is not meaningful:

     http_reqs........................................: 1800   172.355533/s
       { name:1. open register form }.................: 300    28.725922/s
       { name:2. register user }......................: 600    57.451844/s
       { name:3. get email with verification link }...: 300    28.725922/s
       { name:4. open confirmation page }.............: 300    28.725922/s
       { name:5. delete confirmation mail }...........: 300    28.725922/s
[...]
     iterations.......................................: 300    28.725922/s

Latencies are great and give very useful insights, but from my point of view they are not enough to understand the capabilities of a system. For example, a latency of 200ms is great at 5000 rps, but bad at 5 rps.

Suggested Solution (optional)

Probably it would be nice to have the same behavior for http_reqs (or iterations?) as for http_req_duration:

     http_reqs........................................: min=50/s   avg=30/s p(99)=10/s  p(99.9)=5/s  max=1/s  count=1800
[...]
     ✓ { name:2. register user }......................: min=10/s avg=7/s p(99)=5/s  p(99.9)=3/s  max=1/s  count=600 
[...]

I looked at custom metrics, but didn't find a way to achieve this with them.
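
For what it's worth, the closest I got was a custom Counter per step, roughly like the sketch below (placeholder URL), which still only gives me a single rate for the whole run in the summary, not per-second numbers:

import http from 'k6/http';
import { Counter } from 'k6/metrics';

// one custom counter per step
const registerUserReqs = new Counter('register_user_reqs');

export default function () {
    http.post('https://app.example.com/register'); // placeholder
    registerUserReqs.add(1);
}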

na-- commented 3 years ago

You could already do most of this with a Counter, either a built-in one (like http_reqs) or a custom one. For example:

import http from 'k6/http'

export let options = {
    scenarios: {
        test_counters: {
            executor: 'constant-arrival-rate',
            rate: 5,
            duration: '30s',
            preAllocatedVUs: 150,
        },
    },
    thresholds: {
        // Hack to surface these sub-metrics (https://github.com/k6io/docs/issues/205)
        'http_req_duration{my_tag:test1}': ['max>=0'],
        'http_req_duration{my_tag:test2}': ['max>=0'],
        'http_req_duration{my_tag:test3}': ['max>=0'],
        'http_reqs{my_tag:test1}': ['count>=0'],
        'http_reqs{my_tag:test2}': ['count>=0'],
        'http_reqs{my_tag:test3}': ['count>=0'],
    },
}

export default function () {
    http.get('https://httpbin.test.k6.io/delay/1', { tags: { my_tag: 'test1' } });
    http.get('https://httpbin.test.k6.io/delay/1', { tags: { my_tag: 'test1' } });
    http.get('https://httpbin.test.k6.io/delay/2', { tags: { my_tag: 'test2' } });
    http.get('https://httpbin.test.k6.io/delay/1', { tags: { my_tag: 'test3' } });
}

will produce something like this:

     http_req_duration..............: avg=1.38s    min=1.12s   med=1.13s    max=2.14s    p(90)=2.12s    p(95)=2.13s   
       { expected_response:true }...: avg=1.38s    min=1.12s   med=1.13s    max=2.14s    p(90)=2.12s    p(95)=2.13s   
     ✓ { my_tag:test1 }.............: avg=1.13s    min=1.12s   med=1.12s    max=1.54s    p(90)=1.13s    p(95)=1.13s   
     ✓ { my_tag:test2 }.............: avg=2.12s    min=2.12s   med=2.12s    max=2.14s    p(90)=2.13s    p(95)=2.13s   
     ✓ { my_tag:test3 }.............: avg=1.13s    min=1.12s   med=1.12s    max=1.37s    p(90)=1.13s    p(95)=1.15s   
...
     http_reqs......................: 600    16.768966/s
     ✓ { my_tag:test1 }.............: 300    8.384483/s
     ✓ { my_tag:test2 }.............: 150    4.192241/s
     ✓ { my_tag:test3 }.............: 150    4.192241/s
     iteration_duration.............: avg=5.92s    min=5.84s   med=5.9s     max=7.11s    p(90)=5.97s    p(95)=6.04s   
     iterations.....................: 150    4.192241/s

But notice that test3 has the exact same rate as test2, despite it taking just a single second instead of two. This is because of the synchronous nature of the HTTP requests in a single iteration: test2 is blocking test3. And that is while we're using an arrival-rate executor; with some looping executor things would be even worse...
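
For comparison, a rough sketch of a looping-executor variant of the options above: each VU starts its next iteration only after the previous one finishes, so a slow step delays every subsequent request of that VU even more:

import http from 'k6/http'

export let options = {
    scenarios: {
        test_counters_looping: {
            executor: 'constant-vus', // looping executor: iterations run back-to-back per VU
            vus: 10,
            duration: '30s',
        },
    },
}

export default function () {
    // same kind of tagged requests as above; each VU loops through them sequentially
    http.get('https://httpbin.test.k6.io/delay/1', { tags: { my_tag: 'test1' } });
    http.get('https://httpbin.test.k6.io/delay/2', { tags: { my_tag: 'test2' } });
}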

So I'm not sure if this will give you a lot of valuable information :man_shrugging: And I have no idea how you'd even calculate something like p(99.9)=3/s, as you suggest - I'm not sure how we can calculate the percentile of a counter?

Bessonov commented 3 years ago

@na-- thank you very much for your quick response and explanation.

But notice that test3 has the exact same rate as test2, despite it taking just a single second instead of two.

Yeah, that is what I meant by "not meaningful data".

This is because of the synchronous nature of the HTTP requests in a single iteration: test2 is blocking test3.

That means that http_reqs is tied to the whole iteration and not to the sum of single requests. A little bit surprising...

And that is while we're using an arrival-rate executor; with some looping executor things would be even worse.

... I'm not sure I got it. In fact, I think I don't understand how k6 works at all :open_mouth: But well, I'll try. Maybe I'm a little naive, but with the start of new VU iterations decoupled from the iteration duration, it shouldn't work this way. I mean, even if test2 blocks test3, the processing rate can differ between them. For example, if "1. open register form" can process 20 rps on average and "2. register user" only 7 rps, that should be visible if there are enough VUs. But yeah, I see that in a user flow it makes less sense. It's probably only accurate if every http.* call is handled like a barrier: only when all VUs have passed one barrier do they all proceed to the next. That would be the same as running ab 5 times in my example or 4 times in yours.

I'm not sure how we can calculate the percentile of a counter?

In my view, rps isn't exactly a counter, but a 1-second window. Maybe again very naive, but I can think of two models for every http.* call:

In both models I see a bunch of numbers which represent the rps at different points in time. For example, if I got 10 rps numbers like 1, 2, 3, 4, 4, 4, 4, 5, 6, 7 from the first model, then p(90)=2/s, which can mean that 90% of the time the system was able to handle at least 2 rps. And avg=4/s, max=1/s, min=7/s.
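
As a rough post-processing sketch of that idea (plain JavaScript over exported request timestamps, not an existing k6 API):

// bucket request timestamps into 1-second windows, then aggregate the
// per-window rates; timestampsMs has one entry per completed request
// (note: windows with zero requests are simply skipped in this sketch)
function windowedRps(timestampsMs) {
    const buckets = new Map();
    for (const t of timestampsMs) {
        const sec = Math.floor(t / 1000);
        buckets.set(sec, (buckets.get(sec) || 0) + 1);
    }
    // ascending per-window rates, e.g. [1, 2, 3, 4, 4, 4, 4, 5, 6, 7]
    const rates = [...buckets.values()].sort((a, b) => a - b);
    // "p(90)" in the sense above: 90% of the windows saw at least this rate,
    // i.e. the 10th percentile of the ascending-sorted rates
    const atLeast = (p) =>
        rates[Math.min(rates.length - 1, Math.floor(((100 - p) / 100) * rates.length))];
    return {
        avg: rates.reduce((sum, r) => sum + r, 0) / rates.length, // 4/s in the example
        'p(90)': atLeast(90),                                     // 2/s in the example
        best: rates[rates.length - 1],                            // 7/s
        worst: rates[0],                                          // 1/s
    };
}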

What do you think about it? Does it make sense at all, or am I on the wrong path? Maybe I'm damaged by ab :D

na-- commented 3 years ago

I'll start backwards :sweat_smile:

In my view, rps isn't exactly a counter, but a 1-second window.

This is the crux. k6 itself doesn't operate in 1-second time windows. The "time window" used in the end-of-test summary when k6 calculates the requests per second is... the whole test run duration :sweat_smile: So, when you see <count>/s in the summary, those are calculated as <total (e.g. request) count for the whole test run> / <duration of the whole test run>.
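
For instance, the 176.622083/s next to http_reqs in your first output is simply 1800 requests divided by the ~10.19s the whole test run took.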

When users need something else (e.g. bucketing request counts in 1-second time windows or charting things over time), k6 allows you to export the raw metrics it measures to an external output like InfluxDB or a JSON/CSV file. You'd get the timestamped measurements for every HTTP request, iteration, custom metric, etc. in your test, and you can process them however you like.
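
For example, k6 run --out json=results.json script.js writes every measured data point as one JSON object per line, which you can then bucket into 1-second windows with any external tooling.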

I've briefly touched on time windows for metrics internally in k6 when it comes to setting thresholds in https://github.com/k6io/k6/issues/1136, but I can see how it also makes some sense to support them for the end-of-test summary. Mind you, there would still be some gotchas if you want to measure requests per second in 1s intervals, but I can see why it might be useful in some cases.

That means that http_reqs is tied to the whole iteration and not to the sum of single requests. A little bit surprising...

It's probably only accurate if every http.* call is handled like a barrier: only when all VUs have passed one barrier do they all proceed to the next. That would be the same as running ab 5 times in my example or 4 times in yours.

Neither of these statements is true :slightly_smiling_face: Let me backtrack a little bit... The 3 important concepts in k6 you need to know are VUs (virtual users), iterations and executors:

  * VUs are essentially independent JavaScript runtimes that execute your script in parallel with each other.
  * An iteration is a single run through a scenario's exec function (by default, the exported default function).
  * Executors schedule how iterations are distributed across VUs: looping executors (e.g. constant-vus) start a VU's next iteration as soon as its previous one finishes, while arrival-rate executors start new iterations at a configured rate, independently of how long each one takes.

Additionally, because k6 doesn't have per-VU event loops yet (https://github.com/k6io/k6/issues/882), most of the JS code in an iteration is executed synchronously. For example, if you have:

import http from 'k6/http';
import { sleep } from 'k6';

export default function () {
  http.get("https://test-api.k6.io/");
  http.get("https://test.k6.io/");
  sleep(0.1);
}

Every VU executing this iteration will first make a request to https://test-api.k6.io/, fully wait for it to finish and read its response, and make a request to https://test.k6.io/ after that, wait for that to finish, sleep for 100ms, and only then finish the iteration. All of the VUs will run this code in parallel to each other, according to the executor that is configured, but within a single iteration running in a single VU, things happen mostly synchronously.

The only exceptions to that currently are the http.batch() function (that allows a single VU to make multiple concurrent requests) and websockets support in k6/ws (which has its own local event loop).
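
For illustration, a minimal http.batch() call looks something like this; the two requests run concurrently within the VU, but the call still blocks until both responses have arrived:

import http from 'k6/http';

export default function () {
    const responses = http.batch([
        ['GET', 'https://test-api.k6.io/'],
        ['GET', 'https://test.k6.io/'],
    ]);
    // responses[0] and responses[1] are regular Response objects
}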

So, if you have a long HTTP request followed by a short HTTP request, the RPS of the second can't really be higher than the rate per second of the first. This is basically coordinated omission, and the only way to avoid it is to execute the requests completely independently from each other, with an arrival-rate executor and in different scenarios. Even http.batch() wouldn't be enough: while the two requests are executed independently inside the http.batch() call, the call itself doesn't finish (and k6 doesn't continue with the execution) until all of the requests in it have finished...

Bessonov commented 3 years ago

Every VU executing this iteration will first make a request to https://test-api.k6.io/, fully wait for it to finish and read its response, and make a request to https://test.k6.io/ after that, wait for that to finish, sleep for 100ms, and only then finish the iteration.

That matches exactly my expectations!

k6 itself doesn't operate in 1-second time windows. The 3 important concepts in k6 you need to know are VUs (virtual users), iterations and executors:

Thanks, great, then I understood most of the things right.

So, if you have a long HTTP request followed by a short HTTP request, the RPS of the second can't really be higher than the rate per second of the first.

I see the point. The second request is "bottlenecked" by the first request. Let us ignore k6 internals for a moment and make some assumptions:

The use case I see is to have a user flow (= a bunch of synchronous requests). While the requests are fired, I want to measure not only latencies, but also the rps of every involved endpoint. In fact, I can already do that with k6 now! For example:

  1. Create a file with user credentials (email, password)
  2. Create and run the first test file, login.js: every VU reads this file, submits the login form, gets a token/cookie and stores it in a session file. At this point I get the rps for the login form.
  3. Create and run the second test file, update-profile.js: every VU reads the session file and sets the user's name. Now I get the rps for the update-profile endpoint.

If we assume that we can synchronize the above steps among VUs (= "barrier"), then we can get it in one test file. Of course there are some other bits, like "does an iteration belong to the steps or to the whole script", and gotchas.

Now all assumptions off. It seems like this is not possible in k6 yet, because all VUs/workers are independent. Do you think it makes sense for k6 to support this flow? If not, feel free to close this FR.

Thank you, @na-- !

na-- commented 3 years ago

As you've seen, k6 doesn't have file-writing capabilities outside of handleSummary(), so you'd have to use that xk6 extension to save the file between scripts, but otherwise your plan seems fine and should get you the results you want. One caveat: the req/s will be calculated based on the whole script duration, so if you have some stragglers, they might skew the results heavily (which might or might not be what you want :man_shrugging:)

Regarding having everything in a single file, that seems somewhat possible to do currently with multiple scenarios. Scenarios don't need to be concurrent; they can be made sequential with the startTime property, and VUs are reused across scenarios when they don't overlap, so you can do something like this:

import http from 'k6/http'

// TODO: actually use something like this:
// const users = JSON.parse(open('users.json'));
// and maybe a SharedArray, if the file is too big
const users = [
    { username: 'test1', password: 'pass1' },
    { username: 'test2', password: 'pass2' },
    { username: 'test3', password: 'pass3' },
    { username: 'test4', password: 'pass4' },
];

export let options = {
    scenarios: {
        login: {
            executor: 'per-vu-iterations',
            vus: users.length, // have exactly as many VUs as users
            iterations: 1,
            maxDuration: '15s',
            gracefulStop: '0s',
            exec: 'login',
        },
        update_profile: {
            executor: 'per-vu-iterations',
            vus: users.length, // have exactly as many VUs as users
            iterations: 1,
            startTime: '15s', // at least the (maxDuration+gracefulStop) of login
            maxDuration: '15s',
            gracefulStop: '0s',
            exec: 'updateProfile',
        },
    },
    thresholds: {
        // Hack to surface these sub-metrics (https://github.com/k6io/docs/issues/205)
        'http_req_duration{scenario:login}': ['max>=0'],
        'http_req_duration{scenario:update_profile}': ['max>=0'],
        'http_reqs{scenario:login}': ['count>=0'],
        'http_reqs{scenario:update_profile}': ['count>=0'],
    },
}

export function login() {
    let credentials = users[__VU - 1];
    console.log(`User ${__VU} logging in with ${JSON.stringify(credentials)}...`);

    // TODO: actually implement login logic, this is only for demonstration
    let randomDelay = Math.random().toFixed(2);
    http.get(`https://httpbin.test.k6.io/delay/${randomDelay}?username=${credentials.username}`);
    let bogusToken = `vu${__VU}_${randomDelay}`;

    credentials.token = bogusToken;
}

export function updateProfile() {
    let credentials = users[__VU - 1];
    console.log(`User ${__VU} updating their profile ${JSON.stringify(credentials)}...`);

    // TODO: actually implement update profile logic, this is only for demonstration
    let randomDelay = (3 * Math.random()).toFixed(2); // longer requests
    http.get(`https://httpbin.test.k6.io/delay/${randomDelay}?token=${credentials.token}`);
}

This has some other rough edges though: