grafana / xk6-output-prometheus-remote

k6 extension to output real-time test metrics using Prometheus Remote Write.
GNU Affero General Public License v3.0

Strange behavior of the dashboard. #103

Closed zzhao2010 closed 1 year ago

zzhao2010 commented 1 year ago

First of all, thanks for sharing these great dashboards for visualization. They look awesome. However, I saw strange behavior while testing the dashboards with my test cases, and I have a question about data accuracy: the data reported on the dashboards doesn't seem to align with the test results on the command line.

Let's take the first metric, "Request Made", on the "Test Result" dashboard as an example. Two values were reported, which is quite confusing, and neither of them reflected the actual number of requests generated during the test.

image

And if you look at the P95 Response Time metric on the dashboard, it was 3x faster than the p95 response time reported in the test summary on the command line.

image

jwcastillo commented 1 year ago

Have you tried the latest version of the dashboard?

lagunkov commented 1 year ago

It is better with the update, but there is still an issue:

With the new dashboard:

image

But when I decrease the time range, I get a different total request count:

image

The total request count changes depending on the time range, but never equals the actual HTTP request count:

image
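One general way a counter total can vary with the selected range is if the panel computes the increase over the visible window only, so that samples outside the window do not contribute. The sketch below illustrates that effect with hypothetical sample values; it is not confirmed to be what the dashboard's queries do, and it ignores the extrapolation that Prometheus's own `increase()` applies.

```javascript
// Hypothetical cumulative request counter, sampled once per minute.
// t is minutes since test start, v is the running total of requests.
const samples = [
  { t: 0, v: 0 },
  { t: 1, v: 100 },
  { t: 2, v: 250 },
  { t: 3, v: 450 },
  { t: 4, v: 700 },
  { t: 5, v: 1000 },
];

// Increase of the counter using only the samples inside [from, to].
function increaseInWindow(points, from, to) {
  const inside = points.filter((p) => p.t >= from && p.t <= to);
  if (inside.length < 2) return 0; // not enough samples to compute a difference
  return inside[inside.length - 1].v - inside[0].v;
}

console.log(increaseInWindow(samples, 0, 5)); // 1000 (window covers the whole test)
console.log(increaseInWindow(samples, 3, 5)); // 550 (narrower window, smaller total)
```

Under this assumption, the total only matches the command-line summary when the selected range covers the whole test run.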

Also, the p95 response time looks different.

jwcastillo commented 1 year ago

@lagunkov Is there a test you can share so that I can replicate the issue? I'm in the k6 Slack, like @Wen.

jwcastillo commented 1 year ago

@zzhao2010 Do you still have the same problem?

lagunkov commented 1 year ago

Sorry, I cannot share the test that produced the screenshots above because it contains private info.

I tried to make a reduced test case using the example from https://test.k6.io/ with these options:

export let options = {
  scenarios: {
    sample: {
      executor: 'ramping-vus',
      startVUs: 1,
      stages: [
        { target: 20, duration: "1m" },
        { target: 20, duration: "3m" },
        { target: 0, duration: "1m" }
      ],
    }
  },
  tags: {
    testid: 'test grafana 0.1'
  }
};

It shows the same total request count and p95 response time when the time range is "Last 3 hours":

image

And when I choose a shorter time range, it changes to:

image

Hope this helps.

jwcastillo commented 1 year ago

🆗, let me check.

zzhao2010 commented 1 year ago

@jwcastillo Looks like the issue was fixed in the latest version. @lagunkov By the way, the issue you described above happens on my end as well. It looks like the values get messed up when the time range is changed. I always use the link in the test list dashboard to open the test result dashboard; that way the reported data is accurate.

soolch commented 1 year ago

Hi, I have this issue too. I found that if the test duration is short, the Request Made metric is correct, but when the test duration is longer, the Request Made metric is lower than the actual number of requests made. I did a comparison with k6 Cloud. Here you can see that Request Made, Peak RPS, and P95 Response Time also have different values. image image

soolch commented 1 year ago

Hi @zzhao2010, may I ask how you solved your issue? I am also using v0.2.0 but still see the same behavior.

codebien commented 1 year ago

@soolch are you sure you're using the latest version? Did you pull the latest commit from the main branch or from the latest tag? If yes, then can you post an anonymized script that allows us to reproduce your issue, please? You have an example a few lines before in this comment using test.k6.io.

soolch commented 1 year ago

Hi @codebien, I tried once again with the latest k6 binary, following the k6 documentation, which has been updated to say that this is the official dashboard. But the issue still happens. I tried using the following options:

export const options = {
  scenarios: {
    'scenario-vehicle-content': {
      executor: 'ramping-arrival-rate',
      startRate: 50,
      timeUnit: '1m',
      preAllocatedVUs: 2,
      maxVUs: 50,
      stages: [
        { target: 50, duration: '1m' },
        { target: 100, duration: '1m' },
        { target: 100, duration: '1m' },
        { target: 200, duration: '1m' },
        { target: 200, duration: '1m' },
        { target: 300, duration: '1m' },
        { target: 300, duration: '1m' },
        { target: 400, duration: '1m' },
        { target: 400, duration: '1m' },
      ],
    },
  },
};

image image

But if I reduce my total test duration to 5m, the result is shown correctly. For the following stage configuration, the result is correct:

      stages: [
        { target: 50, duration: '1m' },
        { target: 100, duration: '1m' },
        { target: 100, duration: '1m' },
        { target: 200, duration: '1m' },
        { target: 200, duration: '1m' },
      ],

image image
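To have a reference number to compare the dashboard against, the expected iteration count for the two stage configurations above can be estimated by hand. The sketch below assumes the arrival rate changes linearly within each stage, that every stage lasts exactly one `timeUnit` (1m here), and that no iterations are dropped; `expectedIterations` and `durationUnits` are hypothetical names used only for illustration, and the number of HTTP requests per iteration depends on the script, which is not shown in this thread.

```javascript
// Estimate the iterations started by a ramping-arrival-rate scenario.
// startRate and each stage.target are iterations per timeUnit;
// stage.durationUnits is the stage duration expressed in timeUnits.
function expectedIterations(startRate, stages) {
  let rate = startRate; // rate at the beginning of the current stage
  let total = 0;
  for (const stage of stages) {
    // Linear ramp: average rate over the stage times its duration.
    total += ((rate + stage.target) / 2) * stage.durationUnits;
    rate = stage.target;
  }
  return total;
}

// The 9-stage configuration (9 minutes) and the reduced 5-stage one (5 minutes).
const longTest = [50, 100, 100, 200, 200, 300, 300, 400, 400]
  .map((target) => ({ target, durationUnits: 1 }));
const shortTest = longTest.slice(0, 5);

console.log(expectedIterations(50, longTest));  // 1875
console.log(expectedIterations(50, shortTest)); // 575
```

The count reported by k6 may differ slightly from this estimate because of rounding and dropped iterations, so it is only a rough sanity check.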

codebien commented 1 year ago

@jwcastillo can you take a look into it, please?

jwcastillo commented 1 year ago

Yes, I'll take this.

soolch commented 1 year ago

Hi @jwcastillo, may I ask whether you were able to reproduce the same result on your side?

soolch commented 1 year ago

Hi @jwcastillo, could it be because of this? https://k6.io/docs/results-output/real-time/prometheus-remote-write/#stale-trend-metrics

codebien commented 1 year ago

Hi @soolch, do you use the dashboard with the Stale marker option enabled?

soolch commented 1 year ago

Hi @codebien, I didn't. I just read about this stale option, which also mentions 5 minutes. And when I search for the results in Prometheus, those older than 5 minutes have disappeared, which makes the Grafana results incorrect.
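One possible reading of the 5-minute observation is Prometheus's default lookback window: an instant query only returns a series if its newest sample is at most 5 minutes older than the evaluation time. The sketch below models just that rule with hypothetical sample values; it does not model stale markers, and it is not confirmed to be the cause of the mismatch reported in this issue.

```javascript
const MINUTE_MS = 60 * 1000;
const LOOKBACK_MS = 5 * MINUTE_MS; // Prometheus default lookback window (5m)

// samples: array of { t: timestamp in ms, v: value }, sorted by t ascending.
// Returns the value an instant query at evalTime would see, or undefined when
// the newest sample at or before evalTime is older than the lookback window.
function instantValue(samples, evalTime, lookbackMs = LOOKBACK_MS) {
  let latest;
  for (const s of samples) {
    if (s.t <= evalTime) latest = s;
  }
  if (latest === undefined || evalTime - latest.t > lookbackMs) return undefined;
  return latest.v;
}

// Hypothetical counter whose last sample was written at the 9-minute mark.
const series = [
  { t: 0, v: 0 },
  { t: 5 * MINUTE_MS, v: 575 },
  { t: 9 * MINUTE_MS, v: 1875 },
];

console.log(instantValue(series, 10 * MINUTE_MS)); // 1875
console.log(instantValue(series, 15 * MINUTE_MS)); // undefined
```

In this model the series is still present in storage; it is only absent from instant queries evaluated more than 5 minutes after the last sample.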