grafana / xk6-output-prometheus-remote

k6 extension to output real-time test metrics using Prometheus Remote Write.
GNU Affero General Public License v3.0

K6 prometheus metric `k6_http_req_failed_rate` is broken #159

Closed MarkSRobinson closed 1 year ago

MarkSRobinson commented 1 year ago

Brief summary

When using the Prometheus remote write extension for k6, the k6_http_req_failed_rate metric does not track the failure rate. It does not rise or fall with the number of errors; it jumps to 1 on the first error and stays there.

k6 version

0.47

OS

docker image

Docker version and image (if applicable)

grafana/k6:0.47.0

Steps to reproduce the problem

Config:

                    containers:
                      - name: k6-container
                        image: grafana/k6:0.47.0
                        command: ["/bin/sh", "-c"]
                        args:
                          - "k6 run /scripts/k6.js -o experimental-prometheus-rw"
                        env:
                        - name: K6_PROMETHEUS_RW_SERVER_URL
                          value: "http://metrics-system-prometheus.monitoring.svc.cluster.local:9090/api/v1/write"
                        - name: K6_PROMETHEUS_RW_TREND_STATS
                          value: "p(95),p(99),min,max,avg"

K6 script:

              k6.js: |-
                import http from 'k6/http';
                import { check } from 'k6';

                export const options = {
                  stages: [
                    { target: 200, duration: '460s' },
                    { target: 0, duration: '30s' },
                  ],
                };

                export default function () {
                  const result = http.get('http://emoji-svc-1-1.tar:8801/metrics');
                  check(result, {
                    'http response status code is 200': result.status === 200,
                  });
                }

k6 output

     checks.........................: 99.99%  ✓ 9066640      ✗ 26     
     data_received..................: 58 GB   119 MB/s
     data_sent......................: 861 MB  1.8 MB/s
     http_req_blocked...............: avg=3.45µs   min=0s       med=1.58µs   max=64.74ms  p(90)=2.1µs    p(95)=2.59µs 
     http_req_connecting............: avg=55ns     min=0s       med=0s       max=38.77ms  p(90)=0s       p(95)=0s     
     http_req_duration..............: avg=5.26ms   min=0s       med=4.25ms   max=131.96ms p(90)=9.86ms   p(95)=12.33ms
       { expected_response:true }...: avg=5.26ms   min=676.31µs med=4.25ms   max=131.96ms p(90)=9.86ms   p(95)=12.33ms
     http_req_failed................: 0.00%   ✓ 26           ✗ 9066640
     http_req_receiving.............: avg=401.67µs min=0s       med=220.74µs max=108.02ms p(90)=822.31µs p(95)=1.28ms 
     http_req_sending...............: avg=15.41µs  min=0s       med=7.45µs   max=72.86ms  p(90)=9.82µs   p(95)=16.42µs
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s       max=0s       p(90)=0s       p(95)=0s     
     http_req_waiting...............: avg=4.84ms   min=0s       med=3.85ms   max=121.12ms p(90)=9.25ms   p(95)=11.64ms
     http_reqs......................: 9066666 18503.357772/s
     iteration_duration.............: avg=5.38ms   min=318.58µs med=4.36ms   max=132.03ms p(90)=10.02ms  p(95)=12.53ms
     iterations.....................: 9066666 18503.357772/s
     vus............................: 1       min=1          max=199  
     vus_max........................: 200     min=200        max=200  

During the active phase, I deleted 30% of the pods for the target service. This caused request errors, as expected, but they are not reported correctly in the Prometheus metrics.

Expected behaviour

image

Because this is a rate metric, it should show a brief spike and then fall back to 0. An alternative fix would be to expose a k6_http_req_failed_total counter, which Prometheus could then turn into a rate.
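To illustrate the behaviour I would expect, here is a rough sketch in plain JavaScript of how a per-interval failure ratio could be derived from two cumulative counters. The sample values are made up, and k6_http_req_failed_total is only the metric name proposed in this issue, not something the extension is known to export.

```javascript
// Hypothetical cumulative samples, one per scrape interval (15 s apart).
// `failed` stands in for the proposed k6_http_req_failed_total counter and
// `total` for a request counter; all numbers are invented for illustration.
const samples = [
  { t: 0,  failed: 0,  total: 1000 },
  { t: 15, failed: 0,  total: 2000 },
  { t: 30, failed: 26, total: 3000 }, // pods deleted during this interval
  { t: 45, failed: 26, total: 4000 },
  { t: 60, failed: 26, total: 5000 },
];

// Failure ratio per interval, computed from the increase of each counter
// between consecutive samples.
function windowedFailureRatio(points) {
  const ratios = [];
  for (let i = 1; i < points.length; i++) {
    const failedDelta = points[i].failed - points[i - 1].failed;
    const totalDelta = points[i].total - points[i - 1].total;
    // Avoid dividing by zero when no requests were made in the interval.
    ratios.push(totalDelta > 0 ? failedDelta / totalDelta : 0);
  }
  return ratios;
}

const ratios = windowedFailureRatio(samples);
console.log(ratios); // non-zero only for the interval that contains the errors
```

With counters like these, the ratio spikes in the interval where the errors happened and returns to 0 afterwards, which is the shape I expected to see on the dashboard.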

Actual behaviour

image

In this image, the metric jumped to 1 and stayed there. This isn't correct: it should have dropped back to zero once the system recovered.

mstoykov commented 1 year ago

Hi @MarkSRobinson please see https://github.com/grafana/xk6-output-prometheus-remote/issues/77 where this has already been discussed.