grafana / k6

A modern load testing tool, using Go and JavaScript - https://k6.io
GNU Affero General Public License v3.0

K6_PROMETHEUS_RW_STALE_MARKERS doesn't seem to be working properly #3312

Closed gabrielgiussi closed 1 year ago

gabrielgiussi commented 1 year ago

Brief summary

I set K6_PROMETHEUS_RW_STALE_MARKERS to true, but the metrics keep appearing in Prometheus for 5 minutes after the end of the test. This can be seen in the following image: the red line corresponds to the 99th percentile from a test that ends at ~14:17, yet the metric still appears until ~14:22.

(screenshot: graph of the p(99) series persisting until ~14:22, about 5 minutes after the test ends at ~14:17)

Is this something that must be supported by Prometheus as well? Can I turn on some logging that could point to a problem with this flag?

k6 version

k6 v0.46.0 (2023-08-14T13:23:26+0000/v0.46.0-0-gcbd9e9ad, go1.20.7, linux/amd64)

OS

linux/amd64

Docker version and image (if applicable)

No response

Steps to reproduce the problem

I'm running a test with the following flags

export K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9270/api/v1/write
export K6_PROMETHEUS_RW_TREND_STATS="p(50),p(90),p(99),avg"
export K6_PROMETHEUS_RW_STALE_MARKERS=true

exec k6 run -o experimental-prometheus-rw scenario.js
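
For reference, scenario.js is just a regular k6 script; the actual one isn't included in this issue, but a minimal sketch along these lines is enough to produce the http_req_duration trend (and its p(99)) discussed above. The endpoint, VU count, and duration below are placeholders, not the reporter's values:

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 10,          // placeholder load
  duration: '2m',   // placeholder test length
};

export default function () {
  // Any reachable endpoint works; the remote-write output exports
  // http_req_duration with the configured trend stats (p(50), p(90), p(99), avg).
  http.get('https://test.k6.io');
  sleep(1);
}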

Expected behaviour

Metrics stop appearing in Prometheus shortly after the test ends, since stale markers are written when K6_PROMETHEUS_RW_STALE_MARKERS is enabled.

Actual behaviour

Metrics keep appearing in Prometheus for about 5 minutes after the test ends, even with K6_PROMETHEUS_RW_STALE_MARKERS=true.

mstoykov commented 1 year ago

Hi, I just tested this with the docker-compose setup in https://github.com/grafana/xk6-output-prometheus-remote, which uses Prometheus directly as storage, and it works.

I was a bit confused at first because one of the graphs uses irate with a 1m range, so it does have a "tail" for 1m after the test finishes. But all the graphs that just show a metric directly work as expected, and disabling the stale markers makes them persist for 5m after that.

So maybe the way you graph them has something to do with it? What queries are you using?

Can you give more information on what you are using for storage? Maybe it is something related to that. Also, where do you get the data for the red line?

From my understanding and testing, exec should not drop the environment variables, but it might still help to test without it.

Adding -v to k6 will print a lot of additional information, including something like

DEBU[0060] Stopping the output                           output="Prometheus remote write"
DEBU[0060] Converted samples to Prometheus TimeSeries    nts=17 output="Prometheus remote write"
DEBU[0060] Successful flushed time series to remote write endpoint  nts=17 output="Prometheus remote write" took="685.682µs"
DEBU[0060] Marking time series as stale                  output="Prometheus remote write" staleMarkers=17
DEBU[0060] Output stopped                                output="Prometheus remote write"

which will tell you how many stale markers were written.

gabrielgiussi commented 1 year ago

I found the issue; I should have tested this before, sorry. I'm running Prometheus as a sidecar next to the container running k6, because the Prometheus instance in our observability infra doesn't have the remote-write feature enabled. So I push to the sidecar Prometheus and have the "real" Prometheus scrape its federation endpoint. The metrics are gone from the sidecar Prometheus after k6 finishes, so the flag is working fine.