Closed sarg3nt closed 1 year ago
Hi @sarg3nt, I can answer your second question as I stumbled over this one, too.
testid is just an (arbitrary) tag. Have a look at the docker-script.
Edit: just stumbled over the explanation in the README :)
Hey @sarg3nt, thanks for the feedback, we will use it for future improvements to the documentation.
@jwcastillo can you help with points 3 and 4, please?
yes, I will check it
@Allaman which README did you find that in? I'm not seeing it... Oh, the root README.md. My bad. I was combing over the dashboards README.md. Since this is built into k6 now, I barely glanced at the root README.md.
Does anyone know if the testid needs to be globally unique for every run or just unique for the specific .js file?
I.e. is the testid + time range good enough?
was combing over the dashboards README.md
@sarg3nt How did you find it? We may need to update a link.
Does anyone know if the testid needs to be globally unique for every run or just unique for the specific .js file.
You can use the docker-run.sh for convenience as the README suggests.
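For runs that don't go through docker-run.sh, here is a minimal sketch of tagging each run with its own testid. The thread doesn't settle whether the value has to be globally unique, so this takes the cautious route of appending a UTC timestamp to the script name. The `make_testid` helper and the timestamp format are assumptions for illustration, not something taken from the README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a per-run testid from the script name plus a UTC timestamp.
make_testid() {
  local script_name="$1"
  # An optional second argument pins the timestamp (handy for reproducible output).
  local stamp="${2:-$(date -u '+%Y%m%dT%H%M%SZ')}"
  printf '%s_%s' "${script_name}" "${stamp}"
}

testid="$(make_testid load_bulletins)"

# Print the command rather than running it, so the sketch works without k6 installed.
echo "k6 run --tag testid=${testid} -o experimental-prometheus-rw ./load_bulletins.js"
```

A value without spaces or slashes also avoids having to URL-encode the tag when it ends up in a dashboard link.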
@codebien I searched around until I found it in the root https://github.com/grafana/xk6-output-prometheus-remote/blob/main/README.md for the project. I ignored it earlier because we don't use Docker compose. Our prom / grafana are already deployed to internal k8s clusters.
Checked out the script, I basically just finished writing the same thing. :)
I got the testid working but still no P95 response times or Response time in the main graph
Hey @sarg3nt, @jwcastillo submitted a fix in #113. Can you check out the branch and see if it resolves your issues, please?
Trying these dashboards as well (also tested the dashboards committed in #113) and the P95 stat doesn't seem to work for me either. I don't think it's related to the dashboards; it seems Grafana is not showing any data for the native histogram metrics.
In Prometheus I have the following features enabled: remote-write-receiver, native-histograms. When I look in Prometheus the time series seem to be correctly saved in the newer format.
But then in Grafana when I go to discover and search for
histogram_quantile(0.95, rate(k6_http_req_duration_seconds[1m]))
I don't get any results.
Is there perhaps a setting for Grafana that sarg3nt and I overlooked and need for this to be functional?
Update:
As I am playing with it a bit more, it seems that having rate in the query doesn't work with histograms.
This is my first encounter with the new histogram format so I don't know much about them yet.
Below were 2x 1minute k6 runs. (didn't have a testid specified either)
Without rate I get some stats:
With rate I get nothing at all.
Hi @manubell, which versions are you using? Grafana 9.3.6 and Prometheus 2.42.0?
The sum function should be part of the second argument:
histogram_quantile(0.9, sum by (testid) (rate(k6_http_req_duration_seconds[1m])))
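To make the nesting order explicit, here is a small sketch that composes the query with the aggregation inside `histogram_quantile` and can optionally send it to the Prometheus HTTP API. The function names and the `PROM_URL` default are assumptions for illustration; the query shape is the one given above.

```shell
#!/usr/bin/env bash
# Compose a quantile query where sum by (testid) wraps rate() and
# histogram_quantile() wraps the sum, as suggested in the thread.
quantile_query() {
  local quantile="$1" metric="$2" window="$3"
  printf 'histogram_quantile(%s, sum by (testid) (rate(%s[%s])))' \
    "${quantile}" "${metric}" "${window}"
}

# Hypothetical helper: run an instant query against /api/v1/query.
# PROM_URL is an assumption; point it at your own Prometheus server.
run_query() {
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

query="$(quantile_query 0.95 k6_http_req_duration_seconds 1m)"
echo "${query}"
# run_query "${query}"   # uncomment when a Prometheus server is reachable
```

Querying Prometheus directly like this can help tell a query problem apart from a Grafana datasource problem.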
I can see the same result in Prometheus for k6_http_req_duration_seconds that @manubell shows, but when I try to replicate the graph in Grafana with sum by(testid) (histogram_quantile(0.95, k6_http_req_duration_seconds)) I'm still not seeing anything.
Hey @sarg3nt, can you report an anonymized k6 script and the exact commands you're running, please? In this way, we should be able to reproduce the issue.
Do you open the Test Result dashboard following the link from the Test List dashboard?
@codebien please see files attached. I'm not including the login.js file as I doubt it is useful to you; let me know if you think you need it.
run_loadtest.sh
This script takes input and generates a k6 command that would look something like this:
K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM=true K6_PROMETHEUS_RW_SERVER_URL=https://<MY-PROMETHEUS-URL>/api/v1/write k6 run -e LOADTEST_REALM=<REALM> -e LOADTEST_BASE_URL=<BASE-URL-HERE> -e TESTID=load_bulletins 03/10/23 12:06:23 --tag testid=load_bulletins 03/10/23 12:06:23 -o experimental-prometheus-rw ./load_bulletins.js
#!/usr/bin/bash
set -euo pipefail
IFS=$'\n\t'
#<Colors>
red="\033[1;31m"
yellow="\033[1;33m"
green="\033[1;32m"
blue="\033[1;34m"
nc="\033[0m"
#</Colors>
show_help () {
echo -e "${green}== k6 LoadTester ==${nc}"
echo -e "The run_loadtest.sh script is a wrapper around the k6 loadtesting utility."
echo -e " Note: This tool defaults to using the local test account and does not support SSO."
echo
echo -e "${blue}List available scripts: ${nc}"
echo -e " ./run_loadtest.sh ls"
echo
echo -e "${blue}Show k6 run help: ${nc}"
echo -e " ./run_loadtest.sh <script> -h"
echo
echo -e "${blue}Run loadtest with script on local build: ${nc}"
echo -e " ./run_loadtest.sh [script] [optional=realm]"
echo
echo -e "${blue}Run loadtest with script on remote deployment: ${nc}"
echo -e " ./run_loadtest.sh [script] [optional=realm] [optional=baseUrl]"
echo ""
echo "The following flags can be passed to k6"
cat << EOF
Flags:
-o --out cloud send test results to k6 cloud, you must be logged into a cloud account
-o, --out uri uri for an external metrics database
-a, --address string address for the REST API server (default "localhost:6565")
-c, --config string JSON config file (default "/home/vscode/.config/loadimpact/k6/config.json")
-h, --help help for k6
--log-format string log output format
--log-output string change the output for k6 logs, possible values are stderr,stdout,none,loki[=host:port],file[=./path.fileformat] (default "stderr")
--no-color disable colored output
-q, --quiet disable progress updates
-v, --verbose enable verbose logging
EOF
}
validateTools() {
toolsInstalled=true
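# verify k6 is installed
# '|| true' keeps 'set -e' from aborting here, so the error message below can be shown
k6Tool=$(command -v k6 || true)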
if [ -z "${k6Tool}" ]; then
echo -e "${red}Error: You do not currently have k6 installed.${nc}"
toolsInstalled=false
fi
if [ "${toolsInstalled}" == false ]; then
echo ""
echo -e "${red}You are missing the required tools to proceed${nc}"
echo ""
exit 1
fi
}
main () {
validateTools
# Set up local vars
local script_dir=""
local root_path=""
script_dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd -P)
root_path=$(realpath "${script_dir}/../..")
local loadtest_path="${root_path}/deployments/validators/loadtest"
local test=""
local realm="saas"
local baseUrl="${realm}.localhost"
local options=()
local testid=""
# Display help menu
if [[ "${#}" == 0 || "${1}" == "--help" || "${1}" == "-h" ]]; then
show_help
exit 0
fi
# Load the test script from position 1
# We know there is a first element, otherwise the above if would have triggered and we would have shown the help
test="$1"
# Loop through remaining parameters and process them looking for switches as needed.
for (( i=2; i <= "$#"; i++ )); do
# Check if passed arg has a - or -- in it, if so, add it directly to the options array and continue.
if [[ "${!i}" == -* || "${!i}" == --* ]]; then
options=("${options[@]}" "${!i}")
# Look ahead on item in the array if there is one and check if it starts with a - or --
# If it does not, assume it is an argument to the current switch.
# Example: '-o cloud' where above if added the -o and below if adds 'cloud'
((i=i+1))
if (( i <= "$#")) && [[ "${!i}" != -* && "${!i}" != --* ]]; then
options=("${options[@]}" "${!i}")
continue
fi
# The inner if was not true so we need to decrement the counter and continue.
((i=i-1))
continue
fi
# If this is the second positional parameter, treat it as the realm
if (( i == 2 )); then
realm="${!i}" && continue
fi
# If this is the third positional parameter, treat it as the base URL
if (( i == 3 )); then
baseUrl="${!i}" && continue
fi
done
# Change into working directory
cd "${loadtest_path}" || exit 1
# Provide a list of scripts to run
if [[ "${test}" == "list" || "${test}" == "ls" ]]; then
echo -e "${green}Current Load test scripts:${nc}"
# List files in the loadtest directory
for file in "./"*; do
# skip login script
if [[ "${file:2:-3}" == 'login' || "${file:2:-3}" == 'init' || "${file:2:-3}" == *'template' ]]; then
continue
fi
echo " ${file:2:-3}"
done
exit 0
fi
# Unable to load path
if [[ ! -f "${loadtest_path}/${test}.js" ]]; then
echo -e "${yellow}Unable to load${nc}" \
"${green} ${test} ${nc}"
echo -e "${yellow}Please validate you selected the right script.${nc}"
exit 1
fi
# Build the testid
testid="${test} $(date '+%D %T')"
# Save for later as I might use this to add some values to the grafana graph later
#K6_PROMETHEUS_RW_TREND_STATS='avg,p(90),p(99),min,max'\
# Output the k6 command we are going to run
echo K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM=true \
K6_PROMETHEUS_RW_SERVER_URL="https://<MY-PROMETHEUS-URL>/api/v1/write" \
k6 run \
-e LOADTEST_REALM="${realm}" \
-e LOADTEST_BASE_URL="${baseUrl}" \
-e TESTID="${testid}" \
--tag testid="${testid}" \
-o experimental-prometheus-rw \
"./${test}.js" "${options[@]}"
# Run k6
# See: https://k6.io/docs/results-output/real-time/grafana-cloud/
# and https://k6.io/docs/results-output/real-time/prometheus-remote-write/
K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM=true \
K6_PROMETHEUS_RW_SERVER_URL="https://<MY-PROMETHEUS-URL>/api/v1/write" \
k6 run \
-e LOADTEST_REALM="${realm}" \
-e LOADTEST_BASE_URL="${baseUrl}" \
-e TESTID="${testid}" \
--tag testid="${testid}" \
-o experimental-prometheus-rw \
"./${test}.js" "${options[@]}"
}
# Lets begin!
if ! (return 0 2> /dev/null); then
(main "$@")
fi
init.js contains default values for number of VUs, checks, http_req_failed, etc. along with values needed for authentication and auth caching. The main purpose of this file is to keep the test scripts as DRY as we can and to codify a standard of expectations for our APIs.
// This file holds common initialization functionality that all tests will use
// including logging in
export function init() {
let lt_settings = {
users: {
normal: 50,
max: 100,
//TODO: Determine what breaking needs to be
breaking: 250,
},
// Will fail the test if the checks defined in the test do not succeed greater than the given percentage of time
checks: ["rate>0.9"],
http_req_failed: [{ threshold: "rate < 0.01", abortOnFail: true }],
http_req_blocked: [{ threshold: "max < 2000", abortOnFail: false }],
http_req_duration: ["p(95) <= 100", "p(99.9) < 1000"],
realm: __ENV.LOADTEST_REALM,
base_url: __ENV.LOADTEST_BASE_URL,
// Should we cache the login token or not. Default is true, false will cause a new login for each test.
cache_token: __ENV.LOADTEST_CACHE_LOGIN_TOKEN,
access_token: "",
expires_on: 0,
};
// Add local login base URL if one was not passed.
// Note: this is here as it's the best place for init code like this at the moment
if (
lt_settings.base_url === "isundefined" ||
lt_settings.base_url === undefined ||
lt_settings.base_url === ""
) {
lt_settings.base_url = `${lt_settings.realm}.localhost`;
}
if (
lt_settings.cache_token === "isundefined" ||
lt_settings.cache_token === undefined ||
lt_settings.cache_token === ""
) {
lt_settings.cache_token = true;
}
return lt_settings;
}
load_bulletins.js
The k6 test script itself. Note that we are importing login.js but I have not included that file.
import http from "k6/http";
import { login } from "./login.js";
import { init } from "./init.js";
import { sleep, check } from "k6";
let lt_settings = init();
export const options = {
insecureSkipTLSVerify: true,
stages: [
{ duration: "30s", target: lt_settings.users.normal }, // Ramp up from 1 VU to normal load over 30 seconds
{ duration: "1m", target: lt_settings.users.normal }, // Stay at the normal load for 1 minute
{ duration: "30s", target: 0 }, // Ramp down to 0 users over 30 seconds
],
thresholds: {
checks: lt_settings.checks,
http_req_failed: lt_settings.http_req_failed,
http_req_blocked: lt_settings.http_req_blocked,
http_req_duration: lt_settings.http_req_duration,
},
};
export function setup() {
// Set the start time in milliseconds to 15 seconds before actual start time to give some padding left on the graph
let start_time = new Date().getTime() - 15000;
return start_time;
}
export default function () {
lt_settings = login({ lt_settings });
const result = http.get(`https://<SOME-URL-TO-TEST>`, {
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${lt_settings.access_token}`,
tenant: lt_settings.realm,
},
});
check(result, {
"status is 200": (r) => r.status === 200,
"has data": (r) => JSON.parse(r.body).length > 0,
"api response text contains <SOME VALUE>": (r) => r.body.includes("<SOME VALUE>"),
});
sleep(1); // How many requests per second, scale by adding users
}
export function teardown(start_time) {
// Set the end_time to actual end time + 90 seconds to give some padding right on the graph and
// account for in flight requests to finish
let end_time = Date.now() + 90000;
let testid = encodeURIComponent(__ENV.TESTID);
// Create a link for the user to click on to make viewing test results easy.
console.log(
`Test results can be viewed in Grafana here: https://<URL TO GRAFANA>/dashboard/d/01npcT44k/test-result?from=${start_time}&to=${end_time}&var-DS_PROMETHEUS=Prometheus-thanos&var-testid=${testid}&var-scenario=All&var-url=All&var-metrics=k6_http_req_waiting_seconds`
);
}
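If the link should come from the wrapper script instead of `teardown`, the testid needs the same percent-encoding in bash. The following is a minimal sketch for ASCII input; the `urlencode` and `grafana_link` helpers, the example host, the dashboard path, and the timestamps are all placeholders rather than anything from the actual setup. Unlike `encodeURIComponent`, it also encodes characters such as `!` and `*`, which is harmless in a query string.

```shell
#!/usr/bin/env bash
# Percent-encode everything except RFC 3986 unreserved characters (ASCII input assumed).
urlencode() {
  local LC_ALL=C
  local input="$1" output="" char i
  for (( i = 0; i < ${#input}; i++ )); do
    char="${input:i:1}"
    case "${char}" in
      [a-zA-Z0-9.~_-]) output+="${char}" ;;
      # printf with a leading quote yields the character's numeric code.
      *) output+="$(printf '%%%02X' "'${char}")" ;;
    esac
  done
  printf '%s' "${output}"
}

# Hypothetical helper: assemble a dashboard link from its parts.
grafana_link() {
  local grafana_base="$1" testid="$2" from_ms="$3" to_ms="$4"
  printf '%s?from=%s&to=%s&var-testid=%s' \
    "${grafana_base}" "${from_ms}" "${to_ms}" "$(urlencode "${testid}")"
}

# Placeholder values for illustration only.
grafana_link "https://grafana.example.com/d/abc/test-result" \
  "load_bulletins 03/10/23 12:06:23" 1678449968000 1678450073000
echo
```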
Hey @sarg3nt, I don't see any related issue skimming your code, and running it I got the expected value for the end-of-test summary.
Versions used:
k6 v0.43.1 (2023-02-27T10:53:03+0000/v0.43.1-0-gaf3a0b89, go1.19.6, linux/amd64)
grafana 9.3.6
prometheus v2.42.0
Do you open the Test Result dashboard following the link from the Test List dashboard?
You haven't answered my question, this still sounds like the reason. If you open the Test Result dashboard directly then you will probably get the wrong time frame and the query will return a wrong number.
Two more checks to do:
k6_http_req_duration_seconds from Grafana's Metric explorer?
@codebien sorry for missing the result dashboard question. Yes, I have tried from there as well with no luck.
I can see k6_http_req_duration_seconds data in Prometheus and can see k6_http_req_duration_seconds as a value in the Grafana explorer, but it never returns data.
Please keep in mind I'm very new to the Prometheus / Grafana stack.
Here's output in Prometheus:
But no output in Grafana:
Are there some logs or setup I can get you to help figure this out?
@codebien We got it fixed (mostly). The tech who manages our Prometheus / Grafana deployment found a disconnect between Grafana and Thanos: Thanos was not returning data for k6_http_req_duration_seconds. He's looking into that now, but in the meantime he gave us a datasource that points directly to Prometheus.
I can now see the P95 response times as expected using https://github.com/grafana/xk6-output-prometheus-remote/pull/113
A few oddities to mention:
The above screen cap was from the link I create in the test, when I use the Test List it does basically the same thing but with a wider time span. Graphs are still cut off to the left
As for points 3 and 6, as well as the behavior of the graph appearing cut off on the left, we will wait for help from @codebien for a more precise answer. In the meantime, please feel free to communicate any other concerns or issues.
@jwcastillo and @codebien I've got some more oddities to report. The Requests by URL table seems to be reporting incorrect data. Here's the output from k6 for the test in question.
Command line to run this test using 'k6':
K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM=true K6_PROMETHEUS_RW_SERVER_URL=https://<redacted>/api/v1/write k6 run -e LOADTEST_REALM=<redacted> -e LOADTEST_BASE_URL="<redacted>" -e LOADTEST_TYPE=load -e LOADTEST_SUB_URL="api/bulletins/bulletins?serialNumber=" -e LOADTEST_CONTENT_TYPE="application/json; charset=utf-8" -e LOADTEST_EXPECTED_TEXT="2000.13" -e TESTID="Bulletins List 03/29/23 14:15:45" --tag testid="Bulletins List 03/29/23 14:15:45" -o experimental-prometheus-rw ./api.js
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: ./api.js
output: Prometheus remote write (<redacted>)
scenarios: (100.00%) 1 scenario, 50 max VUs, 20m30s max duration (incl. graceful stop):
* default: Up to 50 looping VUs for 20m0s over 3 stages (gracefulRampDown: 30s, gracefulStop: 30s)
INFO[0000] Performaing a load test source=console
INFO[1202] Grafana: https://<redacted>/dashboard/d/01npcT44k/test-result?from=1680124530752&to=1680125837607&var-testid=Bulletins%20List%2003%2F29%2F23%2014%3A15%3A45&var-scenario=All&var-url=All&var-metrics=k6_http_req_duration_seconds source=console
running (20m01.9s), 00/50 VUs, 3189 complete and 0 interrupted iterations
default ✓ [======================================] 00/50 VUs 20m0s
✓ Status is 200
✓ Has correct content type
✓ Has expected value
█ setup
█ teardown
✓ checks.........................: 100.00% ✓ 9567 ✗ 0
data_received..................: 1.3 GB 1.0 MB/s
data_sent......................: 4.6 MB 3.8 kB/s
✓ http_req_blocked...............: avg=3.42ms min=170ns med=210ns max=1.2s p(90)=300ns p(95)=370ns
http_req_connecting............: avg=876.45µs min=0s med=0s max=1.05s p(90)=0s p(95)=0s
✗ http_req_duration..............: avg=12.55s min=99.62ms med=13.33s max=29.8s p(90)=22.02s p(95)=23.6s
{ expected_response:true }...: avg=12.55s min=99.62ms med=13.33s max=29.8s p(90)=22.02s p(95)=23.6s
✓ http_req_failed................: 0.00% ✓ 0 ✗ 3360
http_req_receiving.............: avg=371.11ms min=39.57µs med=297.42ms max=4.81s p(90)=525.64ms p(95)=718.89ms
http_req_sending...............: avg=59.5µs min=24.37µs med=58.43µs max=184.42µs p(90)=75.08µs p(95)=81.05µs
http_req_tls_handshaking.......: avg=1.92ms min=0s med=0s max=1.12s p(90)=0s p(95)=0s
http_req_waiting...............: avg=12.18s min=96.49ms med=12.95s max=29.46s p(90)=21.66s p(95)=23.11s
http_reqs......................: 3360 2.795675/s
iteration_duration.............: avg=14.22s min=67.65µs med=15.15s max=30.8s p(90)=23.15s p(95)=24.69s
iterations.....................: 3189 2.653395/s
vus............................: 1 min=1 max=50
vus_max........................: 50 min=50 max=50
ERRO[1202] some thresholds have failed
However, the "Requests by URL" table reports the following:
The top URL is the primary, with the second being auth; sorry for the redacted info. None of the other times match on the main graph or the P95 output either: the k6 output says P95 was 23.6 where Grafana says 26.3.
The above screenshots are from the link I generate in the k6 output. If I use the link in the "Test List" I get the same results, just zoomed out somewhat on the graph.
Hello! Glad I've found this thread. Do you know why the Active VU is shorter than request rate and response time?
Hey @jwcastillo, sorry for nudging you. Please tell me if there's a better place to ask questions.
Okay. I think I got it. It corresponds to the range I use in the rate function.
Hey there! 🌟 Good news! We have updated the previous dashboard and created a new dashboard that supports the option without histogram metrics. You can check them out in the dashboard directory or with the docker-compose example of this repo:
Both dashboards share the same design. The only differences are in the PromQL queries and Trend Metric Query variable.
I encourage you to start using them. If you stumble on any issues or have any suggestions—let us know! Gonna close this issue now since it’s about the previous version.
I've been struggling for a couple of days to get the provided Grafana dashboard dashboard-results.json to function fully. After reading through the closed issues I finally discovered I need to pass K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM=true to k6, but that hasn't solved it for me. I do have the correct version of Prometheus and have --enable-feature=native-histograms set. I'm receiving no errors in the Grafana dashboard now, so I think everything is set up as it should be.
We are running Grafana 9.3.6 and Prometheus 2.42.0.
Here are my thoughts / questions in a nice list:
testid? Nothing ever appears in that list and I can't find it anywhere in the k6 docs. All the queries seem to rely on it but I don't know what it is, where it comes from, or if there is something I should be setting.
See below for a screenshot of my dashboard.