Closed by jszwedko 2 years ago
I'll triage this at least.
Hello! I have been watching Vector for the last 24 hours. You were right from the beginning: the scrape time is increasing relative to the number of time series. This can be seen in the screenshot from Grafana:
I took a look at this today and was able to reproduce increasing scrape times, but not nearly at the magnitude shown here. After processing around 2 GB of data with ~14k time series, I was still seeing scrape times of around 0.2s. They were very gradually increasing as more data was processed, though, so there might be something there to investigate.
@Denissa89 is it possible to share a sample of the data going into Vector as well as one of the scrape outputs? I'm wondering if there is something I'm missing about its profile. Also, I noticed you are using an old nightly version of Vector. Could you try with the recently released 0.13.0?
How I reproduced:
Vector config:
sources:
  nginx_input_vector:
    type: socket
    mode: tcp
    address: "0.0.0.0:8080"
  internal_metrics:
    type: internal_metrics
transforms:
  nginx_parse_json:
    inputs:
      - nginx_input_vector
    type: remap
    source: |
      . = parse_json!(.message)
  nginx_parse_remap:
    inputs:
      - nginx_parse_json
    type: remap
    source: |
      if !match(.remote_user, r'^(ATG|B2C|CRM|FOBO|TS|BTX|RTD|Magnolia)$') {
        .remote_user = "other"
      }
      del(.file)
      del(.host)
      del(.source_type)
      .request_uri = replace(string!(.request_uri), r'\d{16}', "xxx")
      .request_time = to_float!(.request_time)
      .status = to_int!(.status)
  nginx_http_metrics:
    type: log_to_metric
    inputs:
      - nginx_parse_remap
    metrics:
      - type: counter
        field: status
        name: http_response_count_total
        namespace: "${HTTP_METRICS_NAMESPACE}"
        tags:
          host: "${HOSTNAME}"
          remote_user: '{{ remote_user }}'
          request_uri: '{{ request_uri }}'
          status: '{{ status }}'
      - type: histogram
        field: request_time
        name: http_response_duration_seconds
        namespace: "${HTTP_METRICS_NAMESPACE}"
        tags:
          host: "${HOSTNAME}"
          remote_user: '{{ remote_user }}'
          request_uri: '{{ request_uri }}'
          status: '{{ status }}'
      - type: gauge
        field: request_time
        name: http_response_duration_seconds
        namespace: "${HTTP_METRICS_NAMESPACE}"
        tags:
          host: "${HOSTNAME}"
          remote_user: '{{ remote_user }}'
          request_uri: '{{ request_uri }}'
          status: '{{ status }}'
sinks:
  nginx_output_prometheus:
    address: '0.0.0.0:9598'
    inputs:
      - internal_metrics
      - nginx_http_metrics
    type: prometheus_exporter
    default_namespace: vector
    quantiles:
      - 0.5
      - 0.75
      - 0.9
      - 0.95
      - 0.99
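To run this reproduction locally, something like the following should work (the config file name here is just an example, not from the original report):

vector --config vector.yaml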
I generated fake lines using this script:
#!/bin/bash
remote_users=("ATG" "B2C" "CRM" "FOBO" "TS" "BTX" "RTD" "Magnolia" "other")
statuses=(200 400 500 404 503)
while : ; do
  remote_user=${remote_users[$RANDOM % ${#remote_users[@]} ]}
  status=${statuses[$RANDOM % ${#statuses[@]} ]}
  request_time=$RANDOM
  request_uri="somepath/$(($RANDOM % 20))"
  echo "{ \"remote_user\": \"$remote_user\", \"status\": \"$status\", \"request_time\": \"$request_time\", \"request_uri\": \"$request_uri\"}"
done
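Each iteration prints one JSON log line; since the values are random, the output looks roughly like this (illustrative only):

{ "remote_user": "CRM", "status": "404", "request_time": "12345", "request_uri": "somepath/7"}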
I sent this through netcat to Vector:
/tmp/generate.sh | netcat localhost 8080
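To watch how the scrape time evolves over a run, one option (not from the original report, just a suggestion) is to time the exporter endpoint directly with curl, using the address configured for the prometheus_exporter sink:

while true; do curl -s -o /dev/null -w '%{time_total}\n' http://localhost:9598/metrics; sleep 15; done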
Using version:
vector 0.13.0 (v0.13.0 x86_64-apple-darwin 2021-04-21)
I will update Vector to the new version soon.
I attach the scrape output below: vector_scrape.txt
Hi @Denissa89. I was just wondering if you had a chance to try out the newer Vector version and if you noticed any difference in behavior.
Is this issue still present with the newest Vector version?
Yeah, good question. I'll close this as stale, but if anyone is still observing this feel free to comment or open a new issue.
Reported by user in discord: https://discord.com/channels/742820443487993987/746070591097798688/834105139220185168
They are observing increasing scrape times relative to the length of time Vector has been running:
The number of time series being exported seems to vary between 13k and 19k. I thought the scrape time might be increasing relative to the number of time series, but they reported that this wasn't the case.
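For anyone trying to confirm or rule out that correlation on their own setup, Prometheus's built-in per-target metrics can be plotted side by side; the job label below is hypothetical and should match whatever job scrapes Vector:

scrape_duration_seconds{job="vector"}
scrape_samples_scraped{job="vector"}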
Vector Version
Vector Configuration File
Expected Behavior
Actual Behavior
Example Data
https://pastebin.com/xhzNcW0w
Additional Context