Open decimalst opened 2 days ago
I can't tell, but I think this is actually intended behavior, based on this footnote in the docs.
Aggregation Behavior
Metrics are aggregated based on their kind. During an interval, incremental metrics are “added” and newer absolute metrics replace older ones in the same series. This results in a reduction of volume and less granularity, while maintaining numerical correctness. As an example, two incremental counter metrics with values 10 and 13 processed by the transform during a period would be aggregated into a single incremental counter with a value of 23. Two absolute gauge metrics with values 93 and 95 would result in a single absolute gauge with the value of 95. More complex types like distribution, histogram, set, and summary behave similarly with incremental values being combined in a manner that makes sense based on their type.
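The documented rules can be sketched with a tiny fold over samples. This is a simplified model of the behavior the docs describe, not Vector's actual implementation:

```python
# Sketch of the documented aggregation rules: incremental samples are
# added together, while a newer absolute sample replaces the older one.

def aggregate(samples):
    """Fold a list of (kind, value) samples seen in one interval."""
    total = None
    for kind, value in samples:
        if total is None:
            total = value
        elif kind == "incremental":
            total += value   # incremental metrics are "added"
        else:                # "absolute"
            total = value    # last write overwrites
    return total

# Two incremental counters, 10 and 13, aggregate to 23:
print(aggregate([("incremental", 10), ("incremental", 13)]))
# Two absolute gauges, 93 then 95, aggregate to 95:
print(aggregate([("absolute", 93), ("absolute", 95)]))
```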
If I convert the gauge metric I am trying to aggregate to 'incremental' rather than 'absolute', this outputs the 1000 we were expecting.
Hi @decimalst,
Setting aside incremental vs absolute for a second, the following sounds like a bug:
In my console logs sink, the aggregation with count appears to work, and shows a value of 1000, equaling my test suite.
For a given timestamp, do you see different output on the Vector console vs DD metrics?
Generally, metrics can be either absolute or incremental. Absolute metrics represent a "last write overwrites" scenario, where the latest absolute value seen becomes the actual metric value. On the other hand, incremental metrics are additive. The current total value of the metric is adjusted.
Also, we provide the https://vector.dev/docs/reference/configuration/global-options/#expire_metrics_secs global option as a way to remove all metrics that have not been updated in the given number of seconds.
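For reference, that option sits at the top level of the Vector configuration; a minimal sketch (300 is an arbitrary example value):

```yaml
# Drop any metric series that has not been updated for 5 minutes.
expire_metrics_secs: 300
```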
Hey @pront, thanks for your response. I can't tell for sure, but it seems like the count aggregation isn't counting the way I'd expect for absolute gauges. I put together a test script and a configuration that demonstrate the problem. Here's the output from the console sink (JSON codec):
[999 more entries of this, trimmed for brevity]
{"name":"test_metrics_count","tags":{"host":"host2","pod_name":"vector-5695898575-999xb","region":"us-west","type":"timeout_count"},"kind":"absolute","gauge":{"value":1.0}}
{"name":"test_metrics_count_renamed","tags":{"host":"host2","pod_name":"vector-5695898575-999xb","region":"us-west","type":"timeout_count"},"kind":"absolute","counter":{"value":1000.0}}
customConfig:
  data_dir: /vector-data-dir
  sources:
    influx_http:
      path: "/write"
      response_code: 204
      type: http_server
      address: 0.0.0.0:8086
      method: POST
      decoding:
        codec: "influxdb"
    internal_metrics:
      type: internal_metrics
    influx_http_query:
      path: "/query"
      response_code: 200
      type: http_server
      address: 0.0.0.0:8087
      method: POST
      encoding: text
  transforms:
    add_pod_metadata:
      type: remap
      inputs: ["filter_some_metrics"]
      source: |
        # Add pod name from env variable
        .tags.pod_name = get_env_var!("POD_NAME")
    route_metrics:
      type: route
      inputs: ["add_pod_metadata"]
      route:
        aggregate_incremental_gauges:
          type: vrl
          source: '.name == "test_metrics_count" && .tags.type == "timeout_count"'
    transformed_gauges:
      type: remap
      inputs:
        - route_metrics.aggregate_incremental_gauges
      source: |
        if .name == "test_metrics_count" {
          if .tags.type == "timeout_count" {
            .name = "test_metrics_count_renamed"
          }
        }
    aggregate_incremental_gauges:
      type: aggregate
      inputs:
        - transformed_gauges
      mode: Count
      interval_ms: 10000
    filter_some_metrics:
      type: remap
      inputs:
        - influx_http
      source: |
        # This filters some metrics, but it is not relevant here
        true == true
  sinks:
    datadog-metrics:
      tls:
        enabled: true
      type: datadog_metrics
      default_api_key: apikeyhere
      inputs: ["route_metrics._unmatched", "aggregate_incremental_gauges"]
    console:
      encoding:
        codec: json
      inputs:
        - route_metrics.aggregate_incremental_gauges
        - influx_http_query
        - aggregate_incremental_gauges
      type: console
Then I just curl 1000 times:
$ for i in {1..1000}; do
  (
    curl -i -XPOST 'http://influxdb-vector.k8surl.com/write' \
      --data-binary 'test_metrics,host=testhost,region=us-west count=1' \
      > /dev/null 2>&1
  ) &
done
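For clarity, here is an illustrative sketch of how one of those InfluxDB line-protocol writes appears to map to the metric seen in the console output above (measurement and field name joined with "_"). This mirrors the observed names and tags only; it is not Vector's actual influxdb codec:

```python
# Hypothetical mapping from one InfluxDB line-protocol datum (no timestamp)
# to the absolute-gauge metric shape seen in the console sink output.

def parse_line(line):
    head, fields = line.rsplit(" ", 1)
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    metrics = []
    for field in fields.split(","):
        name, value = field.split("=", 1)
        metrics.append({
            "name": f"{measurement}_{name}",  # e.g. "test_metrics_count"
            "tags": tags,
            "kind": "absolute",
            "gauge": {"value": float(value)},
        })
    return metrics

print(parse_line("test_metrics,host=testhost,region=us-west count=1"))
```

Each write therefore arrives as an absolute gauge with value 1.0, which is why "last write overwrites" aggregation collapses 1000 of them to 1 rather than 1000.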
In Datadog, I don't see any value reported for the test_metrics_count_renamed metric at all. This is the change to the config that works and allows me to aggregate a count (I also had to add a pod tag, because we have multiple Vector replicas listening):
transformed_gauges:
  type: remap
  inputs:
    - route_metrics.aggregate_incremental_gauges
  source: |
    if .name == "test_metrics_count" {
      if .tags.type == "timeout_count" {
        .kind = "incremental"
        .tags.pod_name = get_env_var!("POD_NAME")
      }
    }
Problem
Hi, I have a gauge metric, reported multiple times per second via the InfluxDB HTTP source, whose count I am trying to aggregate.
In my console logs sink, the aggregation with count appears to work, and shows a value of 1000, equaling my test suite. However, when I view the metric in Datadog, it shows a reported value of 0.
Configuration
Version
0.42.0
Debug Output
Example Data
I tested with a couple of scripts; here is a bash example (shown above). The same behavior occurs with a Python script using aiohttp.
Additional Context
Vector is running on Kubernetes.
References
No response