Closed: e-vasilyev closed this issue 1 year ago
Hi @e-vasilyev !
Could you provide a complete config that demonstrates this issue? Could it be that the number of internal metrics is actually increasing, causing that source to receive an ever-increasing number of events? You could look at https://vector.dev/docs/reference/configuration/sources/internal_metrics/#internal_metrics_cardinality_total to determine that.
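For reference, a minimal pipeline that exposes that metric for graphing could look like the sketch below (the component names and exporter address are illustrative, not taken from your config):

sources:
  internal:
    type: internal_metrics
sinks:
  prometheus:
    type: prometheus_exporter
    # Scrape this endpoint and graph internal_metrics_cardinality_total over time.
    address: 0.0.0.0:9598
    inputs:
      - internal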
Hi @jszwedko!
internal_metrics_cardinality has small values:
Full config:
acknowledgements:
  enabled: false
data_dir: /var/lib/vector
api:
  enabled: true
  address: 0.0.0.0:8686
sources:
  fluent:
    type: fluent
    address: 0.0.0.0:24224
  vector_metrics:
    type: internal_metrics
    scrape_interval_secs: 5
  vector_logs:
    type: internal_logs
transforms:
  logs_json_route:
    type: route
    inputs:
      - fluent
    route:
      json: .tag == "json"
  logs_json_prepare:
    type: remap
    inputs:
      - logs_json_route.json
    source: |-
      del(.@version)
      del(.level_value)
      del(.tag)
      if exists(.@timestamp) {
        .timestamp = to_timestamp!(del(.@timestamp), unit: "milliseconds")
      }
  message_type_route:
    type: route
    inputs:
      - logs_json_prepare
    route:
      rq_rs: exists(.messageType) && includes(["request", "response"], .messageType)
      scl: exists(.messageType) && includes(["scl"], .messageType)
  audit_prepare:
    type: remap
    inputs:
      - message_type_route.rq_rs
      - message_type_route.scl
    source: |-
      allowFields = {
        "logger": .logger,
        "timestamp": .timestamp,
        "level": .level,
        "message": .message,
        "requestId": .requestId,
        "subRequestId": .subRequestId,
        "messageType": .messageType,
        "customerId": .customerId,
        "customerOgrn": .customerOgrn,
        "queryMnemonic": .queryMnemonic
      }
      . = compact(allowFields, string: false)
  audit_kafka_filter:
    type: filter
    inputs:
      - audit_prepare
    condition:
      type: "vrl"
      source: |-
        exists(.messageType) && includes(["request", "response"], .messageType)
  audit_kafka:
    type: remap
    inputs:
      - audit_kafka_filter
    source: |-
      .message = parse_json(.message) ?? .message
  audit_clickhouse:
    type: remap
    inputs:
      - audit_prepare
    source: |-
      .timestamp = to_unix_timestamp!(.timestamp, unit: "seconds")
  scl_message:
    type: remap
    inputs:
      - message_type_route.scl
    source: |-
      . = parse_json!(.message)
  vector_logs_prepare:
    type: remap
    inputs:
      - vector_logs
    source: |-
      .level = del(.metadata.level)
      .serviceName = "vector"
      .podName, _ = get_env_var("POD_NAME")
      del(.pid)
sinks:
  loki_sink:
    type: loki
    inputs:
      - message_type_route._unmatched
      - vector_logs_prepare
    endpoint: http://loki.dtm-infra-dev:3100
    remove_label_fields: true
    labels:
      environment: "test"
      serviceName: "{{ serviceName }}"
      host: "{{ host }}"
      level: "{{ level }}"
      source_type: "{{ source_type }}"
    compression: gzip
    encoding:
      codec: json
    out_of_order_action: accept
    batch:
      max_events: 100
      timeout_secs: 3
    buffer:
      max_size: 536870912
      type: disk
      when_full: block
  elasticsearch_sink:
    type: elasticsearch
    inputs:
      - message_type_route._unmatched
    api_version: v7
    bulk:
      action: index
      index: "dtm-test-%Y.%m.%d"
    batch:
      max_events: 100
      timeout_secs: 3
    buffer:
      max_size: 536870912
      type: disk
      when_full: block
    compression: gzip
    endpoints:
      - http://elastic.podd-ts:9200
  audit_kafka_sink:
    type: kafka
    inputs:
      - audit_kafka
    bootstrap_servers: "kafka-0.kafka-headless:9092"
    librdkafka_options:
      message.max.bytes: "10000000"
    topic: "audit.logs"
    compression: "gzip"
    encoding:
      codec: json
    buffer:
      max_size: 536870912
      type: disk
      when_full: drop_newest
  clickhouse_sink:
    type: clickhouse
    inputs:
      - audit_clickhouse
    database: "test"
    endpoint: "http://clickhouse.dtm-infra-dev:8123"
    table: logs
    compression: gzip
    batch:
      max_events: 50
      timeout_secs: 2
    buffer:
      max_size: 536870912
      type: disk
      when_full: drop_newest
    skip_unknown_fields: true
  podd_agent_sink:
    type: kafka
    inputs:
      - scl_message
    bootstrap_servers: "kafka-0.kafka-headless:9092"
    topic: "demo_view.scl.signal"
    acknowledgements: true
    compression: gzip
    encoding:
      codec: json
    buffer:
      max_size: 536870912
      type: disk
      when_full: block
  prometheus_sink:
    type: prometheus_exporter
    address: "0.0.0.0:9598"
    inputs:
      - vector_metrics
All metrics for component_sent_events_total
Hi @e-vasilyev !
Can you show the graph of the internal metric cardinality total rather than rate? Your graph of the rate actually makes it look like it might be growing, but graphing the total will show that better.
Hi @jszwedko! Here is vector_internal_metrics_cardinality_total:
Thanks @e-vasilyev ! This issue looks like it is likely caused by the lack of #15426. As a workaround, you could try configuring metrics expiry: https://vector.dev/docs/reference/configuration/global-options/#expire_metrics_secs
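As a rough sketch, that is a single global option at the top level of the configuration; the 60-second value below is only an example and should be tuned to your scrape interval and environment:

# Expire internal metrics that have not been updated for 60 seconds (example value).
expire_metrics_secs: 60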
Hi @jszwedko. Thank you! The workaround works.
👍 I'll close this issue as a duplicate of https://github.com/vectordotdev/vector/issues/15426. You can follow along on that issue for any updates. Thanks for the discussion!
Problem
The value of the component_received_events_total metric for the vector_metrics component is constantly growing.
I am using the query: irate(vector_component_received_events_total{component_kind="source", component_name="vector_metrics", namespace="dtm-dev"}[15s])
Other components do not have this problem.
Configuration
Version
vector 0.31.0 (x86_64-unknown-linux-musl 0f13b22 2023-07-06 13:52:34.591204470)
Debug Output
No response
Example Data
No response
Additional Context
No response
References
No response