Open gord1anknot opened 4 days ago
Pinging code owners:
processor/transform: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley
See Adding Labels via Comments if you do not have permissions to add labels yourself.
@gord1anknot can you post pictures of the profile graphs? Something like https://pprof.me/ is an easy way.
Certainly. A flame graph felt like it would be less useful, so I made this graph of in-use memory space.
Here are the in-use objects:
This specific pod I pulled the pprof from hasn't reached maximum memory (yet), but it's at 90% and climbing.
Component(s)
processor/transform
What happened?
Description
Hello! My organization has a Helm deployment of the OpenTelemetry Collector, and we are seeing what I would describe as a memory leak in one particular daemonset tasked with ingesting prometheus, kubelet, and host metrics from its node. We have worked around this issue by periodically restarting this workload.
The memory usage comes on very gradually; it takes about two weeks to build up, at which point CPU usage maxes out from a constant loop of garbage collection. At that point, metrics are refused due to this contention.
On August 2nd, we split the configuration into two daemonsets to isolate log forwarding from metrics when the latter reaches this condition. The log forwarding configuration does not have this problem.
We observed this issue both before an upgrade from `0.92.0` to `0.107.0` and after a rollback to `0.92.0`, confirming that the memory issue is unrelated to the upgrade.

I suspect, but do not know, that this issue comes from our use of the transform processor, which is why I labeled the component that way. The reason I suspect it is that we greatly expanded our usage of that processor around July 13th, and I believe the chart shows the memory issue rising to a problem level faster after that date.
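For context, the bulk of our transform usage (shown in full in the configuration further down) copies resource attributes onto datapoint attributes. A simplified excerpt of that pattern looks like this:

```yaml
processors:
  transform/add_service_and_role:
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          # copy selected resource attributes onto every datapoint
          - set(attributes["service"], resource.attributes["service.name"]) where attributes["service"] == nil
          - set(attributes["k8s.pod.name"], resource.attributes["k8s.pod.name"]) where resource.attributes["k8s.pod.name"] != nil
          - set(attributes["k8s.workload.name"], resource.attributes["k8s.deployment.name"]) where resource.attributes["k8s.deployment.name"] != nil
```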
Please see the chart below, going back to May 1st, for a visual on memory usage of our OpenTelemetry workloads. The `cluster-reciever` is a singleton pod for k8s cluster metrics and some high-memory scrapes, `logs-agent` is the split-out logs configuration, and `collector` is a gateway; none of these have the issue.

PromQL query for the chart seen below:
```promql
max by (k8s_container_name, k8s_workload_name) (
  (
    max by (k8s_workload_name, k8s_container_name, k8s_pod_name) (
      container_memory_usage{env="prod", k8s_cluster_name=~"prod", k8s_namespace_name=~"opentelemetry-collector", k8s_workload_name=~".*"}
    )
  )
  /
  (
    (
      max by (k8s_workload_name, k8s_container_name, k8s_pod_name) (
        k8s_container_memory_limit{env="prod", k8s_cluster_name=~"prod", k8s_namespace_name=~"opentelemetry-collector", k8s_workload_name=~".*"}
      )
    ) != 0
  )
)
```
Steps to Reproduce
We are able to reproduce this issue in lower environments; however, since the issue takes at least 14 days to show up, we cannot iterate very quickly here. Please find the complete configuration for the `metrics-agent` daemonset below.

Details
```yaml exporters: debug: {} logging: {} otlphttp: endpoint: << EXAMPLE >> splunk_hec/platform_logs: disable_compression: true endpoint: << EXAMPLE >> idle_conn_timeout: 10s index: kubernetes-logging profiling_data_enabled: false retry_on_failure: enabled: true initial_interval: 5s max_elapsed_time: 300s max_interval: 30s sending_queue: enabled: true num_consumers: 10 queue_size: 5000 source: kubernetes splunk_app_name: otel-collector-agent splunk_app_version: 0.78.0 timeout: 10s tls: insecure_skip_verify: true token: ${env:SPLUNK_HEC_TOKEN} extensions: health_check: endpoint: ${env:MY_POD_IP}:13133 zpages: {} processors: batch: {} filter/logs: logs: exclude: match_type: strict resource_attributes: - key: splunk.com/exclude value: "true" filter/metrics: error_mode: ignore metrics: datapoint: - resource.attributes["k8s.namespace.name"] == "cluster-overprovisioner" - resource.attributes["k8s.namespace.name"] == "cluster-scaler" - resource.attributes["k8s.namespace.name"] == "kube-system" - resource.attributes["k8s.container.name"] == "pause" - resource.attributes["k8s.container.name"] == "wait" metric: - name == "dapr_http_client_roundtrip_latency" - name == "dapr_component_pubsub_ingress_latencies" - name == "kubernetes.daemon_set.current_scheduled" - name == "kubernetes.daemon_set.misscheduled" - name == "kubernetes.daemon_set.updated" - name == "kubernetes.deployment.updated" - name == "kubernetes.job.parallelism" - name == "kubernetes.namespace_phase" - name == "kubernetes.stateful_set.updated" - IsMatch(name, "kubernetes.replica_set.*") - IsMatch(name, "kubernetes.replication_controller.*") - IsMatch(name, "kubernetes.resource_quota.*") - IsMatch(name, "openshift.*") k8sattributes: extract: annotations: - from: pod key: splunk.com/sourcetype - from: namespace key: splunk.com/exclude tag_name: splunk.com/exclude - from: pod key: splunk.com/exclude tag_name: splunk.com/exclude - from: namespace key: splunk.com/index tag_name: com.splunk.index - from: pod key: splunk.com/index tag_name: com.splunk.index - from: pod key: examplecompany.net/env tag_name: env - from: pod key: examplecompany.net/role tag_name: role - from: pod key: examplecompany.net/service tag_name: service - from: pod key: examplecompany.net/app tag_name: app - from: pod key: examplecompany.net/version tag_name: version - from: pod key: examplecompany.net/canary tag_name: canary labels: - from: pod key: examplecompany.net/env tag_name: env - from: pod key: examplecompany.net/role tag_name: role - from: pod key: examplecompany.net/service tag_name: service - from: pod key: examplecompany.net/app tag_name: app - from: pod key: examplecompany.net/version tag_name: version - from: pod key: examplecompany.net/canary tag_name: canary metadata: - k8s.cronjob.name - k8s.daemonset.name - k8s.deployment.name - k8s.replicaset.name - k8s.statefulset.name - k8s.job.name - k8s.namespace.name - k8s.node.name - k8s.pod.name - k8s.pod.uid - container.id - container.image.name - container.image.tag filter: node_from_env_var: K8S_NODE_NAME passthrough: false pod_association: - sources: - from: resource_attribute name: k8s.pod.uid - sources: - from: resource_attribute name: k8s.pod.ip - sources: - from: resource_attribute name: ip - sources: - from: connection - sources: - from: resource_attribute name: host.name k8sattributes/prometheus: extract: metadata: - k8s.cronjob.name - k8s.daemonset.name - k8s.deployment.name - k8s.replicaset.name - k8s.statefulset.name - k8s.job.name - k8s.namespace.name - k8s.node.name - k8s.pod.name - k8s.pod.uid - 
container.id filter: node_from_env_var: K8S_NODE_NAME passthrough: false pod_association: - sources: - from: resource_attribute name: k8s.pod.uid - sources: - from: resource_attribute name: k8s.pod.ip - sources: - from: resource_attribute name: ip - sources: - from: connection - sources: - from: resource_attribute name: host.name memory_limiter: check_interval: 2s limit_percentage: 80 spike_limit_percentage: 25 resource: attributes: - action: insert key: k8s.node.name value: ${env:K8S_NODE_NAME} - action: upsert key: k8s.cluster.name value: prod resource/add_agent_k8s: attributes: - action: insert key: k8s.pod.name value: ${env:K8S_POD_NAME} - action: insert key: k8s.pod.uid value: ${env:K8S_POD_UID} - action: insert key: k8s.namespace.name value: ${env:K8S_NAMESPACE} resource/add_environment: attributes: - action: insert key: env value: prod resource/chrono-instance-id: attributes: - action: insert from_attribute: container.id key: service.instance.id - action: insert from_attribute: k8s.pod.uid key: service.instance.id - action: insert from_attribute: host.name key: service.instance.id resource/logs: attributes: - action: upsert from_attribute: k8s.pod.annotations.splunk.com/sourcetype key: com.splunk.sourcetype - action: upsert from_attribute: k8s.pod.annotations.examplecompany.net/role key: role - action: upsert from_attribute: k8s.pod.annotations.examplecompany.net/service key: service - action: delete key: k8s.pod.annotations.splunk.com/sourcetype - action: delete key: splunk.com/exclude resourcedetection: detectors: - env - gcp - system override: true timeout: 10s transform/add_service_and_role: error_mode: ignore metric_statements: - context: datapoint statements: - set(attributes["service"], resource.attributes["service.name"]) where attributes["service"] == nil - set(attributes["service"], resource.attributes["service"]) where attributes["service"] == nil - set(attributes["role"], resource.attributes["role"]) where attributes["role"] == nil - set(attributes["k8s.pod.name"], resource.attributes["k8s.pod.name"]) where resource.attributes["k8s.pod.name"] != nil - set(attributes["k8s.node.name"], resource.attributes["k8s.node.name"]) where resource.attributes["k8s.node.name"] != nil - set(attributes["k8s.cluster.name"], resource.attributes["k8s.cluster.name"]) where resource.attributes["k8s.cluster.name"] != nil - set(attributes["k8s.container.name"], resource.attributes["k8s.container.name"]) where resource.attributes["k8s.container.name"] != nil - set(attributes["k8s.workload.name"], resource.attributes["k8s.deployment.name"]) where resource.attributes["k8s.deployment.name"] != nil - set(attributes["k8s.workload.name"], resource.attributes["k8s.daemonset.name"]) where resource.attributes["k8s.daemonset.name"] != nil - set(attributes["k8s.workload.name"], resource.attributes["k8s.statefulset.name"]) where resource.attributes["k8s.statefulset.name"] != nil - set(attributes["k8s.workload.name"], resource.attributes["k8s.cronjob.name"]) where resource.attributes["k8s.cronjob.name"] != nil - set(attributes["k8s.workload.name"], resource.attributes["k8s.replicaset.name"]) where resource.attributes["k8s.replicaset.name"] != nil and resource.attributes["k8s.deployment.name"] == nil - | set(attributes["k8s.workload.name"], resource.attributes["k8s.job.name"]) where resource.attributes["k8s.job.name"] != nil and resource.attributes["k8s.cronjob.name"] == nil - set(attributes["k8s.workload.kind"], "deployment") where resource.attributes["k8s.deployment.name"] != nil - 
set(attributes["k8s.workload.kind"], "daemonset") where resource.attributes["k8s.daemonset.name"] != nil - set(attributes["k8s.workload.kind"], "statefulset") where resource.attributes["k8s.statefulset.name"] != nil - set(attributes["k8s.workload.kind"], "cronjob") where resource.attributes["k8s.cronjob.name"] != nil - | set(attributes["k8s.workload.kind"], "replicaset") where resource.attributes["k8s.replicaset.name"] != nil and resource.attributes["k8s.deployment.name"] == nil - | set(attributes["k8s.workload.kind"], "job") where resource.attributes["k8s.job.name"] != nil and resource.attributes["k8s.cronjob.name"] == nil - set(attributes["k8s.namespace.name"], resource.attributes["k8s.namespace.name"]) where resource.attributes["k8s.namespace.name"] != nil - set(attributes["k8s.workload.name"], attributes["app"]) where attributes["app"] != nil and attributes["k8s.workload.kind"] == "replicaset" - set(attributes["k8s.workload.name"], resource.attributes["app"]) where resource.attributes["app"] != nil and attributes["k8s.workload.kind"] == "replicaset" - set(attributes["k8s.workload.name"], Concat([attributes["app"],attributes["role"]], "-")) where attributes["app"] != nil and attributes["role"] != nil and attributes["k8s.workload.kind"] == "replicaset" - set(attributes["k8s.workload.name"], Concat([resource.attributes["app"],resource.attributes["role"]], "-")) where resource.attributes["app"] != nil and resource.attributes["role"] != nil and attributes["k8s.workload.kind"] == "replicaset" transform/sum_histograms: error_mode: ignore metric_statements: - context: metric statements: - extract_sum_metric(true) where name == "dapr_http_client_roundtrip_latency" - extract_sum_metric(true) where name == "dapr_component_pubsub_ingress_latencies" receivers: hostmetrics: collection_interval: 60s root_path: /hostfs scrapers: cpu: null disk: null filesystem: null load: null memory: null network: null paging: null processes: null jaeger: protocols: grpc: endpoint: ${env:MY_POD_IP}:14250 thrift_compact: endpoint: ${env:MY_POD_IP}:6831 thrift_http: endpoint: ${env:MY_POD_IP}:14268 kubeletstats: auth_type: serviceAccount collection_interval: 60s endpoint: ${env:K8S_NODE_IP}:10250 extra_metadata_labels: - container.id metric_groups: - container - node - pod metrics: k8s.container.cpu_limit_utilization: enabled: true k8s.container.cpu_request_utilization: enabled: true k8s.container.memory_limit_utilization: enabled: true k8s.container.memory_request_utilization: enabled: true otlp: protocols: grpc: endpoint: 0.0.0.0:4317 http: endpoint: 0.0.0.0:4318 prometheus: config: scrape_configs: - enable_http2: true follow_redirects: true job_name: external-dns kubernetes_sd_configs: - namespaces: names: - external-dns role: pod metrics_path: /metrics relabel_configs: - action: keep regex: external-dns source_labels: - __meta_kubernetes_pod_label_app_kubernetes_io_name - action: keep regex: ${env:K8S_NODE_NAME} source_labels: - __meta_kubernetes_pod_node_name scheme: http scrape_interval: 1m scrape_timeout: 10s - enable_http2: true follow_redirects: true job_name: daprd kubernetes_sd_configs: - role: pod metrics_path: /metrics relabel_configs: - action: keep regex: "true" source_labels: - __meta_kubernetes_pod_annotation_dapr_io_enable_metrics - action: keep regex: dapr-metrics source_labels: - __meta_kubernetes_pod_container_port_name - action: keep regex: ${env:K8S_NODE_NAME} source_labels: - __meta_kubernetes_pod_node_name scheme: http scrape_interval: 1m scrape_timeout: 30s - enable_http2: true 
follow_redirects: true job_name: envoy kubernetes_sd_configs: - role: pod metrics_path: /stats/prometheus relabel_configs: - action: keep regex: envoy source_labels: - __meta_kubernetes_pod_label_app_kubernetes_io_name - action: keep regex: envoy source_labels: - __meta_kubernetes_pod_container_name - action: keep regex: http-admin source_labels: - __meta_kubernetes_pod_container_port_name - action: keep regex: ${env:K8S_NODE_NAME} source_labels: - __meta_kubernetes_pod_node_name scheme: http scrape_interval: 1m scrape_timeout: 30s - enable_http2: true follow_redirects: true job_name: custom-metrics kubernetes_sd_configs: - namespaces: names: - custom-metrics role: pod metrics_path: /metrics relabel_configs: - action: keep regex: example source_labels: - __meta_kubernetes_pod_label_app_kubernetes_io_name - action: keep regex: ${env:K8S_NODE_NAME} source_labels: - __meta_kubernetes_pod_node_name - action: labelmap regex: __meta_kubernetes_pod_annotation_examplecompany_net_(.+) scheme: http scrape_interval: 1m scrape_timeout: 30s - enable_http2: true follow_redirects: true job_name: keda-operator kubernetes_sd_configs: - namespaces: names: - keda role: pod metrics_path: /metrics relabel_configs: - action: keep regex: keda-operator source_labels: - __meta_kubernetes_pod_label_app_kubernetes_io_name - action: keep regex: ${env:K8S_NODE_NAME} source_labels: - __meta_kubernetes_pod_node_name scheme: http scrape_interval: 1m scrape_timeout: 30s - enable_http2: true follow_redirects: true job_name: scrape-annotations kubernetes_sd_configs: - role: pod metrics_path: /metrics relabel_configs: - action: drop regex: (kube-system|nginx-ingress-internal|nginx-ingress-external) source_labels: - __meta_kubernetes_namespace - action: keep regex: ${env:K8S_NODE_NAME} source_labels: - __meta_kubernetes_pod_node_name - action: keep regex: true source_labels: - __meta_kubernetes_pod_annotation_prometheus_io_scrape - action: replace regex: (.+) source_labels: - __meta_kubernetes_pod_annotation_prometheus_io_path target_label: __metrics_path__ - action: keep regex: ^([^:]+)(?::\d+)?;(\d+)$ replacement: $1:$2 source_labels: - __address__ - __meta_kubernetes_pod_annotation_prometheus_io_port target_label: __address__ - action: labelmap regex: __meta_kubernetes_pod_annotation_examplecompany_net_(.+) scheme: http scrape_interval: 1m scrape_timeout: 30s - enable_http2: true follow_redirects: true job_name: signalfx-scrape-annotations kubernetes_sd_configs: - role: pod metrics_path: /metrics relabel_configs: - action: drop regex: kube-system source_labels: - __meta_kubernetes_namespace - action: keep regex: ${env:K8S_NODE_NAME} source_labels: - __meta_kubernetes_pod_node_name - action: keep regex: prometheus-exporter source_labels: - __meta_kubernetes_pod_annotation_agent_signalfx_com_monitorType_http - action: keep regex: "80" source_labels: - __meta_kubernetes_pod_container_port_number - action: replace regex: (.+) source_labels: - __meta_kubernetes_pod_annotation_agent_signalfx_com_config_http_metricPath target_label: __metrics_path__ - action: labelmap regex: __meta_kubernetes_pod_annotation_examplecompany_net_(.+) scheme: http scrape_interval: 1m scrape_timeout: 30s prometheus/agent: config: scrape_configs: - job_name: metrics-agent scrape_interval: 1m static_configs: - targets: - ${env:MY_POD_IP}:8888 zipkin: endpoint: ${env:MY_POD_IP}:9411 service: extensions: - health_check - zpages pipelines: logs: exporters: - splunk_hec/platform_logs processors: - memory_limiter - k8sattributes - filter/logs - resource/logs 
- resource - resource/add_environment - resourcedetection - batch receivers: - otlp metrics: exporters: - otlphttp processors: - memory_limiter - k8sattributes - filter/metrics - resource/add_environment - resourcedetection - resource/chrono-instance-id - resource - transform/add_service_and_role - batch receivers: - hostmetrics - kubeletstats - otlp metrics/agent: exporters: - otlphttp processors: - memory_limiter - resource/add_agent_k8s - resourcedetection - resource - batch receivers: - prometheus/agent metrics/prometheus: exporters: - otlphttp processors: - memory_limiter - transform/sum_histograms - k8sattributes/prometheus - filter/metrics - resource/add_environment - resourcedetection - resource/chrono-instance-id - resource - transform/add_service_and_role - batch receivers: - prometheus traces: exporters: - otlphttp processors: - k8sattributes - resource/add_environment - resourcedetection - resource - batch receivers: - otlp - jaeger - zipkin telemetry: logs: encoding: json metrics: address: ${env:MY_POD_IP}:8888 ```
I noticed that other memory leak issues usually require the reporter to post a heap pprof, so I added pprof to our lower environments. Please find a heap dump from the oldest pod so instrumented (12 days old); unfortunately, it's not churning garbage collection yet, though it's getting close.
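For reference, profiling was enabled with the contrib pprof extension; a minimal sketch of that change follows (the endpoint address here is an assumption, not copied from our manifests):

```yaml
extensions:
  pprof:
    # exposes Go's net/http/pprof handlers so heap/CPU profiles can be pulled from the pod
    endpoint: ${env:MY_POD_IP}:1777   # assumed address; 1777 is the extension's usual default port

service:
  extensions:
    - health_check
    - zpages
    - pprof
```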
Unfortunately, I'm running out of time to look at this issue, and I don't have enough Go experience to understand what I'm looking at in the heap dump. As a workaround, we have implemented an automatic restart on Mondays; hoping you can help.
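The restart itself is nothing special; a hypothetical sketch of the approach (the names, namespace, image, and service account below are illustrative, not our actual manifests):

```yaml
# Hypothetical weekly restart of the metrics-agent daemonset (illustrative only).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: metrics-agent-weekly-restart
  namespace: opentelemetry-collector
spec:
  schedule: "0 6 * * 1"   # every Monday at 06:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: restart-metrics-agent   # needs RBAC permission to patch the daemonset
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl
              command: ["kubectl", "rollout", "restart", "daemonset/metrics-agent"]
```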
Thank you so very much!
pprof.otelcol-contrib.samples.cpu.003.pb.gz pprof.otelcol-contrib.alloc_objects.alloc_space.inuse_objects.inuse_space.012.pb.gz
Expected Result
Garbage collection fully reclaims memory from routine operations
Actual Result
Garbage collection does not appear to reclaim some portion of the overall memory consumption.
Collector version
v0.92.0
Environment information
Environment
OS: GKE / ContainerOS
Compiler (if manually compiled): not applicable, using the public Docker image
OpenTelemetry Collector configuration
Log output
Additional context
Although the `metrics-agent` is configured to receive logs, metrics, and traces over OTLP, it does not do so in practice at this time. None of our services emit OTLP metrics to the metrics-agent, only to the gateway deployment, which does not have this issue; on the metrics-agent those ports are not even exposed. It collects metric signals using the hostmetrics, kubeletstats, and prometheus receivers only.