microsoft / Docker-Provider

Azure Monitor for Containers

Prometheus logs not showing up in Log Analytics Workspace #850

Closed: VincentVerweij closed this issue 1 year ago

VincentVerweij commented 1 year ago

My pods have had the following annotations for a few weeks:
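For reference, in the pod spec they look roughly like this (values taken from the kubectl describe output further down in this thread):

    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8900"
        prometheus.io/path: "/metrics"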

I also deployed a ConfigMap with the following settings, a few weeks ago:

  prometheus-data-collection-settings: |-
    [prometheus_data_collection_settings.cluster]
        interval = "1m"
        fieldpass = ["mendix_concurrent_user_sessions", "mendix_connection_bus", "mendix_current_request_duration_seconds_bucket", "mendix_current_request_duration_seconds_count", "mendix_current_request_duration_seconds_sum", "mendix_jvm_memory_bytes", "mendix_jvm_memory_pool_bytes", "mendix_license_count", "mendix_named_users", "mendix_runtime_requests_total", "mendix_threadpool_handling_external_requests"]
        monitor_kubernetes_pods = true
        monitor_kubernetes_pods_namespaces = ["dev-apps"]

    [prometheus_data_collection_settings.node]
        interval = "1m"
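These settings sit in the prometheus-data-collection-settings section of the container-azm-ms-agentconfig ConfigMap, which the agent reads from the kube-system namespace; applying and verifying it looks something like this (the file name is just an example):

    kubectl apply -f container-azm-ms-agentconfig.yaml
    kubectl get configmap container-azm-ms-agentconfig -n kube-system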

This Microsoft Learn article tells me where I need to look to query the Prometheus logs, which in turn points me to this article.

When I write my query in the Log Analytics Workspace e.g.:

InsightsMetrics
| where Namespace contains "prometheus"
| summarize by Name

The results I get back do not include any of the metrics listed in my ConfigMap's fieldpass property.

So, where are those metrics? Am I missing something?
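For reference, a narrower variant of the query above, scoped to the expected metric names (a sketch; it assumes the metrics would be ingested under their original names), would be:

    InsightsMetrics
    | where Namespace contains "prometheus"
    | where Name startswith "mendix_"
    | summarize count() by Name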

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

VincentVerweij commented 1 year ago

Still waiting for an answer 😄

ganga1980 commented 1 year ago

Hi @VincentVerweij, did you verify with curl that the pods in the dev-apps namespace are emitting the metrics on port 8900? If not, can you please check, to rule out an issue with the app itself (which seems likely)?
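For example, something along these lines (a sketch; the image choice and the pod IP placeholder are just examples):

    # start a throwaway pod that has curl and open a shell in it
    kubectl run net-debug --rm -it --image=curlimages/curl --restart=Never -n dev-apps -- sh
    # then, from the shell inside that pod, hit the metrics endpoint of one of the app pods
    curl -sS 'http://<pod-ip>:8900/metrics' | head -n 20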

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

VincentVerweij commented 1 year ago

Hi @ganga1980, I did check the pod's output for the metrics as suggested.

  1. I started a debugging pod on one of my cluster's nodes.
  2. From the debugging pod I first pinged my container at IP 10.33.0.45, which responded.
  3. Then I executed this command: curl -g 'http://10.33.0.45:8900/metrics'

The output shows me the following:

# HELP mendix_concurrent_user_sessions Concurrent active users
# TYPE mendix_concurrent_user_sessions gauge
mendix_concurrent_user_sessions{app_id="my-app",pod_name="my-app-master-123456789-01234",session_type="anonymous"} 0
mendix_concurrent_user_sessions{app_id="my-app",pod_name="my-app-master-123456789-01234",session_type="named_users"} 0
# HELP mendix_connection_bus Connection bus count
# TYPE mendix_connection_bus gauge
mendix_connection_bus{app_id="my-app",connectionbus_type="delete",pod_name="my-app-master-123456789-01234"} 248
mendix_connection_bus{app_id="my-app",connectionbus_type="insert",pod_name="my-app-master-123456789-01234"} 274
mendix_connection_bus{app_id="my-app",connectionbus_type="select",pod_name="my-app-master-123456789-01234"} 55132
mendix_connection_bus{app_id="my-app",connectionbus_type="transaction",pod_name="my-app-master-123456789-01234"} 53466
mendix_connection_bus{app_id="my-app",connectionbus_type="update",pod_name="my-app-master-123456789-01234"} 129
# HELP mendix_current_request_duration_seconds Duration of the current runtime request
# TYPE mendix_current_request_duration_seconds histogram
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="custom",pod_name="my-app-master-123456789-01234",user_id="system",le="5e+06"} 5
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="custom",pod_name="my-app-master-123456789-01234",user_id="system",le="1e+07"} 5
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="custom",pod_name="my-app-master-123456789-01234",user_id="system",le="5e+07"} 5
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="custom",pod_name="my-app-master-123456789-01234",user_id="system",le="7.5e+07"} 5
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="custom",pod_name="my-app-master-123456789-01234",user_id="system",le="1.5e+08"} 5
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="custom",pod_name="my-app-master-123456789-01234",user_id="system",le="2.5e+08"} 5
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="custom",pod_name="my-app-master-123456789-01234",user_id="system",le="5e+08"} 5
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="custom",pod_name="my-app-master-123456789-01234",user_id="system",le="+Inf"} 5
mendix_current_request_duration_seconds_sum{app_id="my-app",client_type="custom",pod_name="my-app-master-123456789-01234",user_id="system"} 15
mendix_current_request_duration_seconds_count{app_id="my-app",client_type="custom",pod_name="my-app-master-123456789-01234",user_id="system"} 5
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="scheduled_event",pod_name="my-app-master-123456789-01234",user_id="system",le="5e+06"} 6
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="scheduled_event",pod_name="my-app-master-123456789-01234",user_id="system",le="1e+07"} 6
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="scheduled_event",pod_name="my-app-master-123456789-01234",user_id="system",le="5e+07"} 6
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="scheduled_event",pod_name="my-app-master-123456789-01234",user_id="system",le="7.5e+07"} 6
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="scheduled_event",pod_name="my-app-master-123456789-01234",user_id="system",le="1.5e+08"} 6
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="scheduled_event",pod_name="my-app-master-123456789-01234",user_id="system",le="2.5e+08"} 6
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="scheduled_event",pod_name="my-app-master-123456789-01234",user_id="system",le="5e+08"} 6
mendix_current_request_duration_seconds_bucket{app_id="my-app",client_type="scheduled_event",pod_name="my-app-master-123456789-01234",user_id="system",le="+Inf"} 6
mendix_current_request_duration_seconds_sum{app_id="my-app",client_type="scheduled_event",pod_name="my-app-master-123456789-01234",user_id="system"} 17
mendix_current_request_duration_seconds_count{app_id="my-app",client_type="scheduled_event",pod_name="my-app-master-123456789-01234",user_id="system"} 6
# HELP mendix_jvm_memory_bytes JVM memory
# TYPE mendix_jvm_memory_bytes gauge
mendix_jvm_memory_bytes{app_id="my-app",memory_type="committed_heap",pod_name="my-app-master-123456789-01234"} 2.45366784e+08
mendix_jvm_memory_bytes{app_id="my-app",memory_type="committed_nonheap",pod_name="my-app-master-123456789-01234"} 2.03886592e+08
mendix_jvm_memory_bytes{app_id="my-app",memory_type="init_heap",pod_name="my-app-master-123456789-01234"} 3.3554432e+07
mendix_jvm_memory_bytes{app_id="my-app",memory_type="init_nonheap",pod_name="my-app-master-123456789-01234"} 7.667712e+06
mendix_jvm_memory_bytes{app_id="my-app",memory_type="max_heap",pod_name="my-app-master-123456789-01234"} 5.24288e+08
mendix_jvm_memory_bytes{app_id="my-app",memory_type="max_nonheap",pod_name="my-app-master-123456789-01234"} -1
mendix_jvm_memory_bytes{app_id="my-app",memory_type="used_heap",pod_name="my-app-master-123456789-01234"} 1.78020024e+08
mendix_jvm_memory_bytes{app_id="my-app",memory_type="used_nonheap",pod_name="my-app-master-123456789-01234"} 1.90724688e+08
# HELP mendix_jvm_memory_pool_bytes JVM memory pools
# TYPE mendix_jvm_memory_pool_bytes gauge
mendix_jvm_memory_pool_bytes{app_id="my-app",is_heap="false",memory_pool="codeheap_non-nmethods",pod_name="my-app-master-123456789-01234"} 1.504128e+06
mendix_jvm_memory_pool_bytes{app_id="my-app",is_heap="false",memory_pool="codeheap_non-profiled_nmethods",pod_name="my-app-master-123456789-01234"} 1.3221504e+07
mendix_jvm_memory_pool_bytes{app_id="my-app",is_heap="false",memory_pool="codeheap_profiled_nmethods",pod_name="my-app-master-123456789-01234"} 3.3781632e+07
mendix_jvm_memory_pool_bytes{app_id="my-app",is_heap="false",memory_pool="compressed_class_space",pod_name="my-app-master-123456789-01234"} 1.6397568e+07
mendix_jvm_memory_pool_bytes{app_id="my-app",is_heap="false",memory_pool="metaspace",pod_name="my-app-master-123456789-01234"} 1.25819856e+08
mendix_jvm_memory_pool_bytes{app_id="my-app",is_heap="true",memory_pool="g1_eden_space",pod_name="my-app-master-123456789-01234"} 6.0817408e+07
mendix_jvm_memory_pool_bytes{app_id="my-app",is_heap="true",memory_pool="g1_old_gen",pod_name="my-app-master-123456789-01234"} 1.1615404e+08
mendix_jvm_memory_pool_bytes{app_id="my-app",is_heap="true",memory_pool="g1_survivor_space",pod_name="my-app-master-123456789-01234"} 1.048576e+06
# HELP mendix_license_count Mendix Runtime License count
# TYPE mendix_license_count gauge
mendix_license_count{app_id="my-app",license_id="obfuscated_license_id_here",license_runtime_mode="unknown",license_type="unknown",pod_name="my-app-master-123456789-01234"} 1
mendix_license_count{app_id="my-app",license_id="obfuscated_license_id_here",license_runtime_mode="unknown",license_type="unknown",pod_name="my-app-master-123456789-01234"} 1
mendix_license_count{app_id="my-app",license_id="obfuscated_license_id_here",license_runtime_mode="unknown",license_type="unknown",pod_name="my-app-master-123456789-01234"} 1
mendix_license_count{app_id="my-app",license_id="obfuscated_license_id_here",license_runtime_mode="unknown",license_type="unknown",pod_name="my-app-master-123456789-01234"} 1
mendix_license_count{app_id="my-app",license_id="obfuscated_license_id_here",license_runtime_mode="unknown",license_type="unknown",pod_name="my-app-master-123456789-01234"} 1
mendix_license_count{app_id="my-app",license_id="obfuscated_license_id_here",license_runtime_mode="unknown",license_type="unknown",pod_name="my-app-master-123456789-01234"} 1
mendix_license_count{app_id="my-app",license_id="obfuscated_license_id_here",license_runtime_mode="unknown",license_type="unknown",pod_name="my-app-master-123456789-01234"} 1
mendix_license_count{app_id="my-app",license_id="obfuscated_license_id_here",license_runtime_mode="unknown",license_type="unknown",pod_name="my-app-master-123456789-01234"} 1
mendix_license_count{app_id="my-app",license_id="obfuscated_license_id_here",license_runtime_mode="unknown",license_type="unknown",pod_name="my-app-master-123456789-01234"} 1
# HELP mendix_named_users Total named users
# TYPE mendix_named_users gauge
mendix_named_users{app_id="my-app",pod_name="my-app-master-123456789-01234"} 164
# HELP mendix_runtime_requests_total Total requests received in the runtime
# TYPE mendix_runtime_requests_total gauge
mendix_runtime_requests_total{app_id="my-app",handler="",pod_name="my-app-master-123456789-01234"} 27732
mendix_runtime_requests_total{app_id="my-app",handler="api-doc",pod_name="my-app-master-123456789-01234"} 0
mendix_runtime_requests_total{app_id="my-app",handler="debugger",pod_name="my-app-master-123456789-01234"} 0
mendix_runtime_requests_total{app_id="my-app",handler="file",pod_name="my-app-master-123456789-01234"} 0
mendix_runtime_requests_total{app_id="my-app",handler="manifest.webmanifest",pod_name="my-app-master-123456789-01234"} 0
mendix_runtime_requests_total{app_id="my-app",handler="p",pod_name="my-app-master-123456789-01234"} 0
mendix_runtime_requests_total{app_id="my-app",handler="rest",pod_name="my-app-master-123456789-01234"} 0
mendix_runtime_requests_total{app_id="my-app",handler="rest-doc",pod_name="my-app-master-123456789-01234"} 0
mendix_runtime_requests_total{app_id="my-app",handler="ws",pod_name="my-app-master-123456789-01234"} 0
mendix_runtime_requests_total{app_id="my-app",handler="ws-doc",pod_name="my-app-master-123456789-01234"} 0
mendix_runtime_requests_total{app_id="my-app",handler="xas",pod_name="my-app-master-123456789-01234"} 0
# HELP mendix_threadpool_handling_external_requests Jetty server thread pool statistics
# TYPE mendix_threadpool_handling_external_requests gauge
mendix_threadpool_handling_external_requests{app_id="my-app",pod_name="my-app-master-123456789-01234",thread_stats_type="active_threads"} 4
mendix_threadpool_handling_external_requests{app_id="my-app",pod_name="my-app-master-123456789-01234",thread_stats_type="max_threads"} 254
mendix_threadpool_handling_external_requests{app_id="my-app",pod_name="my-app-master-123456789-01234",thread_stats_type="min_threads"} 8
mendix_threadpool_handling_external_requests{app_id="my-app",pod_name="my-app-master-123456789-01234",thread_stats_type="thread_pool_size"} 8

It seems the app itself is fine, so do you have any other suggestions?

ganga1980 commented 1 year ago

Hi @VincentVerweij, can you please confirm that these pods are running inside the "dev-apps" namespace and also have the pod annotations? Can you please share the output so that I can check whether everything looks good?
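For example, something like this (the pod name is a placeholder):

    # confirm the pods are running in the dev-apps namespace
    kubectl get pods -n dev-apps -o wide
    # dump a single pod's annotations
    kubectl get pod <pod-name> -n dev-apps -o jsonpath='{.metadata.annotations}'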

vdiec commented 1 year ago

Hi @VincentVerweij, did you face any issues with the documentation? Please let me know so that we can update it.

VincentVerweij commented 1 year ago

@ganga1980 It is most certainly running in that namespace; otherwise I would not have configured that namespace in the ConfigMap. It also has the annotations that, as documented and as you noted, should be on the pod.

Here is the kubectl describe output for the pod whose metrics I queried with cURL in my previous comment:

Name:         my-app-master-123456789-01234
Namespace:    dev-apps
Priority:     0
Node:         aks-devapps-12100402-vmss00001q/10.33.0.53
Start Time:   Mon, 28 Nov 2022 00:05:48 +0000
Labels:       pod-template-hash=9756dd647
              privatecloud.mendix.com/app=my-app
              privatecloud.mendix.com/component=mendix-app
              privatecloud.mendix.com/node-type=master
Annotations:  buildGeneration: 32
              configHash: <obfuscated_hash>
              prometheus.io/path: /metrics
              prometheus.io/port: 8900
              prometheus.io/scrape: true
Status:       Running
IP:           10.33.0.45
IPs:
  IP:           10.33.0.45
Controlled By:  ReplicaSet/my-app-master-123456789
Containers:
  mendix:
    Container ID:   containerd://<obfuscated_hash>
    Image:          obfuscated-customer-acr.azurecr.io/obfuscated-customer-acr:my-app
    Image ID:       obfuscated-customer-acr.azurecr.io/obfuscated-customer-acr@sha256:<obfuscated_hash>
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 30 Nov 2022 06:39:06 +0000
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 30 Nov 2022 03:35:59 +0000
      Finished:     Wed, 30 Nov 2022 06:39:04 +0000
    Ready:          True
    Restart Count:  20
    Limits:
      cpu:     2
      memory:  2000Mi
    Requests:
      cpu:      1
      memory:   1000Mi
    Liveness:   http-get http://:mendix-app/ delay=60s timeout=1s period=15s #success=1 #failure=3
    Readiness:  http-get http://:mendix-app/ delay=5s timeout=1s period=1s #success=1 #failure=3
    Environment:
      M2EE_ADMIN_LISTEN_ADDRESSES:  127.0.0.1
      M2EE_ADMIN_PORT:              9000
      M2EE_ADMIN_PASS:              <set to the key 'adminpassword' in secret 'my-app-m2ee'>  Optional: false
    Mounts:                         <none>
  m2ee-sidecar:
    Container ID:   containerd://<obfuscated_hash>
    Image:          private-cloud.registry.mendix.com/mx-m2ee-sidecar:2.1.0
    Image ID:       private-cloud.registry.mendix.com/mx-m2ee-sidecar@sha256:<obfuscated_hash>
    Port:           8800/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 28 Nov 2022 00:07:36 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     250m
      memory:  32Mi
    Requests:
      cpu:     100m
      memory:  16Mi
    Environment:
      MX_NODE_TYPE:  master
    Mounts:
      /opt/mendix/config from runtime-config (ro)
      /opt/mendix/m2ee from m2ee-config (ro)
      /opt/mendix/services from services (ro)
  m2ee-metrics:
    Container ID:  containerd://<obfuscated_hash>
    Image:         private-cloud.registry.mendix.com/mx-m2ee-metrics:2.1.0
    Image ID:      private-cloud.registry.mendix.com/mx-m2ee-metrics@sha256:<obfuscated_hash>
    Port:          8900/TCP
    Host Port:     0/TCP
    Args:
      -port=8900
    State:          Running
      Started:      Mon, 28 Nov 2022 00:07:39 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     250m
      memory:  32Mi
    Requests:
      cpu:     100m
      memory:  16Mi
    Environment:
      MX_APP_ID:  my-app
    Mounts:
      /opt/mendix/m2ee from m2ee-config (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  m2ee-config:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          my-app-m2ee
    SecretOptionalName:  <nil>
  runtime-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-app-runtime-config
    Optional:  false
  services:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          my-app-database
    SecretOptionalName:  <nil>
    SecretName:          my-app-file
    SecretOptionalName:  <nil>
QoS Class:               Burstable
Node-Selectors:          <none>
Tolerations:             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                         node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                         node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                  <none>

@vdiec Currently no issues with the documentation; it's just that I don't see any Prometheus data in my Log Analytics workspace. If something is unclear, I will let you know 😄

ganga1980 commented 1 year ago

@VincentVerweij, thanks for sharing this info; this looks good. Can you please check whether the telegraf process is running in all the pods (in both the ama-logs and ama-logs-prometheus containers) with ps aux? If this is a test cluster, can you please share the AKS cluster name or cluster resource ID so we can investigate why the Prometheus metrics are not being ingested? Alternatively, please create a support ticket with the required cluster details so this can be investigated further and resolved.
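For example, something like this (the DaemonSet pod name is a placeholder):

    # check for the telegraf process in both containers of an ama-logs DaemonSet pod
    kubectl exec -n kube-system <ama-logs-pod> -c ama-logs -- ps aux | grep telegraf
    kubectl exec -n kube-system <ama-logs-pod> -c ama-logs-prometheus -- ps aux | grep telegraf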

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 12 days with no activity.

VincentVerweij commented 1 year ago

Hi @ganga1980,

I opened a shell in each pod's containers by running the following commands on my cluster:

    kubectl exec --stdin --tty ama-logs-cmj74 -n kube-system --container ama-logs -- /bin/bash
    kubectl exec --stdin --tty ama-logs-cmj74 -n kube-system --container ama-logs-prometheus -- /bin/bash

Once inside the containers I executed the following command: ps aux | grep telegraf.

The output was the same for each node's pod.

I am not allowed to share the name of the cluster here due to the customer's policy. So, what else do you propose I verify?