netscaler / netscaler-adc-metrics-exporter

Export metrics from Citrix ADC (NetScaler) to Prometheus

Sporadic metrics not pulling through #52

Open Zak-HS opened 5 months ago

Zak-HS commented 5 months ago

Describe the bug

When checking the metrics exported by the pod, the exporter occasionally fails to return the full set of metrics and reports probe_success as 0.0.

The host itself is fine, and as soon as you refresh, all the metrics are pulled through again.

This also happens when I curl localhost on the pod itself. It triggers alerts in our monitoring systems even though the node itself is fine.

Example output when the metrics don't come through:

    # HELP python_gc_objects_collected_total Objects collected during gc
    # TYPE python_gc_objects_collected_total counter
    python_gc_objects_collected_total{generation="0"} 435.0
    python_gc_objects_collected_total{generation="1"} 12.0
    python_gc_objects_collected_total{generation="2"} 0.0
    # HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
    # TYPE python_gc_objects_uncollectable_total counter
    python_gc_objects_uncollectable_total{generation="0"} 0.0
    python_gc_objects_uncollectable_total{generation="1"} 0.0
    python_gc_objects_uncollectable_total{generation="2"} 0.0
    # HELP python_gc_collections_total Number of times this generation was collected
    # TYPE python_gc_collections_total counter
    python_gc_collections_total{generation="0"} 64.0
    python_gc_collections_total{generation="1"} 5.0
    python_gc_collections_total{generation="2"} 0.0
    # HELP python_info Python platform information
    # TYPE python_info gauge
    python_info{implementation="CPython",major="3",minor="8",patchlevel="10",version="3.8.10"} 1.0
    # HELP process_virtual_memory_bytes Virtual memory size in bytes.
    # TYPE process_virtual_memory_bytes gauge
    process_virtual_memory_bytes 3.264512e+07
    # HELP process_resident_memory_bytes Resident memory size in bytes.
    # TYPE process_resident_memory_bytes gauge
    process_resident_memory_bytes 2.6468352e+07
    # HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
    # TYPE process_start_time_seconds gauge
    process_start_time_seconds 1.71146154557e+09
    # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
    # TYPE process_cpu_seconds_total counter
    process_cpu_seconds_total 802.16
    # HELP process_open_fds Number of open file descriptors.
    # TYPE process_open_fds gauge
    process_open_fds 9.0
    # HELP process_max_fds Maximum number of open file descriptors.
    # TYPE process_max_fds gauge
    process_max_fds 1.048576e+06
    # HELP citrixadc_probe_success probe_success
    # TYPE citrixadc_probe_success gauge
    citrixadc_probe_success{nsip="pl2-ns-dmz2"} 0.0
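In the output above, the only ADC-related sample is citrixadc_probe_success, and its value is 0.0. A minimal sketch of how such a response could be checked programmatically is shown here. The function name `probe_success` and the regular expression are illustrative assumptions, not part of the exporter.

```python
import re

# Matches samples such as: citrixadc_probe_success{nsip="pl2-ns-dmz2"} 0.0
# The HELP/TYPE lines do not match because they have no "{...}" label set.
PROBE_RE = re.compile(r'citrixadc_probe_success\{([^}]*)\}\s+([0-9.eE+-]+)')
NSIP_RE = re.compile(r'nsip="([^"]*)"')


def probe_success(text):
    """Return a dict mapping each nsip label to True (probe ok) or False.

    `text` is the raw body returned by the exporter. An empty dict means
    the citrixadc_probe_success sample was not present at all.
    """
    results = {}
    for labels, value in PROBE_RE.findall(text):
        match = NSIP_RE.search(labels)
        nsip = match.group(1) if match else ""
        results[nsip] = float(value) == 1.0
    return results
```

For the output shown in this issue, the function would report the nsip pl2-ns-dmz2 as failed.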

To Reproduce

Steps to reproduce the behavior:

  1. Steps - Run curl localhost:8888 on the pod repeatedly until a response comes back without the ADC metrics.
  2. Version of the metrics exporter - 1.4.9
  3. Version of the Citrix ADC MPX/VPX/CPX - NS13.0 92.21.nc
  4. Logs from the metrics exporter
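The manual curl loop in step 1 could be automated to measure how often the probe fails. The following is a rough sketch only: the helper names (`fetch`, `probe_failed`, `failed_attempts`) are hypothetical, the URL is taken from the curl command above, and treating a missing citrixadc_probe_success sample as a failure is an assumption.

```python
import time
import urllib.request


def fetch(url, timeout=10.0):
    """Fetch the exporter output as text (equivalent to `curl <url>`)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def probe_failed(body):
    """Return True if the body reports citrixadc_probe_success as 0.

    Assumption: a body with no citrixadc_probe_success sample at all is
    also counted as a failed probe.
    """
    for line in body.splitlines():
        if line.startswith("citrixadc_probe_success{"):
            return float(line.rsplit(" ", 1)[1]) == 0.0
    return True


def failed_attempts(attempts, fetch_fn, delay=1.0):
    """Call fetch_fn `attempts` times; return the 1-based failed attempts."""
    failed = []
    for i in range(1, attempts + 1):
        if probe_failed(fetch_fn()):
            failed.append(i)
        if i < attempts:
            time.sleep(delay)
    return failed


# Example usage on the pod (not executed here):
#   bad = failed_attempts(50, lambda: fetch("http://localhost:8888"))
#   print(f"{len(bad)} of 50 scrapes failed: {bad}")
```

Recording which attempts fail, together with timestamps from the exporter logs, may help correlate the failures with events on the ADC side.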

Expected behavior

All metrics are pulled through on every request.

Additional context Add any other context about the problem here.

ankits123 commented 5 months ago

Thanks a lot, @Zak-HS. We will review this and get back to you. Also, please take a look at our solution for exporting metrics directly from the ADC to Prometheus: https://docs.netscaler.com/en-us/citrix-adc/current-release/observability/prometheus-integration.html#:~:text=NetScaler%20now%20supports%20directly%20exporting,to%20know%20the%20NetScaler%20health.