Closed yoderme closed 2 months ago
Hi @yoderme. Thanks for your PR.
After inspecting your changes someone with write access to this repo needs to approve and run the workflow.
Since I can't create a github issue I'll put all my info and rationale here.
I have enabled the agent prometheus exporter as per the instructions in https://docs.sysdig.com/en/docs/installation/configuration/sysdig-agent/agent-health/#collect-agent-health-metrics. This meant setting:

```yaml
agent:
  sysdig:
    settings:
      prometheus_exporter:
        enabled: true
        export_health_metrics: true
```
I know for a fact that port 9544 is open and exposing metrics:

```shell
❯ kubectl -n sysdig exec sysdig-agent-5dz5q -- curl -s http://localhost:9544/metrics | grep ^sysdig
sysdig_agent_analyzer_dropped_evts 0
sysdig_agent_analyzer_num_evts 8673.2
sysdig_agent_connected 1
…
```
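As a quick scripted sanity check, the exposition-format output can be parsed to assert the agent reports itself connected. This is just a sketch that reuses the sample output pasted above; in a live check the `metrics` variable would be populated by the `kubectl exec ... curl` command instead:

```shell
# Sample exporter output (copied from above); live usage would be:
#   metrics=$(kubectl -n sysdig exec <pod> -- curl -s http://localhost:9544/metrics)
metrics='sysdig_agent_analyzer_dropped_evts 0
sysdig_agent_analyzer_num_evts 8673.2
sysdig_agent_connected 1'

# Extract the value of the sysdig_agent_connected gauge (1 = connected).
connected=$(printf '%s\n' "$metrics" | awk '$1 == "sysdig_agent_connected" {print $2}')
echo "$connected"
```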
Our k8s cluster uses the prometheus operator to collect metrics: https://github.com/prometheus-operator/prometheus-operator . To scrape metrics with it, you create a ServiceMonitor or PodMonitor object, and the prometheus operator sees those and magically starts scraping. A ServiceMonitor watches a Service while a PodMonitor watches Pods directly, and since there’s no k8s Service for this port, I’m trying to use a PodMonitor. Check out the docs: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.PodMonitor
Note especially the PodMetricsEndpoint: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.PodMetricsEndpoint. The “port” field’s description is “Name of the Pod port which this endpoint refers to.” — it requires an explicitly named port. The fallback “targetPort” field, whose description is “Name or number of the target port of the Pod object behind the Service, the port must be specified with container port property. Deprecated: use ‘port’ instead.”, also wants the port to be declared on the container.
But I tried anyway using targetPort:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: sysdig-agent
  namespace: sysdig
spec:
  selector:
    matchLabels:
      app: sysdig-agent
  namespaceSelector:
    matchNames:
      - sysdig
  podMetricsEndpoints:
    - targetPort: 9544
      path: /metrics
      interval: 30s
```
When I look in the prometheus pod’s UI I see “podMonitor/sysdig/sysdig-agent/0 (0 / 73 active targets)”, so I know for a fact that prometheus sees the PodMonitor but is failing to scrape any targets.
So … taking matters into my own hands, I edited the agent daemonset to add

```yaml
ports:
  - containerPort: 9544
    name: metrics
```

to the sysdig-agent container. Then I created:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: sysdig-agent
  namespace: sysdig
spec:
  selector:
    matchLabels:
      app: sysdig-agent
  namespaceSelector:
    matchNames:
      - sysdig
  podMetricsEndpoints:
    - port: metrics
      path: /metrics
      interval: 30s
```
Once this was done, data started to flow into prometheus and then into our Grafana UI. Success!
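As a side note, the manual daemonset edit above can also be expressed as a strategic merge patch so it is repeatable. This is a sketch based on my edit; it assumes the agent container in the DaemonSet is named `sysdig-agent`, so adjust the name to match the actual spec:

```yaml
# named-port-patch.yaml: adds a named containerPort to the agent container.
# Assumes the container is named "sysdig-agent"; check the DaemonSet spec first.
spec:
  template:
    spec:
      containers:
        - name: sysdig-agent
          ports:
            - containerPort: 9544
              name: metrics
```

It can be applied with `kubectl -n sysdig patch daemonset sysdig-agent --patch-file named-port-patch.yaml`. Of course, a helm upgrade will revert any out-of-band edit, which is why having the chart expose the named port (this PR) is the real fix.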
Hello, @yoderme! Thank you for the PR and very thorough write-up! We're taking a look at this now. 🙂
@yoderme we identified an issue related to GHA and access to secrets from external contributors. To speed up the resolution of this, I'm going to open a PR myself to include these changes.
Hey that's great - thanks for looking at this so quickly. If you don't mind, I'd love to get a notification when this is released and available for me to officially use. Thanks!
@yoderme Sure. I already opened the PR and it's going through the GHA checks. Once the pipeline completes, we'll merge and immediately release a new version of the chart.
@yoderme should be fixed in https://github.com/sysdiglabs/charts/pull/1869 (chart available soon), thx for your contribution!
@yoderme The fix is now available in sysdig-deploy version 1.61.5. Thanks again for your contribution!