grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0
1.49k stars 224 forks

Configurable cgroup ID regex for process discovery component #1461

Closed mahendrapaipuri closed 8 hours ago

mahendrapaipuri commented 3 months ago

Request

Currently, the process discovery component attempts to get the container ID by inspecting the cgroup of the process. This can be generalized by adding a new argument that takes a regex and attempts to extract a cgroup ID by matching it against the cgroup path of the process. We can then add a new label, say, __meta_cgroup_id__, to the targets.

Use case

Most resource managers use cgroups to manage the resources allocated to compute workloads. In our particular case, it is SLURM (an HPC batch scheduler). With a configurable cgroup ID regex in the process discovery component, we can find the job ID of each process. Using relabel rules, we can then filter out the processes that do not belong to any user job and use the job ID as service_name for the Pyroscope eBPF component. This will allow us to do continuous profiling of user jobs and aggregate the profiles of each job in Grafana based on service_name (which is essentially the job ID).
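The filtering step described here would normally be written as relabel rules in the Alloy configuration. The Go sketch below only mirrors that logic to show the intended outcome. It assumes targets are plain label maps and that the proposed __meta_cgroup_id__ label exists; the function name jobTargets and the sample targets are hypothetical.

```go
package main

import "fmt"

// Label names used in this sketch. cgroupIDLabel is the label proposed in
// this issue and is an assumption, not an existing Alloy label.
const (
	cgroupIDLabel    = "__meta_cgroup_id__"
	serviceNameLabel = "service_name"
)

// jobTargets keeps only the targets that carry a non-empty cgroup ID and
// copies that ID into service_name. The input maps are not modified.
func jobTargets(targets []map[string]string) []map[string]string {
	var kept []map[string]string
	for _, t := range targets {
		id := t[cgroupIDLabel]
		if id == "" {
			continue // process does not belong to any user job
		}
		out := make(map[string]string, len(t)+1)
		for k, v := range t {
			out[k] = v
		}
		out[serviceNameLabel] = id
		kept = append(kept, out)
	}
	return kept
}

func main() {
	// Illustrative targets: one process inside a job, one system process.
	targets := []map[string]string{
		{"__process_pid__": "101", cgroupIDLabel: "12345"},
		{"__process_pid__": "1"},
	}
	for _, t := range jobTargets(targets) {
		fmt.Println(t["__process_pid__"], t[serviceNameLabel]) // prints 101 12345
	}
}
```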

This should work for any resource manager that manages cgroups in a deterministic way. For instance, another use case is OpenStack, where libvirt manages the cgroups. Grafana Alloy can be deployed directly on the hypervisor to do continuous profiling of the VMs.

If the maintainers find value in this feature, I would be happy to submit a PR.

simonswine commented 3 months ago

I would consider that a valid use case and a valuable addition to the component. Also, do make sure to mention me in the PR so I won't miss it.

mahendrapaipuri commented 3 months ago

Awesome. Sure, I will mention you in the PR once I have it ready!!

github-actions[bot] commented 2 months ago

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it. If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue. The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity. Thank you for your contributions!