Closed mahendrapaipuri closed 8 hours ago
I would consider that a valid use case and a valuable addition to the component. Also, do make sure to mention me in the PR so I won't miss it.
Awesome. Sure, I will mention you in the PR once I have it ready!!
This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!
Request
Currently, the process discovery component attempts to get the container ID by looking into the cgroup of the process. This can be generalized by adding a new argument that takes a regex as input and attempts to find the cgroup ID by matching it against the cgroup of the process. We can then add a new label, say, __meta_cgroup_id__, that will be added to the targets.
Use case
Most resource managers use cgroups to manage the resources allocated to compute workloads. In our particular case, it is SLURM (an HPC batch scheduler). By passing a configurable cgroup ID regex to the process discovery component, we can find the job ID of each process. By using relabel magic, we can then filter out the processes that do not belong to any user job and eventually pass the job ID as service_name to the Pyroscope eBPF component. This will allow us to do continuous profiling of user jobs and aggregate the profiles of each job in Grafana based on service_name (which is essentially the job ID).
This should work for any resource manager that manages cgroups in a deterministic way. For instance, another use case is OpenStack, where libvirt manages the cgroups. Grafana Alloy can be deployed directly on the hypervisor to do continuous profiling of the VMs.
If the maintainers find value in this feature, I would be happy to submit a PR.