twitter / rezolus

Systems performance telemetry
Apache License 2.0

User space function metrics through bpf #221

Closed. kylebartush-twitter closed this pull request 3 years ago.

kylebartush-twitter commented 3 years ago

Problem

It is difficult to instrument some of the services written in C that my team manages. We can't easily export metrics from them, so we are stuck parsing logs to derive metrics. This causes many problems with log throttling by syslog and with log rotation.

Solution

I've added the ability to configure probing of user space libraries through BPF. This lets us export metrics for the number of times a specific function in a particular shared library is called.

Result

I've added a sampler called usercall that can be configured to probe user space libraries and export metrics. Probes are set up in the configuration file, either through a friendlier search heuristic or by specifying the exact file and function to probe.
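The two configuration modes could be modeled roughly as follows. This is a minimal sketch, not the actual usercall code: the ProbeSpec enum, the resolve function, and the use of a "lib<name>.so*" filename prefix as the search heuristic are all hypothetical. The search returns every matching file, which is consistent with the registration log later in this thread, where both libkrb5.so.26 and libkrb5.so.3 are probed.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical probe specification; the real usercall config schema may differ.
#[derive(Debug)]
enum ProbeSpec {
    /// Probe `function` in the shared library at exactly `path`.
    Exact { path: PathBuf, function: String },
    /// Probe `function` in every `lib<library>.so*` file found in the search directories.
    Search { library: String, function: String },
}

/// Resolves a spec into concrete (library path, function) pairs.
/// A searched-for library that is not in any directory yields an empty Vec.
fn resolve(spec: &ProbeSpec, search_dirs: &[&Path]) -> Vec<(PathBuf, String)> {
    match spec {
        ProbeSpec::Exact { path, function } => vec![(path.clone(), function.clone())],
        ProbeSpec::Search { library, function } => {
            let prefix = format!("lib{}.so", library);
            let mut found = Vec::new();
            for dir in search_dirs {
                // Skip directories that do not exist or cannot be read.
                let entries = match fs::read_dir(dir) {
                    Ok(entries) => entries,
                    Err(_) => continue,
                };
                // Sort names so the result does not depend on directory iteration order.
                let mut names: Vec<String> = entries
                    .filter_map(|entry| entry.ok())
                    .filter_map(|entry| entry.file_name().into_string().ok())
                    .filter(|name| name.starts_with(&prefix))
                    .collect();
                names.sort();
                for name in names {
                    found.push((dir.join(name), function.clone()));
                }
            }
            found
        }
    }
}
```

For example, searching for krb5 across /usr/lib/x86_64-linux-gnu and /usr/lib64 would yield one (path, function) pair per matching file in each directory. A library that lives somewhere unexpected is then simply not found, instead of producing a path that fails later at attach time.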

CLAassistant commented 3 years ago

CLA assistant check
All committers have signed the CLA.

brayniac commented 3 years ago

Overall this PR looks pretty good, though a few minor changes are needed before merge. The biggest open question for me is the sampler naming and whether these stats should be namespaced.

Thanks for submitting the PR, this is a cool use-case for Rezolus and I'm excited to get this merged.

brayniac commented 3 years ago

Is it possible to improve the error message when the lib path is incorrect?

2021-06-08 19:00:19.097 INFO  [rezolus] Registering probes: [("/usr/lib/x86_64-linux-gnu/libkrb5.so.26", "krb26", "krb5_cc_get_principal"), ("/usr/lib/x86_64-linux-gnu/libkrb5.so.26", "krb26", "krb5_parse_name_flags"), ("/usr/lib/x86_64-linux-gnu/libkrb5.so.3", "krb3", "krb5_cc_get_principal"), ("/usr/lib/x86_64-linux-gnu/libkrb5.so.3", "krb3", "krb5_parse_name_flags"), ("/lib64/libcurl.so.4", "curl", "curl_global_init")]
2021-06-08 19:00:22.108 ERROR [rezolus] failed to initialize usercall sampler

This was not very helpful for figuring out which of the probes failed to attach. It would be nice to log an error with the library name and a note that the path is invalid. On this test system the krb libs were under /usr/lib64, but it wasn't obvious why the sampler failed to initialize until I checked for the libraries manually.

kylebartush-twitter commented 3 years ago

I've added more detailed errors and made it so a single bad probe doesn't disable the sampler if fault_tolerant is set.
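A per-probe check along these lines would produce the kind of error requested above. This is a hypothetical sketch, not the code in the PR: validate_probes and the Probe tuple (library path, namespace, function, the same shape as the "Registering probes" log line) are assumptions, and the check only confirms that the library file exists. Attaching a uprobe can also fail for other reasons, such as a missing symbol. With fault_tolerant set, a bad probe is logged and skipped; otherwise the first bad probe aborts with a message naming the library path and the function.

```rust
use std::path::Path;

/// Hypothetical probe tuple: (library path, metric namespace, function).
type Probe = (String, String, String);

/// Checks each probe's library path before attaching.
/// - fault_tolerant = true: bad probes are reported on stderr and skipped.
/// - fault_tolerant = false: the first bad probe returns an error.
fn validate_probes(probes: &[Probe], fault_tolerant: bool) -> Result<Vec<Probe>, String> {
    let mut usable = Vec::new();
    for (path, namespace, function) in probes {
        if Path::new(path).is_file() {
            usable.push((path.clone(), namespace.clone(), function.clone()));
            continue;
        }
        // Name the library, namespace, and function so the failing probe is obvious.
        let msg = format!(
            "usercall: cannot probe {} ({}): library path {} does not exist or is not a file",
            function, namespace, path
        );
        if fault_tolerant {
            eprintln!("WARN {}", msg);
        } else {
            return Err(msg);
        }
    }
    Ok(usable)
}
```

Checking paths up front gives a precise message cheaply. A complete implementation would still need to surface attach errors reported by the BPF loader itself.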