twitter / rezolus

Systems performance telemetry
Apache License 2.0

bpf: improve handling of partial failures #259

Closed: brayniac closed this 2 years ago

brayniac commented 2 years ago

As our BPF sampling becomes more extensive, we can hit partial failures when initializing the BPF probes for a given sampler, where some probes attach and others do not. Currently, any such failure disables all BPF telemetry for that sampler.

With this change, probe attachment consults the fault tolerance configuration to decide whether to swallow the error (skip the failed probe and keep the others) or raise it.
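A minimal sketch of the per-probe policy described above, assuming a boolean `fault_tolerant` flag drives the decision. `Probe`, `AttachError`, and `attach_probes` are hypothetical names for illustration, not Rezolus APIs; `Probe::attach` simulates success or failure instead of calling into a real BPF library, and `eprintln!` stands in for whatever logging the sampler uses.

```rust
use std::fmt;

/// Hypothetical error for a probe that failed to attach.
#[derive(Debug, Clone, PartialEq)]
pub struct AttachError {
    pub probe: String,
    pub reason: String,
}

impl fmt::Display for AttachError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to attach probe {}: {}", self.probe, self.reason)
    }
}

impl std::error::Error for AttachError {}

/// Hypothetical probe: a name plus a flag that simulates whether attach succeeds.
pub struct Probe {
    pub name: String,
    pub attachable: bool,
}

impl Probe {
    /// Stand-in for a real BPF attach call.
    fn attach(&self) -> Result<(), AttachError> {
        if self.attachable {
            Ok(())
        } else {
            Err(AttachError {
                probe: self.name.clone(),
                reason: "symbol not found".into(),
            })
        }
    }
}

/// Attach each probe in turn. With `fault_tolerant == true`, a failed probe is
/// reported and skipped so the remaining probes still produce telemetry.
/// With `fault_tolerant == false`, the first error is returned to the caller.
pub fn attach_probes(probes: &[Probe], fault_tolerant: bool) -> Result<Vec<String>, AttachError> {
    let mut attached = Vec::new();
    for probe in probes {
        match probe.attach() {
            Ok(()) => attached.push(probe.name.clone()),
            // Swallow the error: log it and move on to the next probe.
            Err(e) if fault_tolerant => eprintln!("skipping: {}", e),
            // Raise the error: abort attachment for this sampler.
            Err(e) => return Err(e),
        }
    }
    Ok(attached)
}

fn main() {
    let probes = vec![
        Probe { name: "tcp_sendmsg".into(), attachable: true },
        Probe { name: "missing_fn".into(), attachable: false },
        Probe { name: "tcp_recvmsg".into(), attachable: true },
    ];

    // Fault tolerant: the failing probe is skipped and the other two are kept.
    let attached = attach_probes(&probes, true).unwrap();
    println!("attached: {:?}", attached);

    // Strict: the attach error is propagated.
    let err = attach_probes(&probes, false).unwrap_err();
    println!("error: {}", err);
}
```

The design choice here is to decide per probe rather than per sampler: in fault tolerant mode one bad probe costs only its own metrics, while strict mode still surfaces the failure immediately.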