javierhonduco / rbperf

Low-overhead sampling profiler and tracer for Ruby for Linux
MIT License
119 stars 5 forks source link

Improve frame storage #74

Closed javierhonduco closed 1 year ago

javierhonduco commented 1 year ago

This commit changes a few things, first, it gets rid of the random IDs used to identify frames, as the birthday paradox, and production, shows that we'll quickly find duplicates using 32 bits. This now uses a global array. In BPF arrays are zero initialised, so we don't need to do that explicitly. As we are using shared memory across multiple threads, we read + increment the IDs atomically.

In addition to these changes that should reduce the chances of collisions and hence of lost frames, also bump the number of frames we can store and reduce the number of processes we can profile at once, as it's a very high number. We can tune these later on.

Notes

Other approaches that I've checked for ID generation are to use a per CPU counter and then offset the CPU number. This works but care has to be taken to make this work in systems with an "unbounded" number of CPUs. Not using an atomic get+increment is cheaper, but we would have to call the CPU id helper, so perf-wise it's probably in the same ballpark.

Another avenue we could explore is to have a generic symbol table to store both method names and paths. This would result in a reduction of stored strings as we wouldn't need to have the combination of path names x directories, but we would waste more space as these two string buffers have different sizes.

This commit doesn't fully fix the issue of collisions. A better fix will come later by processing the samples in a streaming fashion.

Signed-off-by: Francisco Javier Honduvilla Coto javierhonduco@gmail.com

javierhonduco commented 1 year ago

@manuelfelipe This should help with https://github.com/javierhonduco/rbperf/issues/70, let me know if that's not the case 😄