oracle / bpftune

bpftune uses BPF to auto-tune Linux systems

[Feature] Dry Run Mode #19

Closed j0sh3rs closed 1 year ago

j0sh3rs commented 1 year ago

Hello!

Given the responsiveness to the Debian-related issue in #14, I'm hoping this project is still under active development and open to new ideas! I've been very excited to try it out and gather data about what's happening in my local lab, and eventually in various CSPs, for analysis!

One thing I think would be extremely beneficial is a dry-run mode, in which the systemd service (or foreground process) would provide recommendations without making any changes.

Settings like the TCP congestion control algorithm and window sizes matter a great deal to larger organizations, which might benefit from bpftune's changes but need to evaluate them closely. One way to provide visibility into what can be tuned is to simply log the suggestions without taking further action. Today, there doesn't seem to be an option for this: bpftune always applies its changes.
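For tuners that compute a value up front, the request amounts to a single switch at the point where a tunable would be written. The following is a minimal Python sketch of that idea, not bpftune's code (bpftune is written in C); `apply_tunable`, `sysctl_path` and the `dry_run` parameter are hypothetical, and the `root` parameter exists only so the sketch can be exercised against a scratch directory instead of `/proc/sys`.

```python
# Hypothetical sketch of a dry-run switch; not bpftune's API.
import logging
from pathlib import Path

log = logging.getLogger("tuner")


def sysctl_path(name: str, root: Path = Path("/proc/sys")) -> Path:
    """Map a dotted sysctl name (e.g. net.ipv4.tcp_congestion_control) to a file under root."""
    # Simplification: assumes no component of the name contains a literal dot.
    return root / name.replace(".", "/")


def apply_tunable(name: str, value: str, *, dry_run: bool,
                  root: Path = Path("/proc/sys")) -> str:
    """Write a tunable, or in dry-run mode only log it as a recommendation.

    Returns the value in effect after the call."""
    path = sysctl_path(name, root)
    current = path.read_text().strip()
    if dry_run:
        # Dry run: report what would change, leave the system untouched.
        log.info("recommend %s: %s -> %s", name, current, value)
        return current
    path.write_text(f"{value}\n")
    log.info("set %s: %s -> %s", name, current, value)
    return value
```

This only works for tuners whose decision does not depend on observing the effect of the change itself.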

Tangentially related to this would be a flag to define the logging location: the -s flag can be reserved for syslog, but an optional path to a logfile would allow these recommendations to be written somewhere they won't get lost in the sea of syslog events.

I don't have any experience with C programming, so this would take me some time to try to implement myself. If it's easy enough for someone here to knock out, and they're willing, that'd be fantastic. Failing that, I'd take a stab at it myself, but it would take some time.

Thanks for this project! I'm excited to watch it grow with new recommendations for optimizations.

alan-maguire commented 1 year ago

Thanks for the suggestions, and I'd be delighted if you can contribute in any way! Contributions do require signing the Oracle Contributor Agreement (https://oca.opensource.oracle.com/), but that's a one-time thing and pretty automated these days. In terms of dry-run mode, that's a great idea. One complication is the bpftune model itself: bpftune heavily interleaves changes and further observation. It doesn't compute an optimal value so much as find its way to one, if that makes sense, and in some cases the tunable changes also feed into the evaluation. For example, with TCP buffer tuning we correlate tunable changes with latency to see whether increasing the buffer size is introducing latency, and if so we back off. So a dry-run mode would require a bit of a rethink in how we respond to the events that drive tunable updates, but it would be a really nice feature to have.
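To see why dry-run conflicts with this model, here is a toy Python version of the "grow, observe, back off" loop described above. It is a simplification, not bpftune's TCP buffer tuner: `tune_buffer`, the growth `step`, the latency `tolerance` and the two callbacks are all hypothetical. The point it illustrates is that each latency observation is taken after the candidate value has been applied, so a run that never applies anything has nothing meaningful to observe.

```python
# Hypothetical toy model of interleaved tuning; not bpftune's algorithm.
from typing import Callable


def tune_buffer(initial: int, maximum: int,
                set_value: Callable[[int], None],
                observe_latency: Callable[[], float],
                step: float = 1.25, tolerance: float = 1.10,
                rounds: int = 20) -> int:
    """Grow a buffer size in small steps; back off if latency regresses."""
    size = initial
    set_value(size)
    baseline = observe_latency()
    for _ in range(rounds):
        candidate = min(int(size * step), maximum)
        if candidate <= size:
            break                       # already at the cap
        set_value(candidate)            # change first...
        latency = observe_latency()     # ...then observe the effect of that change
        if latency > baseline * tolerance:
            set_value(size)             # latency regressed: back off to last good size
            break
        size = candidate
    return size
```

With a fake system whose latency doubles once the buffer exceeds 1000, starting from 400 the loop steps through 500, 625, 781 and 976, tries 1220, sees the regression and settles back on 976.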

With respect to logging, there's actually a pluggable logging infrastructure in libbpftune, so implementing "log to a file" rather than "log to syslog" or "log to stdout" would be an easier feature to start with, I suspect.
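The pluggable-sink idea can be illustrated with Python's standard `logging` handlers, where adding "log to a file" is just one more branch next to syslog and stdout. This is an analogy, not libbpftune's C logging interface; `make_log_handler`, `configure_logging` and the target strings are hypothetical.

```python
# Hypothetical sketch of a pluggable log sink; not libbpftune's interface.
import logging
import logging.handlers
import sys


def make_log_handler(target: str) -> logging.Handler:
    """Pick a sink: 'syslog', 'stdout', or anything else treated as a file path."""
    if target == "syslog":
        return logging.handlers.SysLogHandler(address="/dev/log")
    if target == "stdout":
        return logging.StreamHandler(sys.stdout)
    return logging.FileHandler(target)   # appends to the given logfile


def configure_logging(target: str, name: str = "tuner") -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Replace (and close) any previously configured sink.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = make_log_handler(target)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Because the rest of the code only ever calls the logger, recommendations end up in a dedicated file without touching the tuning logic.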

And with respect to optimization recommendations, I hope folks will have some ideas there too! The initial set addressed networking issues we hit at Oracle, but the tuner infrastructure is pretty general, so I'm hopeful we can tackle some new problems too.

alan-maguire commented 1 year ago

One further note on dry-run mode: a direction we could explore that is compatible with bpftune's mode of operation (making small changes frequently) is a "rollback" mode, where on exit we summarize the changes that were made but roll them back to the pre-bpftune state. Specifying bpftune -R (rollback mode) would undo the changes made, but emit the overall changes as suggested updates. Would that sort of approach work? Thanks!
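The rollback idea boils down to remembering each tunable's value before its first change, then on exit reporting the net change and restoring the original. A minimal Python sketch, assuming tunables can be read and written through two callables; `RollbackTracker` and its methods are hypothetical and not taken from bpftune's implementation.

```python
# Hypothetical sketch of rollback-on-exit; not bpftune's implementation.
from typing import Callable, Dict, List


class RollbackTracker:
    """Remember pre-tuning values; on exit, report net changes and restore them."""

    def __init__(self, read: Callable[[str], str],
                 write: Callable[[str, str], None]) -> None:
        self._read = read
        self._write = write
        self._original: Dict[str, str] = {}

    def set(self, name: str, value: str) -> None:
        # Record the original only on the first change, so many small steps
        # still roll back to the pre-tuning state.
        if name not in self._original:
            self._original[name] = self._read(name)
        self._write(name, value)

    def rollback(self) -> List[str]:
        """Restore originals; return 'name: original -> final' suggested updates."""
        suggestions = []
        for name, original in self._original.items():
            final = self._read(name)
            if final != original:           # skip tunables that ended where they started
                suggestions.append(f"{name}: {original} -> {final}")
            self._write(name, original)
        self._original.clear()
        return suggestions
```

In a daemon, `rollback()` would be called from the shutdown path (for example a SIGTERM/SIGINT handler) and its result logged as the suggested updates.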

alan-maguire commented 1 year ago

I merged rollback support; see

https://github.com/oracle/bpftune/pull/60

...for more details.

alan-maguire commented 1 year ago

Closing this, as rollback support is the closest we can feasibly get to dry-run mode. To recap: with rollback we can run bpftune and see what changes are made, and the changes are rolled back on exit. This mode of operation is compatible with the bpftune philosophy of making small changes (frequently, if needed) and observing their effects. Since finding the right numbers and assessing their effectiveness are tightly intertwined, there's no way to get there without actually changing tunable values, but at least with rollback the changes are undone.