Infrastructure for bpf_tail_call overhead is high for cases where there are no bpf_tail_calls

Alan-Jowett commented 3 years ago

Profiling of the droppacket.o program shows this:

What is interesting is that the call to _ebpf_state_get_entry costs roughly 900 cycles, even for BPF programs that have no tail calls.

See https://github.com/microsoft/ebpf-for-windows/pull/452 for the test.

dthaler commented 3 years ago

Can we investigate a design whereby the state overhead only occurs inside the bpf_tail_call call?

Alan-Jowett commented 3 years ago

Thanks to @dthaler for the following proposal:

Inside the bpf_tail_call helper, write an entry to a hash table keyed by thread id and bump a global epoch-like counter. The invoke path would NOT store any such state prior to the call. After the call, it would check the global epoch counter to decide whether to spend the time doing a lookup by thread id to get the state stored there.
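A rough sketch of how the epoch idea could look, to make the fast path and slow path concrete. Everything here is hypothetical and not taken from the ebpf-for-windows code: the names (record_tail_call, take_tail_call, invoke), the fixed-size table standing in for the hash table, and the limit of 32 chained calls are all assumptions. The table itself is not thread safe; a real version would need proper synchronization and slot reclamation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_COUNT 64     /* hypothetical table size */
#define MAX_TAIL_CALLS 32 /* hypothetical cap on chained tail calls */

typedef int (*program_fn_t)(uint64_t thread_id);

typedef struct {
    bool in_use;
    uint64_t thread_id;
    program_fn_t next_program; /* program requested by the tail call */
} tail_call_slot_t;

static tail_call_slot_t g_slots[SLOT_COUNT];   /* stand-in for the hash table */
static atomic_uint_fast64_t g_tail_call_epoch; /* bumped on every tail call */
static int g_lookup_count;                     /* instrumentation for the sketch */

/* Slow path: runs only from inside the (hypothetical) tail call helper. */
static bool record_tail_call(uint64_t thread_id, program_fn_t next_program)
{
    for (size_t probe = 0; probe < SLOT_COUNT; probe++) {
        tail_call_slot_t* slot = &g_slots[(thread_id + probe) % SLOT_COUNT];
        if (!slot->in_use || slot->thread_id == thread_id) {
            slot->in_use = true;
            slot->thread_id = thread_id;
            slot->next_program = next_program;
            atomic_fetch_add(&g_tail_call_epoch, 1);
            return true;
        }
    }
    return false; /* table full */
}

/* Lookup by thread id; returns NULL if this thread has nothing pending. */
static program_fn_t take_tail_call(uint64_t thread_id)
{
    g_lookup_count++;
    for (size_t probe = 0; probe < SLOT_COUNT; probe++) {
        tail_call_slot_t* slot = &g_slots[(thread_id + probe) % SLOT_COUNT];
        if (!slot->in_use) {
            return NULL;
        }
        if (slot->thread_id == thread_id) {
            program_fn_t next = slot->next_program;
            slot->next_program = NULL; /* slot stays owned by this thread */
            return next;
        }
    }
    return NULL;
}

/* Invoke stores no state before the call and only does a lookup
   if the epoch moved while the program was running. */
static int invoke(program_fn_t program, uint64_t thread_id)
{
    int result = 0;
    for (int calls = 0; calls <= MAX_TAIL_CALLS && program != NULL; calls++) {
        uint_fast64_t epoch_before = atomic_load(&g_tail_call_epoch);
        result = program(thread_id);
        if (atomic_load(&g_tail_call_epoch) == epoch_before) {
            break; /* fast path: no tail call happened anywhere */
        }
        program = take_tail_call(thread_id); /* NULL if another thread bumped it */
    }
    return result;
}

/* Two toy programs: one with no tail call, one that tail calls the first. */
static int drop_program(uint64_t thread_id)
{
    (void)thread_id;
    return 1;
}

static int caller_program(uint64_t thread_id)
{
    record_tail_call(thread_id, drop_program);
    return 0;
}
```

With this shape, invoke(drop_program, tid) costs two atomic loads and never touches the table. The trade-off is that a tail call on any thread moves the epoch, so unrelated threads running at the same time pay for one lookup that comes back empty.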

Other alternatives to investigate: Instead of a global tail call epoch, make this a per-instance trip-wire in ebpf_program_t that is set the first time ebpf_program_set_tail_call is called, then have ebpf_program_invoke only check for tail call state if the trip-wire has been hit.
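For comparison, a minimal sketch of the trip-wire variant. The sketch_ names and struct layout are made up for illustration and are not the real ebpf_program_t or its functions. The pending pointer stored on the program is a simplification standing in for whatever per-thread state lookup the real code would do, and the limit of 32 chained calls is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_TAIL_CALLS 32 /* hypothetical cap on chained tail calls */

typedef struct sketch_program sketch_program_t;
typedef int (*sketch_entry_t)(sketch_program_t* self);

struct sketch_program {
    sketch_entry_t entry;
    bool tail_call_tripwire;   /* set on first tail call, never cleared */
    sketch_program_t* pending; /* stand-in for the per-thread tail call state */
};

static int g_state_checks; /* counts how often invoke looks for tail call state */

/* Called from the tail call helper: arms the trip-wire on the calling program. */
static void sketch_program_set_tail_call(sketch_program_t* caller, sketch_program_t* next)
{
    caller->tail_call_tripwire = true;
    caller->pending = next;
}

/* Invoke only looks for tail call state on programs whose trip-wire is set. */
static int sketch_program_invoke(sketch_program_t* program)
{
    int result = program->entry(program);
    for (int hops = 0; hops < SKETCH_MAX_TAIL_CALLS && program->tail_call_tripwire; hops++) {
        g_state_checks++;
        sketch_program_t* next = program->pending;
        program->pending = NULL;
        if (next == NULL) {
            break; /* armed earlier, but no tail call on this run */
        }
        program = next;
        result = program->entry(program);
    }
    return result;
}

/* Toy program with no tail call. */
static int sketch_drop_entry(sketch_program_t* self)
{
    (void)self;
    return 1;
}

static sketch_program_t g_sketch_drop = {sketch_drop_entry, false, NULL};

/* Toy program that tail calls g_sketch_drop. */
static int sketch_caller_entry(sketch_program_t* self)
{
    sketch_program_set_tail_call(self, &g_sketch_drop);
    return 0;
}

static sketch_program_t g_sketch_caller = {sketch_caller_entry, false, NULL};
```

Compared with a global epoch, a program that never tail calls pays only a single flag test, and activity in other programs cannot force it into a lookup. The cost is that once a program has tripped the wire, it pays for the state check on every later invocation, even on runs where it does not tail call.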

microsoft / ebpf-for-windows

Infrastructure for bpf_tail_call overhead is high for cases where there are no bpf_tail_calls #453