oracle / bpftune

bpftune uses BPF to auto-tune Linux systems

bpftune not adjusting net.ipv4.tcp_rmem on Artix Linux with Dinit, CachyOS kernel and BBRv3 #76

Closed. sm9cc closed this issue 10 months ago.

sm9cc commented 10 months ago

Description:

I am encountering an issue while running bpftune on Artix Linux with the CachyOS kernel and the BBRv3 congestion control algorithm. I expected bpftune to adjust the net.ipv4.tcp_rmem value automatically, but it does not.

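Environment:

Artix Linux (Dinit), CachyOS kernel, BBRv3 congestion control; bpftune-git package from the CachyOS repository.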

Steps to Reproduce:

  1. Run sudo bpftune -ds
  2. Execute the following commands in a new terminal:
    
    cd ~/
    sudo sysctl -w net.ipv4.tcp_rmem="4096 131072 1310720"
    wget https://yum.oracle.com/ISOS/OracleLinux/OL8/u7/x86_64/OracleLinux-R8-U7-x86_64-dvd.iso
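To check the expected behavior from a program rather than by eye, the live value can be read from /proc/sys/net/ipv4/tcp_rmem (the procfs file behind the sysctl) and compared with the value set in step 2. The sketch below is a minimal illustration in C; the helper names (parse_tcp_rmem, read_tcp_rmem, rmem_was_tuned) are hypothetical, and it assumes, as the report does, that a successful adjustment shows up as a larger maximum (third) value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* The three values of net.ipv4.tcp_rmem, in bytes:
 * minimum, default and maximum receive buffer size. */
struct tcp_rmem {
    long min;
    long def;
    long max;
};

/* Parse a "min default max" string such as "4096 131072 1310720".
 * Spaces and tabs are both accepted. Returns true on success. */
bool parse_tcp_rmem(const char *s, struct tcp_rmem *out)
{
    assert(s != NULL && out != NULL);
    return sscanf(s, "%ld %ld %ld", &out->min, &out->def, &out->max) == 3;
}

/* Read the live value from procfs (same data as `sysctl net.ipv4.tcp_rmem`). */
bool read_tcp_rmem(struct tcp_rmem *out)
{
    char buf[128];
    FILE *f = fopen("/proc/sys/net/ipv4/tcp_rmem", "r");

    if (f == NULL)
        return false;
    bool ok = fgets(buf, sizeof(buf), f) != NULL && parse_tcp_rmem(buf, out);
    fclose(f);
    return ok;
}

/* Assumption: an adjustment by the tuner appears as a larger maximum. */
bool rmem_was_tuned(const struct tcp_rmem *before, const struct tcp_rmem *after)
{
    return after->max > before->max;
}
```

For example, take one snapshot right after the sysctl -w, another while the wget is running, and pass both to rmem_was_tuned().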

Expected Behavior:

bpftune should automatically adjust the net.ipv4.tcp_rmem value.

Actual Behavior:

The value remains "4096 131072 1310720".

Debug Output:

https://paste.cachyos.org/p/2ec25a0.log

Thank you in advance for your assistance.

alan-maguire commented 10 months ago

thanks for the report! looking at the log, the relevant snippet is here:

bpftune: libbpf: prog 'entry__tcp_sndbuf_expand': failed to find kernel BTF type ID of 'tcp_sndbuf_expand': -3
bpftune: libbpf: prog 'entry__tcp_sndbuf_expand': failed to prepare load attributes: -3
bpftune: libbpf: prog 'entry__tcp_sndbuf_expand': failed to load: -3
bpftune: libbpf: failed to load object 'tcp_buffer_tuner_bpf'
bpftune: libbpf: failed to load BPF skeleton 'tcp_buffer_tuner_bpf': -3
bpftune: could not load skeleton: No such process
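The trailing "No such process" is not a separate failure: libbpf reports errors as negative errno values, -3 is -ESRCH on Linux, and strerror(ESRCH) is "No such process". A tiny helper (the name neg_errno_text is hypothetical) makes the mapping explicit:

```c
#include <assert.h>
#include <errno.h>   /* ESRCH is 3 on Linux */
#include <string.h>

/* Hypothetical helper: turn a negative errno-style return code, as
 * returned by libbpf, into the human-readable text strerror() gives. */
const char *neg_errno_text(int err)
{
    assert(err < 0);
    return strerror(-err);
}
```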

I suspect one of two things happened:

  1. tcp_sndbuf_expand got inlined
  2. tcp_sndbuf_expand had gcc optimizations applied, resulting in a ".isra.0" or similar suffix

to check, can you run the following

sudo grep tcp_sndbuf_expand /proc/kallsyms

...and report the result? thanks!

if 1 above happened, there's not a lot we can do aside from making failure to attach non-fatal for the tuner as a whole. if 2 above happened, we can probably add some logic to handle the "."-suffixed case so that the function can still be found and attached to.
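The kallsyms check, together with the "."-suffix handling mentioned for case 2, can be sketched as follows. This is a minimal illustration, not bpftune's code: sym_matches() and kallsyms_find() are hypothetical names, and the sketch assumes the usual /proc/kallsyms layout of "<address> <type> <name> [module]" per line. It accepts the exact name or the name followed by a "." suffix such as ".isra.0", and rejects names that merely share a prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `sym` is `target` itself or a compiler-generated clone of it,
 * i.e. `target` followed by a '.'-suffix like ".isra.0" or ".constprop.0". */
bool sym_matches(const char *sym, const char *target)
{
    assert(sym != NULL && target != NULL);
    size_t n = strlen(target);

    if (strncmp(sym, target, n) != 0)
        return false;
    /* Exact match, or a '.' right after the name; "target_foo" is rejected. */
    return sym[n] == '\0' || sym[n] == '.';
}

/* Scan kallsyms-formatted text for the first symbol matching `target`
 * and copy its full name into `out`. Returns true if one was found. */
bool kallsyms_find(FILE *f, const char *target, char *out, size_t outlen)
{
    char line[512];
    char name[256];

    while (fgets(line, sizeof(line), f) != NULL) {
        /* Skip the address and type columns, keep the symbol name. */
        if (sscanf(line, "%*s %*s %255s", name) != 1)
            continue;
        if (sym_matches(name, target)) {
            snprintf(out, outlen, "%s", name);
            return true;
        }
    }
    return false;
}
```

Passing fopen("/proc/kallsyms", "r") as f mirrors the grep; an empty result (what the reporter got) points towards case 1. Because libbpf resolves fentry targets through kernel BTF (hence the "kernel BTF type ID" error), a kallsyms hit is likely a useful hint rather than proof that attaching will succeed.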

Anyway, if you get a chance to provide the above output, we should be able to resolve the issue one way or another. thanks!

sm9cc commented 10 months ago

Hi, running sudo grep tcp_sndbuf_expand /proc/kallsyms does not give me any output.

> I suspect one of two things happened:
>
> 1. tcp_sndbuf_expand got inlined
> 2. tcp_sndbuf_expand had gcc optimizations applied, resulting in a ".isra.0" or similar suffix

I installed bpftune-git from the CachyOS repository. It's quite likely that the issue stems from their optimized build process, which would fit one of your two hypotheses. This might affect CachyOS users as well, although I don't recall having this issue the last time I tried CachyOS, which is why I thought it might be related to my somewhat exotic setup. Anyway, I've already contacted them for clarification and am awaiting a response.


Edit: It looks like the package applies a minor CFLAGS adjustment that makes -march more generic, which is intended for repository packaging: https://github.com/CachyOS/CachyOS-PKGBUILDS/blob/master/bpftune-git/fix-makefile.patch

Additionally, https://github.com/ptr1337 has mentioned that they didn't notice the issue on CachyOS, and the output on their server is as follows:

cat /proc/kallsyms | grep tcp_sndbuf
0000000000000000 t __pfx_tcp_sndbuf_expand
0000000000000000 t tcp_sndbuf_expand
0000000000000000 t bpf_prog_3e77fcd1a5c4fe3a_entry__tcp_sndbuf_expand    [bpf]

Context - https://i.imgur.com/b3iT5bA.png
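The three kinds of entries in this output can be told apart mechanically: "__pfx_"-prefixed symbols are padding/prefix symbols that recent kernels built with function padding emit in front of functions, the plain tcp_sndbuf_expand line is the function itself, and a "bpf_prog_<tag>_<name>" line tagged [bpf] is a loaded, JIT-compiled BPF program, here bpftune's entry__tcp_sndbuf_expand. The all-zero addresses are what readers without root privileges typically see (governed by kernel.kptr_restrict and related settings). A minimal classifier; the enum and function names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum ksym_kind {
    KSYM_INVALID,   /* line did not parse */
    KSYM_FUNC,      /* ordinary kernel symbol, e.g. tcp_sndbuf_expand */
    KSYM_PFX,       /* "__pfx_" padding symbol placed ahead of a function */
    KSYM_BPF_PROG,  /* JIT-compiled BPF program, listed with module "[bpf]" */
};

/* Classify one /proc/kallsyms line: "<addr> <type> <name> [module]". */
enum ksym_kind ksym_classify(const char *line)
{
    char name[256];
    char module[64] = "";
    int n;

    assert(line != NULL);
    /* Skip address and type; read the name and the optional module tag. */
    n = sscanf(line, "%*s %*s %255s %63s", name, module);
    if (n < 1)
        return KSYM_INVALID;
    if (strcmp(module, "[bpf]") == 0)
        return KSYM_BPF_PROG;
    if (strncmp(name, "__pfx_", 6) == 0)
        return KSYM_PFX;
    return KSYM_FUNC;
}
```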

Thanks for your assistance.

alan-maguire commented 10 months ago

thanks for following up on this! we can also make tcp_sndbuf_expand attachment optional; in your case that should rescue receive buffer and tcp_mem auto-tuning at least (unless the rcvbuf-related functions have vanished too!), since the overall attach will no longer fail. i'll push a change shortly once i've tested it.
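Making one attachment optional can be sketched as a small planning step that runs before the BPF object is loaded. This is a hypothetical illustration, not the code merged in #77: struct probe and plan_probes() are made-up names. In a real libbpf-based loader, presence would typically be checked against kernel BTF (for example with btf__load_vmlinux_btf() and btf__find_by_name_kind(..., BTF_KIND_FUNC)), and a skipped program would be excluded from loading with bpf_program__set_autoload(prog, false).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One attach point of a tuner (hypothetical bookkeeping struct). */
struct probe {
    const char *func;   /* kernel function the program attaches to */
    bool optional;      /* tuner can still do useful work without it */
    bool present;       /* result of a kallsyms/BTF lookup */
    bool enabled;       /* output: load and attach this program */
};

/* Enable every probe whose target exists; skip missing optional probes.
 * Returns -1 as soon as a required probe is missing (the whole tuner
 * fails, as before the change), otherwise 0. */
int plan_probes(struct probe *p, size_t n)
{
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        p[i].enabled = p[i].present;
        if (!p[i].present && !p[i].optional)
            return -1;
    }
    return 0;
}
```

With tcp_sndbuf_expand marked optional and missing, only the send-side program is skipped while required receive-side probes still load, which is the outcome described in the comment above.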

alan-maguire commented 10 months ago

just merged #77

sm9cc commented 10 months ago

Thanks! That seems to have fixed the issue.

Debug output - https://paste.cachyos.org/p/d0c5020.log

alan-maguire commented 10 months ago

that's great; thanks for testing it so quickly!

0xAlcibiades commented 6 months ago

Seeing the same issue on the latest.