Closed andrey-admin closed 1 year ago
thanks for reporting - can you retry with the latest main branch? I ran into a segmentation fault on ubuntu and pushed a fix that resolved it. if it's still there, can you attach the stack trace associated with the core dump from gdb and i'll try and figure out what's going on.
@alan-maguire same issue but with that in dmesg:
[953685.243427] bpftune[85946]: segfault at 0 ip 00007f29a426fde2 sp 00007ffc218d6980 error 4 in tcp_cong_tuner.so[7f29a426f000+2000] [953685.243437] Code: 45 a8 0f b6 40 40 83 f0 01 84 c0 0f 84 97 00 00 00 e8 c3 f7 ff ff 48 89 45 c8 48 8b 45 a8 48 8b 55 c8 48 89 50 48 48 8b 45 c8 <48> 8b 10 48 8b 45 a8 48 89 50 38 e8 8e f4 ff ff 48 8b 55 c8 48 8b
thanks! i can't reproduce it so can you try running "gdb bpftune
thanks; the crash is happening in bpftuner_bpf_init(); would you be able to run "bpftune -ds" to see if we can see what is happening with bpf open/load/attach?
i suspect the issue is https://lore.kernel.org/bpf/20211008000309.43274-7-andrii@kernel.org/ where the bpf skeleton generation does not like the .rodata.cst16 section. It may be that a newer bpftool might help; i'm using bpftool 5.15 on ubuntu from the linux-tools package synced to the kernel version. however we may also be able to work around this; you could try making the following changes to tcp_cong_tuner.bpf.c and rebuilding:
diff --git a/src/tcp_cong_tuner.bpf.c b/src/tcp_cong_tuner.bpf.c
index 77957b3..ab6661a 100644
--- a/src/tcp_cong_tuner.bpf.c
+++ b/src/tcp_cong_tuner.bpf.c
@@ -40,7 +40,7 @@ static __always_inline bool retransmit_threshold(struct remote_host *remote_host,
 u32 segs_out, u32 total_retrans)
 {
 static const char bbr[4] = "bbr";
 __u64 now;
 if (!remote_host)
@@ -188,7 +188,7 @@ int BPF_PROG(cong_retransmit, struct sock *sk, struct sk_buff *skb)
 struct tcp_sock *tp = (struct tcp_sock *)sk;
 struct in6_addr *key = &sin6->sin6_addr;
 __u32 segs_out = 0, total_retrans = 0;
that was enough to get rid of the .rodata.cst16 section (it's replaced with .rodata.str1.1 that bpftool can handle).
patch got mangled but replaces
const char bbr[CONG_MAXNAME] = "bbr";
...with
static const char bbr[4] = "bbr";
...in the two places it is declared in tcp_cong_tuner.bpf.c
static volatile const char const bbr[4] = {'b', 'b', 'r', '\0'};
B-)
can you attach the patch as a file, please?
still segfault.
58 bpftuner_bpf_init(tcp_cong, tuner, NULL);
(gdb) bt
last lines from -ds:
bpftune: libbpf: prog 'cong_retransmit': found data map 5 (tcp_cong.bss, sec 8, off 0) for insn 157
bpftune: libbpf: sec '.reltp_btf/tcp_retransmit_skb': relo #4: insn #179 against 'init_net'
bpftune: libbpf: prog 'cong_retransmit': found extern #0 'init_net' (sym 34) for insn #179
bpftune: libbpf: sec '.reltp_btf/tcp_retransmit_skb': relo #5: insn #182 against 'bpftune_init_net'
bpftune: libbpf: prog 'cong_retransmit': found data map 5 (tcp_cong.bss, sec 8, off 0) for insn 182
bpftune: libbpf: sec '.reltp_btf/tcp_retransmit_skb': relo #6: insn #190 against 'ring_buffer_map'
bpftune: libbpf: prog 'cong_retransmit': found map 0 (ring_buffer_map, sec 9, off 0) for insn #190
bpftune: libbpf: sec '.reliter/tcp': collecting relocation for section(5) 'iter/tcp'
bpftune: libbpf: sec '.reliter/tcp': relo #0: insn #37 against 'remote_host_map'
bpftune: libbpf: prog 'bpftune_cong_iter': found map 3 (remote_host_map, sec 9, off 80) for insn #37
bpftune: libbpf: sec '.reliter/tcp': relo #1: insn #52 against 'remote_host_map'
bpftune: libbpf: prog 'bpftune_cong_iter': found map 3 (remote_host_map, sec 9, off 80) for insn #52
bpftune: libbpf: sec '.reliter/tcp': relo #2: insn #57 against 'remote_host_map'
bpftune: libbpf: prog 'bpftune_cong_iter': found map 3 (remote_host_map, sec 9, off 80) for insn #57
bpftune: libbpf: sec '.reliter/tcp': relo #3: insn #135 against 'debug'
bpftune: libbpf: prog 'bpftune_cong_iter': found data map 5 (tcp_cong.bss, sec 8, off 0) for insn 135
bpftune: libbpf: sec '.reliter/tcp': relo #4: insn #139 against '.rodata'
bpftune: libbpf: prog 'bpftune_cong_iter': found data map 4 (tcp_cong.rodata, sec 11, off 0) for insn 139
bpftune: libbpf: failed to find skeleton map '.rodata.str1.1'
Segmentation fault (core dumped)
can you check bpftool, clang versions ("bpftool --version", "clang --version")? ubuntu with bpftool v5.15 and clang v14 work fine for me, even with the .rodata.str1.1 sections.
root@nginx-01:/usr/src/bpf/bpftune# bpftool --version
/usr/lib/linux-tools/5.19.0-1026-gcp/bpftool v7.0.0
using libbpf v1.0
features: libbpf_strict
root@nginx-01:/usr/src/bpf/bpftune# clang --version
Ubuntu clang version 14.0.0-1ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Machine - google cloud virtual server (n2d-highcpu-16)
thanks; above look fine and similar to my setup so I'm puzzled why we're seeing different things. regardless i think i've fixed one of the issues here; when bpftune opens/loads/attaches bpf it uses macros and these need to return failure status otherwise we try to load a program that failed to open, or attach a program that failed to load. i've merged that in pr https://github.com/oracle-samples/bpftune/pull/26 so hopefully that should resolve the segmentation fault, but i don't yet have a good solution for the bpf loading failure.
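To illustrate the kind of fix described above (open/load/attach macros that return a failure status so later steps are skipped), here is a minimal C sketch. It is not the code from the pull request; `DEMO_STEP`, `struct demo_skel` and the `demo_*` functions are hypothetical stand-ins for the real skeleton calls.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a BPF skeleton; not bpftune's real type. */
struct demo_skel {
	int opened;
	int loaded;
	int attached;
};

/* Each step returns 0 on success or a negative errno value. */
static int demo_open(struct demo_skel *s, int fail)
{
	if (fail)
		return -ESRCH;	/* simulate an open failure */
	s->opened = 1;
	return 0;
}

static int demo_load(struct demo_skel *s)
{
	if (!s->opened)
		return -EINVAL;
	s->loaded = 1;
	return 0;
}

static int demo_attach(struct demo_skel *s)
{
	if (!s->loaded)
		return -EINVAL;
	s->attached = 1;
	return 0;
}

/* Run one step; on failure, log it and return from the enclosing function. */
#define DEMO_STEP(name, call)						\
	do {								\
		int err_ = (call);					\
		if (err_) {						\
			fprintf(stderr, "%s failed: %d\n", name, err_);	\
			return err_;					\
		}							\
	} while (0)

/* Open, load and attach in order, stopping at the first failure. */
static int demo_init(struct demo_skel *s, int fail_open)
{
	DEMO_STEP("open", demo_open(s, fail_open));
	DEMO_STEP("load", demo_load(s));
	DEMO_STEP("attach", demo_attach(s));
	return 0;
}
```

With a macro of this shape, a failed open stops `demo_init()` immediately, so load and attach never run against a skeleton that was not set up.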
Just pulled the repo and rebuilt bpftune - no change. Same segmentation fault. Last lines from -ds:
bpftune: libbpf: prog 'cong_retransmit': found map 0 (ring_buffer_map, sec 9, off 0) for insn #190
bpftune: libbpf: sec '.reliter/tcp': collecting relocation for section(5) 'iter/tcp'
bpftune: libbpf: sec '.reliter/tcp': relo #0: insn #37 against 'remote_host_map'
bpftune: libbpf: prog 'bpftune_cong_iter': found map 3 (remote_host_map, sec 9, off 80) for insn #37
bpftune: libbpf: sec '.reliter/tcp': relo #1: insn #52 against 'remote_host_map'
bpftune: libbpf: prog 'bpftune_cong_iter': found map 3 (remote_host_map, sec 9, off 80) for insn #52
bpftune: libbpf: sec '.reliter/tcp': relo #2: insn #57 against 'remote_host_map'
bpftune: libbpf: prog 'bpftune_cong_iter': found map 3 (remote_host_map, sec 9, off 80) for insn #57
bpftune: libbpf: sec '.reliter/tcp': relo #3: insn #135 against 'debug'
bpftune: libbpf: prog 'bpftune_cong_iter': found data map 5 (tcp_cong.bss, sec 8, off 0) for insn 135
bpftune: libbpf: sec '.reliter/tcp': relo #4: insn #139 against '.rodata'
bpftune: libbpf: prog 'bpftune_cong_iter': found data map 4 (tcp_cong.rodata, sec 11, off 0) for insn 139
bpftune: libbpf: failed to find skeleton map '.rodata.str1.1'
gdb:
Program terminated with signal SIGSEGV, Segmentation fault.
58 err = bpftuner_bpf_init(tcp_cong, tuner, NULL);
(gdb) bt
# gdb --args `which bpftune` -s;
(gdb) break tcp_cong_tuner.c:58
(gdb) run
and next step, fin, step, ... commands;
root@nginx-01:/usr/src/bpf/bpftune# gdb --args `which bpftune` -s;
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/bpftune...
(gdb) run
Starting program: /usr/sbin/bpftune -s
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
bpftune: bpftune works fully
bpftune: bpftune supports per-netns policy (via netns cookie)
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff784edec in init (tuner=0x555555575260) at tcp_cong_tuner.c:58
58 err = bpftuner_bpf_init(tcp_cong, tuner, NULL);
https://github.com/oracle-samples/bpftune/pull/27 may help here i think; if you get a chance, would you mind rebuilding/retesting. thanks!
Yeah, bpftune started ok, without any fault. Checking how it works.
Thanks!
great, thanks for taking the time to work through this! i'm hoping to still get to the bottom of why the str sections cause issues at your end too; that will result in the associated congestion tuner not loading.
Sorry, I missed checking syslog after start.
Jul 11 09:41:09 nginx-01 bpftune[12736]: bpftune works fully
Jul 11 09:41:09 nginx-01 bpftune[12736]: bpftune supports per-netns policy (via netns cookie)
Jul 11 09:41:09 nginx-01 bpftune[12736]: tcp_cong open bpf: No such process
Jul 11 09:41:09 nginx-01 bpftune[12736]: error initializing '/usr/lib64/bpftune//tcp_cong_tuner.so: No such process
Jul 11 09:41:09 nginx-01 bpftune[12736]: could not open /proc/sys/net/ipv6/neigh/default/gc_interval (netns fd 0) for reading: No such file or directory
Jul 11 09:41:09 nginx-01 bpftune[12736]: error reading tunable 'net.ipv6.neigh.default.gc_interval': No such file or directory
Jul 11 09:41:09 nginx-01 bpftune[12736]: error initializing '/usr/lib64/bpftune//neigh_table_tuner.so: No such file or directory
Jul 11 09:41:09 nginx-01 bpftune[12736]: could not open /proc/sys/net/ipv6/route/max_size (netns fd 0) for reading: No such file or directory
Jul 11 09:41:09 nginx-01 bpftune[12736]: error reading tunable 'net.ipv6.route.max_size': No such file or directory
Jul 11 09:41:09 nginx-01 bpftune[12736]: error initializing '/usr/lib64/bpftune//route_table_tuner.so: No such file or directory
But all the files are in place:
root@nginx-01:/usr/src/bpf/bpftune# ls -ld /usr/lib64/bpftune//tcp_cong_tuner.so /usr/lib64/bpftune//neigh_table_tuner.so /usr/lib64/bpftune//route_table_tuner.so
-rwxr-xr-x 1 root root 1626360 Jul 11 09:40 /usr/lib64/bpftune//neigh_table_tuner.so
-rwxr-xr-x 1 root root 1622040 Jul 11 09:40 /usr/lib64/bpftune//route_table_tuner.so
-rwxr-xr-x 1 root root 896456 Jul 11 09:40 /usr/lib64/bpftune//tcp_cong_tuner.so
the "no such file or directory" comes from an ENOENT error; in the case of the neigh_table_tuner, what's missing are the ipv6 tunables. in the case of the tcp congestion tuner, the tuner is not there due to the issues with the string section; it's just that we don't fall over now and segfault. if ipv6 is disabled that probably explains the neigh table tuner issues.
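For readers wondering why the log says "No such file or directory" even though the .so files exist: the text is just strerror() applied to an errno value from a failed open of the tunable, not of the library. "No such process" is likewise the strerror() text for ESRCH. A minimal sketch, with hypothetical helper names (`demo_try_open`, `demo_report`) that are not bpftune functions:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Try to open a path for reading; return 0 or a negative errno value. */
static int demo_try_open(const char *path)
{
	FILE *fp = fopen(path, "r");

	if (!fp)
		return -errno;	/* e.g. -ENOENT when the path does not exist */
	fclose(fp);
	return 0;
}

/* Print an error the way the log lines above read: "<what>: <message>". */
static void demo_report(const char *what, int err)
{
	fprintf(stderr, "%s: %s\n", what, strerror(-err));
}
```

For example, `demo_report("tcp_cong open bpf", -ESRCH)` would print a line ending in "No such process", and a missing /proc/sys/net/ipv6 path passed to `demo_try_open()` would come back as `-ENOENT`.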
So everything should be working properly? How can I check status or stats while bpftune is running as a daemon?
Can you fix those errors for configurations with ipv6 disabled, please?
And the line "tcp_cong open bpf: No such process" - is that ok too?
Thanks!
i'm working on adding support for handling ipv6 disabled by making some tunables optional; should have a fix for this in the next few days. the tcp_cong_tuner issue is that bpf won't load due to the .rodata.str1.1 section being a problem on your system. i haven't been able to reproduce that but will try and fix it once i can.
If you need any data from my system, just tell me how to collect it and I will.
Thanks.
great, thanks!
https://github.com/oracle-samples/bpftune/pull/30 should help for cases where ipv6 is disabled; it makes ipv6 tunables optional such that the tuner will not fail to load if optional tunables are not found. still need to solve the tcp_cong_tuner issue..
now after start, syslog shows:
Jul 12 08:44:29 nginx-11 bpftune[41197]: bpftune works fully
Jul 12 08:44:29 nginx-11 bpftune[41197]: bpftune supports per-netns policy (via netns cookie)
Jul 12 08:44:30 nginx-11 bpftune[41197]: tcp_cong open bpf: No such process
Jul 12 08:44:30 nginx-11 bpftune[41197]: error initializing '/usr/lib64/bpftune/tcp_cong_tuner.so: No such process
Jul 12 08:44:30 nginx-11 bpftune[41197]: could not open /proc/sys/net/ipv6/neigh/default/gc_interval (netns fd 0) for reading: No such file or directory
Jul 12 08:44:30 nginx-11 bpftune[41197]: could not open /proc/sys/net/ipv6/neigh/default/gc_stale_time (netns fd 0) for reading: No such file or directory
Jul 12 08:44:30 nginx-11 bpftune[41197]: could not open /proc/sys/net/ipv6/neigh/default/gc_thresh1 (netns fd 0) for reading: No such file or directory
Jul 12 08:44:30 nginx-11 bpftune[41197]: could not open /proc/sys/net/ipv6/neigh/default/gc_thresh2 (netns fd 0) for reading: No such file or directory
Jul 12 08:44:30 nginx-11 bpftune[41197]: could not open /proc/sys/net/ipv6/neigh/default/gc_thresh3 (netns fd 0) for reading: No such file or directory
Jul 12 08:44:30 nginx-11 bpftune[41197]: could not open /proc/sys/net/ipv6/route/max_size (netns fd 0) for reading: No such file or directory
the above is all expected if ipv6 isn't enabled; the aim was to ensure the tuner kept going when it failed to find optional tunables. so if all went well, the neigh_table_tuner.so should still have loaded to tune v4 neighbour tables. prior to #30 a single missing tunable would cause the tuner not to load
wow, seems to be working now. checking it under real load
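The optional-tunable behaviour described above can be sketched roughly as follows. This is an illustration only, not the code from #30; `struct demo_tunable` and `demo_init_tunables()` are hypothetical names, and the real tuner tracks far more state.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of one tunable backed by a /proc/sys path. */
struct demo_tunable {
	const char *path;
	bool optional;	/* missing optional tunables do not fail init */
	bool available;	/* set by demo_init_tunables() */
};

/* Probe each tunable; fail only if a required one cannot be opened. */
static int demo_init_tunables(struct demo_tunable *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		FILE *fp = fopen(t[i].path, "r");
		int err = fp ? 0 : errno;	/* save errno right away */

		if (fp) {
			fclose(fp);
			t[i].available = true;
			continue;
		}
		t[i].available = false;
		if (err == ENOENT && t[i].optional) {
			fprintf(stderr, "skipping optional %s\n", t[i].path);
			continue;
		}
		return -err;
	}
	return 0;
}
```

With this shape, a host with ipv6 disabled still initializes: the ipv6 entries are marked unavailable and the remaining tunables keep working.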
great! the latest commit should have gotten rid of the .rodata.str1.1 section (you can check with "objdump -h src/tcp_cong_tuner.bpf.o"; no .rodata.str1.1 or .rodata.cst16 sections should be present, at least that's what i see)
due to loss events for 10.164.3.28, specify 'bbr' congestion control algorithm
is that ok? do i need to do anything?
Scenario 'specify bbr congestion control' occurred for tunable 'TCP congestion control' in global ns. Because loss rate has exceeded 1 percent for a connection, use bbr congestion control algorithm instead of default
and that?
that's a sign it's working; the congestion tuner looks at tcp connections that experience loss and switches congestion control algorithm to one that performs better under loss conditions - bbr. see "man bpftune-tcp-cong" for details.
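As a rough illustration of the "loss rate has exceeded 1 percent" condition in the log message above, here is a small C sketch. How bpftune actually computes its threshold is not shown in this thread, so `demo_loss_exceeds_1pct()` is a hypothetical helper; only the parameter names `segs_out` and `total_retrans` come from the diff earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when retransmitted segments are more than 1% of segments
 * sent. Integer math only, widened to 64 bits to avoid overflow.
 */
static bool demo_loss_exceeds_1pct(uint32_t segs_out, uint32_t total_retrans)
{
	if (segs_out == 0)
		return false;	/* nothing sent yet, no rate to compute */
	return (uint64_t)total_retrans * 100 > segs_out;
}
```

For instance, 10 retransmits out of 1000 segments is exactly 1% and does not trip the check, while 11 out of 1000 does.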
Could you implement a method that iterates over all available algorithms? For example:
CONG_LIST="bbr, veno, reno, vegas, westwood, htcp, . . .";
foreach(algo in CONG_LIST)
do
set_as_main(algo);
if (connection_quality < previous)
continue;
....
As I know, westwood works better on a Wi-Fi network.
yeah, i'm looking at seeing if we can incorporate reinforcement learning techniques in tuning in the future; exploring different policies rather than having a rigid approach would definitely be part of that.
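The iteration idea suggested above could be sketched in C along these lines. This is not something bpftune does; `demo_pick_best()` and `demo_fake_quality()` are hypothetical, and the scores are made-up placeholders rather than measurements. On Linux, a chosen algorithm would typically be applied to a socket with setsockopt() and TCP_CONGESTION, which is left out here.

```c
#include <stddef.h>
#include <string.h>

/* Callback that scores an algorithm; higher means better quality. */
typedef double (*demo_quality_fn)(const char *algo);

/* Placeholder scorer with invented numbers, for illustration only. */
static double demo_fake_quality(const char *algo)
{
	if (strcmp(algo, "westwood") == 0)
		return 0.9;
	if (strcmp(algo, "bbr") == 0)
		return 0.7;
	return 0.5;
}

/* Try every candidate and return the one with the highest score. */
static const char *demo_pick_best(const char *const *algos, size_t n,
				  demo_quality_fn measure)
{
	const char *best = NULL;
	double best_q = 0.0;

	for (size_t i = 0; i < n; i++) {
		double q = measure(algos[i]);

		if (!best || q > best_q) {
			best = algos[i];
			best_q = q;
		}
	}
	return best;	/* NULL when the list is empty */
}
```

A real version would need an actual quality measurement per connection, which is the hard part and the reason a learning-based approach was mentioned.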
Segmentation fault resolved so closing this out
Hello,
Got Segmentation fault (core dumped) when trying to run bpftune on Linux 5.19.0-1026-gcp kernel with:
[949460.456403] bpftune[82605]: segfault at 0 ip 00007f146066fde2 sp 00007fff34453140 error 4 in tcp_cong_tuner.so[7f146066f000+2000] [949460.456415] Code: 45 a8 0f b6 40 40 83 f0 01 84 c0 0f 84 97 00 00 00 e8 c3 f7 ff ff 48 89 45 c8 48 8b 45 a8 48 8b 55 c8 48 89 50 48 48 8b 45 c8 <48> 8b 10 48 8b 45 a8 48 89 50 38 e8 8e f4 ff ff 48 8b 55 c8 48 8b
in dmesg.
How can I fix that?
Thanks.