oracle / bpftune

bpftune uses BPF to auto-tune Linux systems

many_netns_test.sh fails test #107

Open rknobbe opened 1 day ago

rknobbe commented 1 day ago

Running the test suite catches an error in many_netns_test.sh|wmem:

TEST_ID=$PPID  bash many_netns_test.sh
many_netns_test.sh|wmem test to 192.168.168.1:5201 ipv4 opts  |START
net.ipv4.tcp_wmem = 4096 16384 16384
net.ipv4.tcp_wmem = 4096 16384 16384
Running baseline...
Running test...
net.ipv4.tcp_wmem = 4096 16384 4194304
wmem before 16384 ; after 40389085
netns wmem before 16384 ; after 16384
net.ipv6.conf.all.disable_ipv6 = 0
Output of last command:

Connecting to host 192.168.168.1, port 5201
[  5] local 192.168.168.2 port 57398 connected to 192.168.168.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.68 GBytes  22997 Mbits/sec    0   6.96 MBytes       
[  5]   1.00-2.00   sec  4.13 GBytes  35473 Mbits/sec    0   6.96 MBytes       
[  5]   2.00-3.00   sec  3.60 GBytes  30927 Mbits/sec    0   6.96 MBytes       
[  5]   3.00-4.00   sec  3.84 GBytes  32996 Mbits/sec    0   6.96 MBytes       
[  5]   4.00-5.00   sec  3.71 GBytes  31908 Mbits/sec    0   6.96 MBytes
[  5]   5.00-6.00   sec  3.79 GBytes  32583 Mbits/sec    0   6.96 MBytes       
[  5]   6.00-7.00   sec  3.73 GBytes  32061 Mbits/sec    0   6.96 MBytes       
[  5]   7.00-8.00   sec  3.83 GBytes  32923 Mbits/sec    0   6.96 MBytes       
[  5]   8.00-9.00   sec  3.72 GBytes  31948 Mbits/sec    0   6.96 MBytes       
[  5]   9.00-10.00  sec  3.81 GBytes  32676 Mbits/sec    0   6.96 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  37.0 GBytes  31780 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  37.0 GBytes  31780 Mbits/sec                  receiver

iperf Done.
many_netns_test.sh|wmem test to 192.168.168.1:5201 ipv4 opts  |FAIL; error 1|
make[1]: *** [Makefile:85: many_netns_test] Error 1
make[1]: Leaving directory '/home/roger/bpftune/test'
make: *** [Makefile:49: test] Error 2
roger@m910q:~/bpftune$ uname -a
Linux m910q 6.8.0-49-generic #49-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov  4 02:06:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
roger@m910q:~/bpftune$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.1 LTS
Release:	24.04
Codename:	noble

roger@m910q:~/bpftune$ git show
commit efe78a0cf85d1b4804ff1c14bcaa89684b7181e9 (HEAD -> main, origin/main, origin/HEAD)
Merge: 870d466 aaec117
Author: Alan Maguire <32452915+alan-maguire@users.noreply.github.com>
Date:   Thu Nov 21 12:47:00 2024 +0000

    Merge pull request #106 from oracle/fdleak

    Fdleak

alan-maguire commented 10 hours ago

thanks for the report! looks like the namespace wmem was not updated as expected; i'll investigate further. as a matter of interest
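
For anyone comparing the two namespaces by hand, here is a minimal sketch of the check being discussed: pull the max (third) field out of a `net.ipv4.tcp_wmem = min default max` line and test whether it grew. The helper names `sysctl_max` and `max_increased` are made up for illustration and are not part of the bpftune test scripts, whose actual pass/fail logic may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from bpftune): compare the max field of two
# sysctl output lines of the form "net.ipv4.tcp_wmem = min default max".

# Print the third (max) value from one sysctl line.
sysctl_max() {
    awk -F'=' '{ split($2, f, " "); print f[3] }' <<< "$1"
}

# Succeed only if the max value in the second line is larger than in the first.
max_increased() {
    local before after
    before=$(sysctl_max "$1")
    after=$(sysctl_max "$2")
    (( after > before ))
}

# Values copied from the log above: the global setting grew ...
if max_increased 'net.ipv4.tcp_wmem = 4096 16384 16384' \
                 'net.ipv4.tcp_wmem = 4096 16384 4194304'; then
    echo "global: max increased"
fi

# ... while an unchanged line (as reported for the netns) did not.
if ! max_increased 'net.ipv4.tcp_wmem = 4096 16384 16384' \
                   'net.ipv4.tcp_wmem = 4096 16384 16384'; then
    echo "netns: max unchanged"
fi

# On a live system the input lines could come from, for example:
#   sysctl net.ipv4.tcp_wmem
#   ip netns exec <name> sysctl net.ipv4.tcp_wmem   # <name> is a placeholder
```

Feeding it the output of `sysctl` inside and outside the namespace before and after a run should show directly whether only the global value is being tuned.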

rknobbe commented 9 hours ago

Yes, happens every time I run it. Same with wmem_test.sh and rmem_test.sh

roger@m910q:~/bpftune/test$ sudo bash ./wmem_test.sh
[sudo] password for roger:
./wmem_test.sh|wmem test to 192.168.168.1:5201 ipv4 opts  |START
net.ipv4.tcp_wmem = 4096 16384 16384
net.ipv4.tcp_wmem = 4096 16384 16384
Running baseline...
Running test...
net.ipv4.tcp_wmem = 4096 16384 4194304
wmem before 16384 ; after 50486356
netns wmem before 16384 ; after 16384
net.ipv6.conf.all.disable_ipv6 = 0
Output of last command:

Connecting to host 192.168.168.1, port 5201
[  5] local 192.168.168.2 port 46122 connected to 192.168.168.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  3.67 GBytes  31492 Mbits/sec    0   6.18 MBytes
[  5]   1.00-2.00   sec  4.31 GBytes  37050 Mbits/sec    0   6.81 MBytes
[  5]   2.00-3.00   sec  4.17 GBytes  35844 Mbits/sec    0   6.81 MBytes
[  5]   3.00-4.00   sec  4.38 GBytes  37586 Mbits/sec    0   6.81 MBytes
[  5]   4.00-5.00   sec  3.97 GBytes  34096 Mbits/sec    0   6.81 MBytes
[  5]   5.00-6.00   sec  3.68 GBytes  31606 Mbits/sec    0   6.81 MBytes
[  5]   6.00-7.00   sec  3.92 GBytes  33666 Mbits/sec    0   6.81 MBytes
[  5]   7.00-8.00   sec  3.77 GBytes  32397 Mbits/sec    0   6.81 MBytes
[  5]   8.00-9.00   sec  3.75 GBytes  32167 Mbits/sec    0   6.81 MBytes
[  5]   9.00-10.00  sec  3.82 GBytes  32795 Mbits/sec    0   6.81 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  39.5 GBytes  33920 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  39.5 GBytes  33920 Mbits/sec                  receiver

iperf Done.
./wmem_test.sh|wmem test to 192.168.168.1:5201 ipv4 opts  |FAIL; error 1|
roger@m910q:~/bpftune/test$
roger@m910q:~/bpftune/test$ sudo bash ./rmem_test.sh
./rmem_test.sh|rmem test to 192.168.168.1:5201 ipv4 opts  |START
net.ipv4.tcp_rmem = 4096 131072 131072
net.ipv4.tcp_rmem = 4096 131072 131072
Running baseline...
Running test...
net.ipv4.tcp_rmem = 4096 131072 7864320
rmem before 131072 ; after 256000
netns rmem before 131072 ; after 131072
net.ipv6.conf.all.disable_ipv6 = 0
Output of last command:

Connecting to host 192.168.168.1, port 5201
[  5] local 192.168.168.2 port 39162 connected to 192.168.168.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.53 GBytes  21749 Mbits/sec    0    175 KBytes
[  5]   1.00-2.00   sec  2.33 GBytes  19996 Mbits/sec    0    175 KBytes
[  5]   2.00-3.00   sec  2.48 GBytes  21284 Mbits/sec    0    175 KBytes
[  5]   3.00-4.00   sec  2.53 GBytes  21716 Mbits/sec    0    175 KBytes
[  5]   4.00-5.00   sec  2.57 GBytes  22046 Mbits/sec    0    175 KBytes
[  5]   5.00-6.00   sec  2.60 GBytes  22356 Mbits/sec    0    175 KBytes
[  5]   6.00-7.00   sec  2.60 GBytes  22294 Mbits/sec    0    175 KBytes
[  5]   7.00-8.00   sec  2.58 GBytes  22149 Mbits/sec    0    175 KBytes
[  5]   8.00-9.00   sec  2.61 GBytes  22432 Mbits/sec    0    175 KBytes
[  5]   9.00-10.00  sec  2.50 GBytes  21451 Mbits/sec    0    175 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  25.3 GBytes  21761 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  25.3 GBytes  21760 Mbits/sec                  receiver

iperf Done.
./rmem_test.sh|rmem test to 192.168.168.1:5201 ipv4 opts  |FAIL; error 1|
roger@m910q:~/bpftune/test$

alan-maguire commented 9 hours ago

what does "bpftune -S" report for the failing system?

rknobbe commented 9 hours ago

bpftune: bpftune works fully
bpftune: bpftune supports per-netns policy (via netns cookie)

rknobbe commented 4 hours ago

if it's helpful I'm also seeing these in dmesg while I'm running the tests.

[Fri Nov 22 11:37:52 2024] bpftune[850138]: segfault at 76996da39002 ip 000076996be06c32 sp 00007ffc4864ec30 error 4 in tcp_buffer_tuner.so[76996be02000+6000] likely on CPU 3 (core 3, socket 0)
[Fri Nov 22 11:37:52 2024] Code: 74 b7 ff ff e9 4c 04 00 00 48 8b 85 28 ff ff ff 8b 40 40 83 f8 03 75 20 48 8b 85 28 ff ff ff 48 8b 40 48 48 8b 80 a8 00 00 00 <0f> b6 40 02 0f b6 c0 85 c0 0f 95 c0 eb 4d 48 8b 85 28 ff ff ff 8b
[Fri Nov 22 11:38:27 2024] bpftune[850675]: segfault at 7724d7416002 ip 00007724d5806c61 sp 00007ffcd952ea00 error 4 in tcp_buffer_tuner.so[7724d5802000+6000] likely on CPU 2 (core 2, socket 0)
[Fri Nov 22 11:38:27 2024] Code: b6 c0 85 c0 0f 95 c0 eb 4d 48 8b 85 28 ff ff ff 8b 40 40 83 f8 02 75 20 48 8b 85 28 ff ff ff 48 8b 40 48 48 8b 80 a0 00 00 00 <0f> b6 40 02 0f b6 c0 85 c0 0f 95 c0 eb 1e 48 8b 85 28 ff ff ff 48
[Fri Nov 22 11:39:03 2024] bpftune[851139]: segfault at 7b3f03b20002 ip 00007b3f02006c32 sp 00007ffce1411fe0 error 4 in tcp_buffer_tuner.so[7b3f02002000+6000] likely on CPU 1 (core 1, socket 0)
[Fri Nov 22 11:39:03 2024] Code: 74 b7 ff ff e9 4c 04 00 00 48 8b 85 28 ff ff ff 8b 40 40 83 f8 03 75 20 48 8b 85 28 ff ff ff 48 8b 40 48 48 8b 80 a8 00 00 00 <0f> b6 40 02 0f b6 c0 85 c0 0f 95 c0 eb 4d 48 8b 85 28 ff ff ff 8b
[Fri Nov 22 11:39:37 2024] bpftune[851631]: segfault at 76b376846002 ip 000076b374c06c61 sp 00007fffa5023290 error 4 in tcp_buffer_tuner.so[76b374c02000+6000] likely on CPU 2 (core 2, socket 0)
[Fri Nov 22 11:39:37 2024] Code: b6 c0 85 c0 0f 95 c0 eb 4d 48 8b 85 28 ff ff ff 8b 40 40 83 f8 02 75 20 48 8b 85 28 ff ff ff 48 8b 40 48 48 8b 80 a0 00 00 00 <0f> b6 40 02 0f b6 c0 85 c0 0f 95 c0 eb 1e 48 8b 85 28 ff ff ff 48
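
In case it helps narrow these down, below is a rough sketch for turning the dmesg lines into something that can be looked up in the shared object. It subtracts the mapping start (the first number in `tcp_buffer_tuner.so[START+SIZE]`) from the `ip` value, and decodes the low three bits of the x86 page-fault error code. The function names `mapping_offset` and `decode_pf_error` are invented for this example. Note the assumption: the result is an offset into that one mapping, not necessarily into the file, so the mapping's own file offset (for instance from `/proc/<pid>/maps` or `readelf -lW`) would still have to be added before handing an address to a tool such as `addr2line`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading "segfault at ... ip ... error N in lib.so[START+SIZE]".

# Offset of the faulting instruction pointer within the reported mapping.
# Both arguments are hex strings without a "0x" prefix, as printed by the kernel.
mapping_offset() {
    local ip=$1 start=$2
    printf '0x%x\n' $(( 16#$ip - 16#$start ))
}

# Decode bits 0-2 of an x86 page-fault error code:
#   bit 0: 0 = page not present, 1 = protection violation
#   bit 1: 0 = read,             1 = write
#   bit 2: 0 = kernel mode,      1 = user mode
decode_pf_error() {
    local code=$1 out
    if (( code & 1 )); then out="protection violation"; else out="page not present"; fi
    if (( code & 2 )); then out+=", write"; else out+=", read"; fi
    if (( code & 4 )); then out+=", user mode"; else out+=", kernel mode"; fi
    echo "$out"
}

# Values copied from the first two segfault lines above.
mapping_offset 000076996be06c32 76996be02000
mapping_offset 00007724d5806c61 7724d5802000
decode_pf_error 4
```

The four reports alternate between two instruction offsets within the mapping, and each is a user-mode read of a page that is not mapped. The marked bytes `<0f> b6 40 02` appear to be a one-byte load from offset 2 of a pointer, which would fit with every fault address ending in `002`.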