magnific0 / wondershaper

Command-line utility for limiting an adapter's bandwidth
GNU General Public License v2.0

Wondershaper (tc filter add dev) segfaults NetworkManager on Arch Linux #16

Closed mistyharsh closed 6 years ago

mistyharsh commented 6 years ago

Until yesterday, wondershaper was running extremely smoothly. However, after today's upgrade to Linux 4.9.68-1-MANJARO #1 SMP PREEMPT Sun Dec 10 20:17:45 UTC 2017 x86_64 GNU/Linux, wondershaper no longer seems to work.

The command sudo wondershaper -a wlp2s0 -u 1000 causes my wireless network to disconnect automatically, and I have no option but to restart my system. Restarting the network daemon or logging out doesn't help.

Anything I can do to help?

magnific0 commented 6 years ago

Hi @mistyharsh, I can't replicate your issue here on Linux 4.14.3-1-ARCH #1 SMP PREEMPT Thu Nov 30 18:33:13 UTC 2017 x86_64 GNU/Linux. I am also on wireless.

Can you check your kernel messages right after this happens for you?

dmesg | tail -20

And also report your wireless card make and model?

lspci | grep -i wireless

mistyharsh commented 6 years ago

Thank you @magnific0, and sorry for the late reply. Here are the details you requested:

My wireless card:

02:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)

Log of kernel messages

These are the most recent kernel log messages:

[Dec26 22:41] SGI XFS with ACLs, security attributes, realtime, no debug enabled
[  +0.007131] JFS: nTxBlock = 8192, nTxLock = 65536
[  +0.009441] ntfs: driver 2.1.32 [Flags: R/W MODULE].
[  +0.071284] raid6: sse2x1   gen()  9725 MB/s
[  +0.056618] raid6: sse2x1   xor()  7154 MB/s
[  +0.056661] raid6: sse2x2   gen() 12277 MB/s
[  +0.056664] raid6: sse2x2   xor()  8303 MB/s
[  +0.056666] raid6: sse2x4   gen() 14169 MB/s
[  +0.056668] raid6: sse2x4   xor()  8871 MB/s
[  +0.056665] raid6: avx2x1   gen() 20007 MB/s
[  +0.056669] raid6: avx2x2   gen() 24056 MB/s
[  +0.056686] raid6: avx2x4   gen() 27495 MB/s
[  +0.000002] raid6: using algorithm avx2x4 gen() 27495 MB/s
[  +0.000001] raid6: using avx2x2 recovery algorithm
[  +0.001189] xor: automatically using best checksumming function   avx       
[  +0.013786] Btrfs loaded, crc32c=crc32c-intel
[Dec26 22:44] u32 classifier
[  +0.000001]     input device check on
[  +0.000001]     Actions configured
[  +0.000185] NetworkManager[449]: segfault at 0 ip 00007ff5ced15461 sp 00007ffe62104578 error 4 in libc-2.26.so[7ff5cebbe000+1ae000]
[  +1.476471] wlp2s0: deauthenticating from 10:c3:7b:e0:ff:2c by local choice (Reason: 3=DEAUTH_LEAVING)
[  +1.079620] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[  +0.035909] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready

magnific0 commented 6 years ago

Thanks for the outputs. It seems that NetworkManager segfaults. I don't know exactly what causes this, but this seems to be an issue with your system. I use NM too.
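
For reference, the segfault line in the log above can be read as follows: `segfault at 0` is the faulting address (0, so a NULL pointer access), `error 4` indicates a user-mode read of an unmapped page, and the bracketed pair after `libc-2.26.so` is the base address and size of the libc mapping. Subtracting the base from the instruction pointer gives the offset of the crash inside libc. A minimal sketch of that arithmetic, where the helper name `segv_offset` is made up for illustration:

```shell
# Offset of the faulting instruction inside the mapped library:
# instruction pointer minus the mapping's base address.
# Both arguments are hex strings without a leading "0x".
segv_offset() {
    printf '0x%x\n' $(( 0x$1 - 0x$2 ))
}
```

With the values from the log, `segv_offset 00007ff5ced15461 7ff5cebbe000` prints `0x157461`. Given a libc build with debug symbols, an offset like this could be passed to a tool such as `addr2line` to narrow down the function involved.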

You could try to debug NM and see what happens. The process is described here.

You could try to run each of the lines in wondershaper one by one and see which of the lines cause this.
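
One possible way to do the line-by-line test is sketched below. It assumes the individual tc commands have been copied from the wondershaper script into a file, one complete command per line (the file name `cmds.txt` and the function name `run_stepwise` are hypothetical). Each command runs on its own and its exit status is printed, so the step that fails or coincides with the crash stands out:

```shell
# Run each non-empty, non-comment line of a command file on its own and
# report its exit status. Step numbers are line numbers in the file.
run_stepwise() {
    n=0
    while IFS= read -r cmd; do
        n=$((n + 1))
        case "$cmd" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        # stdin is redirected so a command cannot consume the remaining lines
        if sh -c "$cmd" </dev/null; then
            echo "step $n ok: $cmd"
        else
            echo "step $n FAILED (exit $?): $cmd"
        fi
    done < "$1"
}
```

The tc commands need root, so `run_stepwise cmds.txt` would have to be run from a root shell. Watching `dmesg` in a second terminal while it runs shows which step coincides with the segfault.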

Does (re)starting NM work instead of rebooting the system?

systemctl restart NetworkManager.service

mistyharsh commented 6 years ago

Thanks, @magnific0. I will explore and get back if I can gather more information.

sigmaSd commented 6 years ago

Hello, I have the same problem on Arch. I ran your code line by line; this error occurred many times until NM crashed:

tc class add dev wlp3s0 parent 1: classid 1:1 htb rate 500kbit prio 5
RTNETLINK answers: No such file or directory

This is the line it crashes on:

tc filter add dev wlp3s0 parent ffff: protocol ip u32 match u32 0 0 action mirred egress redirect dev ifb0

I attached a file with all the errors that occurred: wonder_diag.txt

sigmaSd commented 6 years ago

I tried it on a fresh Arch install on VirtualBox with GNOME and NetworkManager. The same crash happened.

sigmaSd commented 6 years ago

I debugged with gdb. It crashed with: Thread 1 "NetworkManager" received signal SIGSEGV, Segmentation fault. 0x00007ffff5277496 in __strlen_sse2 () from /usr/lib/libc.so.6

Here's the stack trace of the failure:

#0  0x00007ffff5277496 in __strlen_sse2 () at /usr/lib/libc.so.6
#1  0x000055555563480a in  ()
#2  0x0000555555642126 in  ()
#3  0x00005555555bc577 in  ()
#4  0x00007ffff675bcf4 in g_hash_table_lookup () at /usr/lib/libglib-2.0.so.0
#5  0x00005555555bcf4a in  ()
#6  0x00005555555bd0d4 in  ()
#7  0x00005555555bd497 in  ()
#8  0x00005555556423f2 in  ()
#9  0x00005555556447c9 in  ()
#10 0x0000555555622c03 in  ()
#11 0x000055555562419d in  ()
#12 0x0000555555625131 in  ()
#13 0x0000555555625381 in  ()
#14 0x00007ffff677c0be in g_main_context_dispatch ()
    at /usr/lib/libglib-2.0.so.0
#15 0x00007ffff677df69 in  () at /usr/lib/libglib-2.0.so.0
#16 0x00007ffff677ef42 in g_main_loop_run () at /usr/lib/libglib-2.0.so.0
#17 0x000055555558036e in  ()
#18 0x00007ffff5201f4a in __libc_start_main () at /usr/lib/libc.so.6
#19 0x00005555555809da in  ()

magnific0 commented 6 years ago

@sigmaSd thanks for checking in. At least we now know it occurs for more people. Also, many thanks for doing the line-by-line test. This is very helpful. The RTNETLINK errors that occur beforehand seem to suggest that some of the tc functionality is not available (missing kernel modules?). Can you report the output of:

sudo modprobe sch_htb
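
A broader check along the same lines is sketched below. It asks `modinfo` whether each named module can be found, without loading anything. The function name `check_tc_modules` and the `MODINFO` override are assumptions made for illustration, and the module list in the usage note is a guess at what the shaping commands in this thread rely on, not a list taken from wondershaper itself:

```shell
# Report which of the given kernel modules modinfo can find.
# MODINFO may be set to another command (e.g. for testing);
# it defaults to modinfo.
check_tc_modules() {
    for m in "$@"; do
        if ${MODINFO:-modinfo} "$m" >/dev/null 2>&1; then
            echo "$m: available"
        else
            echo "$m: missing"
        fi
    done
}
```

For example: `check_tc_modules sch_htb sch_ingress cls_u32 act_mirred ifb`. A feature compiled directly into the kernel may be reported as missing by this check, depending on the kmod version, so a "missing" line is a hint rather than proof.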

It is likely related to the virtualization. What host OS are you using? @mistyharsh are you running Arch on a VM/VPS too?

The segmentation fault happens once incoming traffic is redirected to the ifb0 device. I know too little about the linux kernel and virtual kernel drivers and permissions to help you further. If this is a bug, it should perhaps be taken up with them.
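
For context, this is the usual pattern for shaping incoming traffic with an IFB device, sketched as a dry-run helper that only prints the commands. The last line matches the command reported above as triggering the crash; the preceding lines are a typical setup and are assumed here, not copied from the wondershaper script. The function name `print_ingress_redirect` is hypothetical:

```shell
# Print (do not run) a typical command sequence for redirecting incoming
# traffic on an interface to an IFB device, where it can then be shaped.
# $1 = real interface, $2 = IFB device (defaults to ifb0).
print_ingress_redirect() {
    iface=$1
    ifb=${2:-ifb0}
    printf '%s\n' \
        "modprobe ifb numifbs=1" \
        "ip link set dev $ifb up" \
        "tc qdisc add dev $iface handle ffff: ingress" \
        "tc filter add dev $iface parent ffff: protocol ip u32 match u32 0 0 action mirred egress redirect dev $ifb"
}
```

Printing the commands first makes it easy to run them by hand one at a time and see exactly which one coincides with the segfault.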

mistyharsh commented 6 years ago

@magnific0, I am currently using Arch Linux (Manjaro Deepin 64bit) on my laptop. It is not a VM or VPS. Yes, it seems to be a bug in the Linux kernel.

sigmaSd commented 6 years ago

@magnific0, sudo modprobe sch_htb gives no error. Host and guest are both on Arch. Right now I can't help much, but I'll take the chance to say thanks for the nice work; it's a really important program for the Linux community.

magnific0 commented 6 years ago

@sigmaSd thanks, much appreciated!

@mistyharsh so it's not (just) VPS. The response of @sigmaSd suggests that the kernel modules are present. As of my latest update I am luckily (?) experiencing this problem too.

The outgoing limiting works if no filters are set. So for instance by commenting out the following lines: https://github.com/magnific0/wondershaper/blob/master/wondershaper#L163-L208
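
As an illustration of upload limiting without any `tc filter` commands, the sketch below prints a minimal HTB setup in which all traffic falls into a default class, so no filter is needed. This is a simplified stand-in and not the set of commands wondershaper runs; the function name `print_upload_limit` and the class id `1:10` are choices made for the example:

```shell
# Print (do not run) a filter-free upload limit: an HTB root qdisc whose
# default class caps all outgoing traffic at the given rate.
# $1 = interface, $2 = rate in kbit/s.
print_upload_limit() {
    iface=$1
    rate_kbit=$2
    printf '%s\n' \
        "tc qdisc add dev $iface root handle 1: htb default 10" \
        "tc class add dev $iface parent 1: classid 1:10 htb rate ${rate_kbit}kbit ceil ${rate_kbit}kbit"
}
```

For example, `print_upload_limit wlp2s0 1000` prints the two commands for a 1000 kbit/s cap. The limit can be removed again with `tc qdisc del dev wlp2s0 root`.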

Incoming shaping is not possible at the moment.

Now that I can replicate the issue I will look more into what's causing this and maybe implementing a (temporary) work around.

magnific0 commented 6 years ago

Update: Wondershaper works without NetworkManager.

So the bug is likely there. I was able to replicate the bug for both wired and wireless connections. NetworkManager segfaults for any

tc filter add dev ...

command that I tried if the interface is managed by NM.
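
To check whether an interface falls into that category, the terse output of `nmcli -t -f DEVICE,STATE device` can be parsed. It prints one `device:state` pair per line, and interfaces that NM leaves alone show the state `unmanaged`. A minimal sketch, with hypothetical helper names, that reads that output on stdin:

```shell
# Read `nmcli -t -f DEVICE,STATE device` output on stdin and print the
# state reported for the interface named in $1 (nothing if not listed).
nm_device_state() {
    awk -F: -v dev="$1" '$1 == dev { print $2 }'
}

# Succeed if the interface is listed and its state is not "unmanaged".
is_nm_managed() {
    state=$(nm_device_state "$1")
    [ -n "$state" ] && [ "$state" != "unmanaged" ]
}
```

Usage: `nmcli -t -f DEVICE,STATE device | is_nm_managed wlp2s0 && echo "managed by NM"`. An interface can be taken out of NM's control with `nmcli device set wlp2s0 managed no`, which also drops its connection. Whether that avoids the segfault was not tested here.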

I have added a bug report to the Arch Linux bugtracker. Perhaps we can also take it up with NM, if other distros are having the same problem.

https://bugs.archlinux.org/task/57049

If you're experiencing this issue too, please comment on the bug linked above to increase its visibility.

magnific0 commented 6 years ago

Great news: the bug has been fixed in extra/networkmanager 1.10.3dev+38+g78ef57197-1. Let me know if this also resolves the issue for you, @mistyharsh and @sigmaSd.

sigmaSd commented 6 years ago

I can confirm it's fixed with networkmanager 1.10.3dev+38+g78ef57197!