mwarning / zerotier-openwrt

An OpenWrt package for ZeroTier One - Pull requests are welcome!

segmentation fault #9

Closed hajoscher closed 8 years ago

hajoscher commented 8 years ago

Segfault on zerotier 1.1.14

CONFIG_TARGET_ar71xx=y
CONFIG_TARGET_ar71xx_generic_WZRHPAG300H=y

Here is some debugger info:

[Tue Aug  2 07:50:03 2016] node/Switch.cpp:637 sending NAT-t message to     xxxxxxxx (xxx.xxx.xxx.xxx/45166) 
[New LWP 8178]

Program received signal SIGSEGV, Segmentation fault. 
[Switching to LWP 8178]
splice (i=..., x=..., position=..., this=0x76d7ffb0)
at /openwrt/staging_dir/target-mips_34kc_uClibc-0.9.33.2/usr/include/uClibc++/list:608
608 /openwrt/staging_dir/target-mips_34kc_uClibc-0.9.33.2/usr/include/uClibc++/list: No such file or directory.
(gdb) bt 
#0  splice (i=..., x=..., position=..., this=0x76d7ffb0)
    at /openwrt/staging_dir/target-mips_34kc_uClibc-0.9.33.2/usr/include/uClibc++/list:608
#1  ZeroTier::DeferredPackets::process (this=0x4e63b8)
at node/DeferredPackets.cpp:89 
#2  0x0042250a in ZeroTier::Node::backgroundThreadMain (this=0x4e1f78)
at node/Node.cpp:647
#3  0x0044f8fa in threadMain (warning: GDB can't find the start of the function at 0x77fc678a.

    GDB is unable to find the start of the function at 0x77fc678a
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
This problem is most likely caused by an invalid program counter or
stack pointer.
However, if you think GDB should simply search farther back
from 0x77fc678a for code which looks like the beginning of a
function, you can increase the range of the search using the `set
heuristic-fence-post' command.
this=<optimized out>) at service/../node/Node.hpp:130
#4  ZeroTier::___zt_threadMain<ZeroTier::Node> (instance=<optimized out>)
at service/../osdep/Thread.hpp:113
#5  0x77fc678c in ?? ()
(gdb) up
#1  ZeroTier::DeferredPackets::process (this=0x4e63b8)
    at node/DeferredPackets.cpp:89
89  node/DeferredPackets.cpp: No such file or directory.
(gdb) print _q
$1 = {list_start = 0x4e3668, list_end = 0x4e3668, elements = 1, 
  a = {<No data fields>}}
(gdb) print pkt
$2 = {list_start = 0x4e7af8, list_end = 0x4e7af8, elements = 0, 
  a = {<No data fields>}}

mwarning commented 8 years ago

meh, I thought this commit would have fixed the crash: https://github.com/zerotier/ZeroTierOne/commit/830250759cd4c14ca2ae5ddf24f0a0427f258622

No idea so far. It has been working for me.

hajoscher commented 8 years ago

I tried to track it down. It seems to be a bug in uclibc++, in a special case of the std::list::splice function:

https://bitbucket.org/nuttx/uclibc/src/de9babae33085a109bd9e6a6f063d67a1c598149/include/uClibc++/list?at=master&fileviewer=file-view-default#list-708

https://git.busybox.net/uClibc++/tree/include/list#n608

previous is not defined if the list has only one element.
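
For illustration, the gdb dump above shows _q with one element and pkt with none, which matches this special case. Below is a minimal sketch of it, assuming the call in node/DeferredPackets.cpp uses the single-element splice overload on the front of the queue (the actual source line is not quoted in this thread). The names move_front and single_element_case_ok are hypothetical. On a conforming standard library the check passes; whether it crashes on a given uclibc++ build is exactly what it would test.

```cpp
#include <cassert>
#include <list>

// Hypothetical helper: moves the first element of `src` to the end of `dst`
// using the single-element overload of std::list::splice.
template <typename T>
void move_front(std::list<T>& src, std::list<T>& dst) {
    dst.splice(dst.end(), src, src.begin());
}

// Exercises the one-element case: `q` stands in for _q (elements = 1)
// and `pkt` for the empty temporary list (elements = 0).
inline bool single_element_case_ok() {
    std::list<int> q;
    std::list<int> pkt;
    q.push_back(42);

    move_front(q, pkt);

    return q.empty() && pkt.size() == 1 && pkt.front() == 42;
}
```

Building this against the target toolchain and calling single_element_case_ok() would give a small test case to attach to a bug report.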

As a workaround one could patch node/DeferredPackets.cpp in zerotier to use _q instead of pkt when calling tryDecode and skip the splice. Not sure why this dummy list is needed anyway. Or use push_back and pop instead of splice. Anyway, I'll submit an issue to the author of uclibc++ as well.
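
The push_back and pop variant could look roughly like the sketch below. This is not the patch from this thread. The helper name move_front_without_splice is hypothetical, and it assumes push_back and pop_front are unaffected by the bug in uclibc++.

```cpp
#include <cassert>
#include <list>

// Hypothetical helper: moves the first element of `src` to the end of `dst`
// by copying it and then removing the original, so std::list::splice is
// never called. Returns false if `src` was empty.
template <typename T>
bool move_front_without_splice(std::list<T>& src, std::list<T>& dst) {
    if (src.empty()) {
        return false;
    }
    dst.push_back(src.front());  // copy the element into the destination list
    src.pop_front();             // then drop the original from the source list
    return true;
}
```

The trade-off is one extra copy of the element per move, whereas splice only relinks the list node. Iterators and references to the moved element in the source list are invalidated.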

muebau commented 8 years ago

Hi, I have problems with segfaults too:

[ 375.300293]
[ 375.300293] do_page_fault(): sending SIGSEGV to zerotier-one for invalid write access to 00000004
[ 375.309381] epc = 0040b2b1 in zerotier-one[400000+9c000]
[ 375.314965] ra = 0040b273 in zerotier-one[400000+9c000]
[ 375.320425]
[ 531.892032]
[ 531.892032] do_page_fault(): sending SIGSEGV to zerotier-one for invalid write access to 00000004
[ 531.901110] epc = 0040b2b1 in zerotier-one[400000+9c000]
[ 531.906795] ra = 0040b273 in zerotier-one[400000+9c000]
[ 531.912671]

So it is most likely for the same reason.

Did you get an answer or a workaround from uclibc++?

hajoscher commented 8 years ago

I have not received any feedback yet. There is only an email address for the author, no issue tracker. But you can try my workaround here, which I am happy with:

https://github.com/hajoscher/zerotier-openwrt/tree/splice-workaround/zerotier

@mwarning would you kindly review this patch? If you agree, I could open a pull request.

adamierymenko commented 8 years ago

DeferredPackets is dying in the next version so don't bother. A quick fix would be to edit OneService.cpp and eliminate the two Thread::start() lines (one after the other) that start background threads. That way everything will just happen in the main thread.

adamierymenko commented 8 years ago

The defer queue was kind of ugly so it died. 1.2.0 will do this differently.

adamierymenko commented 8 years ago

But I'd do a bug report to uclibc++ for that std::list::splice() bug. That will show up elsewhere.

hajoscher commented 8 years ago

Thank you. Any workaround is temporary anyway, so I am happy with the current patch.

Sure, I did send an email about the bug, since there is no bug tracker. No response so far, however. I will submit a bug report to openwrt directly.

mwarning commented 8 years ago

@hajoscher nice find! @adamierymenko thanks for dropping by and suggesting a fix.

mwarning commented 8 years ago

@adamierymenko can you explain how removing both lines will fix the problem? I do not see the connection between the threads and the splice bug.

mwarning commented 8 years ago

I have uploaded new release binaries. Let me know if they work for you. Thanks. :-)

mwarning commented 8 years ago

I assume the problem has been fixed, so I am closing this issue.

ghost commented 6 years ago

This uclibc++ problem is also mentioned at https://bugs.openwrt.org/index.php?do=details&task_id=1859