hyperboria / bugs

Peer-to-peer IPv6 networking, secure and near-zero-conf.

RouterModule_getPeers("0000.0000.0000.0001") crashes cjdns #53

Closed: Kubuxu closed this issue 8 years ago

Kubuxu commented 9 years ago

If you run ./tools/traceroute [self_cjdns_ip], the cjdns server crashes with the following error in the logs:

Assertion failure [NodeStore.c:631] [(Node_getBestParent(node) && node != store->pub.selfNode)]

System: Debian; Kernel: 2.6/OpenVM; Version: newest (3311d304ddebda9d8eaa1d389905f12cd8990a62)

ansuz commented 9 years ago

The line in question reminds me of another assertion failure I "patched". Apparently, at some point pinging yourself caused a failure in some other system, and that assertion was rendered unnecessary by other logic that handled self pings.

You may be able to remove this one as well, but I can't say for sure: there's no documentation on how it should behave with self pings. A forward-thinking solution would be to document this and the other assertions, along with the reason each one exists.

I'll check back in on this later tonight and see what information I can find, but any digging anyone else wants to do will be greatly appreciated.
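A minimal sketch of what documenting assertions could look like, assuming a hypothetical ASSERT_WHY macro and format_assert_failure helper (neither is part of cjdns; they are illustrations only). It keeps the "Assertion failure [file:line] [expr]" shape seen in the log above and appends a reason string, so a failed check also says why it exists.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: same "Assertion failure [file:line] [expr]" shape
 * as the existing log line, plus a human-readable reason. */
int format_assert_failure(char* buf, size_t len,
                          const char* file, int line,
                          const char* expr, const char* reason)
{
    return snprintf(buf, len, "Assertion failure [%s:%d] [%s] reason: %s\n",
                    file, line, expr, reason);
}

/* Hypothetical macro: like a plain assert, but every call site must state
 * why the invariant exists, and that reason is logged on failure. */
#define ASSERT_WHY(expr, reason)                                         \
    do {                                                                 \
        if (!(expr)) {                                                   \
            char msg_[512];                                              \
            format_assert_failure(msg_, sizeof msg_, __FILE__, __LINE__, \
                                  #expr, (reason));                      \
            fputs(msg_, stderr);                                         \
            abort();                                                     \
        }                                                                \
    } while (0)
```

Requiring a reason at every call site is the design point: the documentation lives next to the check and shows up in the log exactly when someone needs it.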

ansuz commented 9 years ago

PS: I was trying to traceroute when I came across that other assertion, and I guess I never really did too much follow-up. Thanks for raising the issue once again. Traceroute should help us figure out a lot of other issues.

Kubuxu commented 9 years ago

Without the assert on line 631 I get: http://hastebin.com/makeloqoxo

Looks like it is not that easy this time.

ansuz commented 9 years ago

Interesting! I guess it's time to figure out why nodes can't ping themselves.

Kubuxu commented 9 years ago

One thing: this backtrace might not be related to anything. I am running these tests on a different machine, in --nobg mode. It could be the error reporting trying to open a log file or something similar.

EDIT: Using tools/cjdnslog stops this backtrace from appearing, but it still shows the failed syscall 2.
EDIT2: tools/ping and tools/traceroute both cause the crash.

Kubuxu commented 9 years ago

This backtrace is only from the client exiting:

#0  Assert_failure (format=<optimized out>) at util/Assert.c:40
#1  0x00005555555b2a74 in onCoreExit (exit_status=<optimized out>, 
    term_signal=<optimized out>) at client/cjdroute2.c:457
#2  0x00005555555c9132 in uv__chld (handle=<optimized out>, 
    signum=<optimized out>) at ../src/unix/process.c:112
#3  0x00005555555c9b99 in uv__signal_event (loop=0x5555557f25b0, 
    w=<optimized out>, events=<optimized out>) at ../src/unix/signal.c:386
#4  0x00005555555cfbd4 in uv__io_poll (loop=loop@entry=0x5555557f25b0, 
    timeout=-1) at ../src/unix/linux-core.c:271
#5  0x00005555555c4fd7 in uv_run (loop=0x5555557f25b0, 
    mode=mode@entry=UV_RUN_DEFAULT) at ../src/unix/core.c:284
#6  0x000055555555fc75 in EventBase_beginLoop (eventBase=0x5555557f2568)
    at util/events/libuv/EventBase.c:83
#7  0x0000555555559ed3 in main (argc=1434415640, argv=0x7fffffffe548)
    at client/cjdroute2.c:662

ghost commented 9 years ago

Yep, that's probably from the forbidden syscall.

Kubuxu commented 9 years ago

Narrowing it down, the culprit is RouterModule_getPeers("0000.0000.0000.0001").

ghost commented 9 years ago

Good one.

$ /opt/cjdns/tools/cexec 'RouterModule_getPeers("0000.0000.0000.0001")'

30 seconds later:

1435582847 DEBUG Pinger.c:73 Ping timeout for [2965572845] in [30400] ms
1435582847 DEBUG NodeStore.c:2326 Ping timeout for fc06:c135:28a5:8c0b:dd4e:bcb6:d4d6:c96d@0000.0000.0000.0001. changing reach from 4294967295 to 3758096383

Assertion failure [NodeStore.c:631] [(Node_getBestParent(node) && node != store->pub.selfNode)]
Attempted banned syscall number [2] see doc/Seccomp.md for more information

Core exited with status [0], signal [31]
Backtrace (10 frames):
  cjdroute(+0x695a) [0x7f86ce96395a]
  cjdroute(+0x5edc4) [0x7f86ce9bbdc4]
  cjdroute(+0x75df2) [0x7f86ce9d2df2]
  cjdroute(+0x76891) [0x7f86ce9d3891]
  cjdroute(+0x7cd94) [0x7f86ce9d9d94]
  cjdroute(+0x71b47) [0x7f86ce9ceb47]
  cjdroute(+0xbbc5) [0x7f86ce968bc5]
  cjdroute(+0x5dcd) [0x7f86ce962dcd]
  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f86cdf67a40]
  cjdroute(+0x6399) [0x7f86ce963399]
Aborted (core dumped)

I assume this is the change that introduced the open() syscall: https://github.com/cjdelisle/cjdns/pull/778 (not that it'd make any difference regarding the self-ping error)
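The numbers in the log above can be decoded by hand. On x86_64 Linux, syscall number 2 is open(), which fits the guess that the linked change introduced an open() call, and signal 31 is SIGSYS, the signal the kernel raises when a seccomp filter traps a syscall. The sketch below hardcodes only the first few entries of the x86_64 syscall table (other architectures number syscalls differently); the helper names are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* First entries of the x86_64 Linux syscall table. Valid for x86_64 only. */
static const char* const x86_64_syscalls[] = {
    "read",  /* 0 */
    "write", /* 1 */
    "open",  /* 2 */
    "close", /* 3 */
};

#define N_SYSCALLS (sizeof x86_64_syscalls / sizeof x86_64_syscalls[0])

/* Number -> name, or "unknown" outside the small table above. */
const char* x86_64_syscall_name(long nr)
{
    if (nr < 0 || (size_t)nr >= N_SYSCALLS) {
        return "unknown";
    }
    return x86_64_syscalls[nr];
}

/* Name -> number, or -1 if the name is not in the table. */
long x86_64_syscall_number(const char* name)
{
    for (size_t i = 0; i < N_SYSCALLS; i++) {
        if (strcmp(x86_64_syscalls[i], name) == 0) {
            return (long)i;
        }
    }
    return -1;
}

/* True if a child's terminating signal is the one seccomp traps raise. */
int is_seccomp_trap_signal(int sig)
{
    return sig == SIGSYS;
}
```

For a full mapping, the kernel's own syscall table for the target architecture is the authoritative source; the hardcoded subset here is only enough to read this particular log.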

Kubuxu commented 9 years ago

The backtrace we got is unimportant, since it is caused by the syscall sandboxing. This is the backtrace of the failed assert (which is supposed to be printed but is not):

#0  Assert_failure (format=format@entry=0x7f5f612000e0 "Assertion failure [%s:%d] [%s]\n") at util/Assert.c:29
#1  0x00007f5f611a6fa1 in handleBadNews (node=0x7f5f61f9edd8, newReach=<optimized out>, store=<optimized out>) at dht/dhtcore/NodeStore.c:631
#2  0x00007f5f611a719d in handleNews (node=0x7f5f61f9edd8, newReach=3758096383, store=0x7f5f61fc6598) at dht/dhtcore/NodeStore.c:653
#3  0x00007f5f611ae084 in NodeStore_pathTimeout (nodeStore=0x7f5f61fc6598, path=140047628124891) at dht/dhtcore/NodeStore.c:2328
#4  0x00007f5f611b0169 in onTimeout (pctx=<optimized out>, milliseconds=<optimized out>) at dht/dhtcore/RouterModule.c:429
#5  onResponseOrTimeout (data=0x7f5f612000e0, milliseconds=38100, vping=0x7f5f620018a8) at dht/dhtcore/RouterModule.c:472
#6  0x00007f5f611af7ef in callback (ping=<optimized out>, data=<optimized out>) at util/Pinger.c:55
#7  timeoutCallback (vping=0x7f5f61ffef68) at util/Pinger.c:74
#8  0x00007f5f611fc771 in uv__run_timers (loop=loop@entry=0x7f5f61f9c2b0) at ../src/unix/timer.c:146
#9  0x00007f5f611f2f72 in uv_run (loop=0x7f5f61f9c2b0, mode=mode@entry=UV_RUN_DEFAULT) at ../src/unix/core.c:275
#10 0x00007f5f6118dc75 in EventBase_beginLoop (eventBase=eventBase@entry=0x7f5f61f9c268) at util/events/libuv/EventBase.c:83
#11 0x00007f5f611da778 in Core_main (argc=<optimized out>, argv=<optimized out>) at admin/angel/Core.c:326
#12 0x00007f5f61187593 in main (argc=3, argv=0x7ffca9303f68) at client/cjdroute2.c:467
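The backtrace shows the path: a ping timeout in Pinger.c reaches NodeStore_pathTimeout, which calls handleNews with a lower reach, which calls handleBadNews, where the assert requires a best parent and a node other than selfNode. The self node fails the second condition unconditionally. Below is a heavily simplified, hypothetical model of that path (the struct fields and function names are stand-ins, not the real NodeStore API) with one possible mitigation: skipping timeouts for the self node. The reach decay rule is also an assumption; reach * 7 / 8 computed in 64 bits happens to reproduce the logged change from 4294967295 (0xFFFFFFFF) to 3758096383 (0xDFFFFFFF), but the real formula is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the NodeStore types. */
struct Node {
    uint32_t reach;
    struct Node* bestParent; /* NULL for the self node */
};

struct Store {
    struct Node* selfNode;
};

/* Assumed decay rule: one-eighth off, computed in 64 bits. */
uint32_t decayed_reach(uint32_t reach)
{
    return (uint32_t)((uint64_t)reach * 7 / 8);
}

/* The condition checked at NodeStore.c:631, as a plain predicate. */
bool bad_news_allowed(const struct Store* store, const struct Node* node)
{
    return node->bestParent != NULL && node != store->selfNode;
}

static void handle_bad_news(struct Store* store, struct Node* node,
                            uint32_t newReach)
{
    assert(bad_news_allowed(store, node)); /* always fires for the self node */
    node->reach = newReach;
}

/* Simplified timeout path (pathTimeout -> handleNews -> handleBadNews),
 * assuming the self node's reach should never be lowered.
 * Returns true if the node was processed, false if it was skipped. */
bool on_path_timeout(struct Store* store, struct Node* node)
{
    if (node == store->selfNode) {
        return false; /* self ping: skip instead of tripping the assert */
    }
    uint32_t newReach = decayed_reach(node->reach);
    if (newReach < node->reach) {
        handle_bad_news(store, node, newReach);
    }
    return true;
}
```

Guarding at the timeout entry point keeps the invariant in handle_bad_news intact, which matches the earlier observation that self pings were previously handled by separate logic rather than by weakening the assert.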

Arceliar commented 9 years ago

SessionManager.c drops packets from ourselves. This change disables that check:

diff --git a/net/SessionManager.c b/net/SessionManager.c
index 123ba5d..a5078b8 100644
--- a/net/SessionManager.c
+++ b/net/SessionManager.c
@@ -266,7 +266,7 @@ static Iface_DEFUN incomingFromSwitchIf(struct Message* msg, struct Iface* iface
             return NULL;
         }

-        if (!Bits_memcmp(herKey, sm->cryptoAuth->publicKey, 32)) {
+        if (false && !Bits_memcmp(herKey, sm->cryptoAuth->publicKey, 32)) {
             Log_debug(sm->log, "DROP Handshake from 'ourselves'");
             return NULL;
         }

getPeers seems to work for me now, although the traceroute script gets stuck in infinite loops when I try to traceroute other people.
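Why the diff works: memcmp returns 0 for equal buffers, so `!Bits_memcmp(herKey, ourKey, 32)` is true exactly when the handshake carries our own public key (assuming Bits_memcmp follows memcmp's convention). Prefixing the condition with `false &&` short-circuits the whole expression, so the drop never happens. A standalone sketch of both versions, with hypothetical function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32 /* matches the length compared in the diff */

/* memcmp returns 0 on equality, so !memcmp(...) means "same key". */
bool is_own_key(const uint8_t* herKey, const uint8_t* ourKey)
{
    return !memcmp(herKey, ourKey, KEY_LEN);
}

/* Original behaviour: drop any handshake that claims our own key. */
bool drop_handshake_original(const uint8_t* herKey, const uint8_t* ourKey)
{
    return is_own_key(herKey, ourKey);
}

/* Patched behaviour: `false && ...` short-circuits, so is_own_key() is
 * never evaluated and self-handshakes are passed through. */
bool drop_handshake_patched(const uint8_t* herKey, const uint8_t* ourKey)
{
    return false && is_own_key(herKey, ourKey);
}
```

The patch disables the check for every packet, not only for self pings, which is one reason a more targeted variant may be preferable.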

Kubuxu commented 9 years ago

So I traced the DHTModules, and the request with the self-interface path reaches this point. After that I am lost again.

We might have to go with a modified version of @Arceliar's solution.

ghost commented 8 years ago

This seems fixed