hop limit when using mdp trace

gh0st42 commented 8 years ago

I did some network tests with serval and discovered some odd behaviour. Details about the setup can be found in the following blog article: http://otg-living.blogspot.de/2016/01/hop-hop-hop-hop-stop.html

Basically, I have 18 nodes running serval in one long chain, n1 ... n18. I can use servald mdp ping and reach every single one of the 18 nodes from n1. The problem starts when I try to trace the path to n18:

root@n1:/tmp/pycore.46656/n1.conf# /home/meshadmin/serval-dna/servald mdp trace F40716A16538D25EA01134139056F02D23F9246583012048B4DBA4BBB46A594E
Tracing the network path from 8410960D885656669C1B4C4AA56E4339B171E9285A254A1863A80FF7F483A141 to F40716A16538D25EA01134139056F02D23F9246583012048B4DBA4BBB46A594E
INFO: Local date/time: 2016-01-25 11:58:43 +0100
INFO: Serval DNA version: START-3478-g8e223b5
ERROR:network_cli.c:317:app_trace()  overlay_mdp_send returned -1, Timeout waiting for reply to MDP packet (packet was successfully sent).

When I try to reach n17 it all just works (16 hops):

root@n1:/tmp/pycore.46656/n1.conf# /home/meshadmin/serval-dna/servald mdp trace 614222015D22DBCD65FD79B8C311E12DDE6CDF086E71B0D9CD9746AFCBE02E71
Tracing the network path from 8410960D885656669C1B4C4AA56E4339B171E9285A254A1863A80FF7F483A141 to 614222015D22DBCD65FD79B8C311E12DDE6CDF086E71B0D9CD9746AFCBE02E71
0:8410960D885656669C1B4C4AA56E4339B171E9285A254A1863A80FF7F483A141
1:D6A5E4F4B6EAFE3A22B82742280322695601F2C5C9DF8A8AD6C5067F04E1E139
2:D158F542C7001C55350C67A6B975B08C2B8D2A63A60EC32FEC1712D2206F5C32
3:75A61C0DF5A46BEEEEB28F18C956ACA8C696DED4CBD4C4FE31A54AE2CB26D077
4:424B43C7E6CC80BEEE761F100217824CCEF201ECEA55D14B7F4B205FFA4FF630
5:2E0FCBB04D6C5DDE8D145206A786683164A4443E0AD214E79F69D79EFA1B1D0D
6:1B669680A1555490BC271881362C443E43131ADCB94BA285F6BA1B3B71DC9853
7:9CBAFAB66A0808573ED0FCA42376BC6B527AB39DEB89FFB344EE9381BD105936
8:B9B737D8CFFD5FBFDC5A20505159EC2798D60C4CC4F39B8B0AB6D4240B931C20
9:2AD0422C040079FF320C934D2E98DC9D154896CA1E492FBB3DF720FCFE7FD009
10:BA5288F511246B8C5D77B12B78C6D46DA92D1B28008F57CDAA9F278B9EF54C2B
11:C42A672FB4D19AA59DE47200CC581220CEC36CBE935F8FEF46027CA2604CA072
12:9A3B340EAF09679FF5FCAF720418DA7E2D959018A2C55227EB95C9928B53A52C
13:8EF8016470B665CED119B3CB2F76406952567E2DC16C74AC352B831F1614DB3C
14:57E8A11D20C8970733A5A0776FEA370C0440136D7D7873C1255A26A13924D155
15:79314E660CE67A6DD19E5CAB209CE619EA973A13387D8FE32A35685C1971796E
16:614222015D22DBCD65FD79B8C311E12DDE6CDF086E71B0D9CD9746AFCBE02E71
17:79314E660CE67A6DD19E5CAB209CE619EA973A13387D8FE32A35685C1971796E
18:57E8A11D20C8970733A5A0776FEA370C0440136D7D7873C1255A26A13924D155
19:8EF8016470B665CED119B3CB2F76406952567E2DC16C74AC352B831F1614DB3C
20:9A3B340EAF09679FF5FCAF720418DA7E2D959018A2C55227EB95C9928B53A52C
21:C42A672FB4D19AA59DE47200CC581220CEC36CBE935F8FEF46027CA2604CA072
22:BA5288F511246B8C5D77B12B78C6D46DA92D1B28008F57CDAA9F278B9EF54C2B
23:2AD0422C040079FF320C934D2E98DC9D154896CA1E492FBB3DF720FCFE7FD009
24:B9B737D8CFFD5FBFDC5A20505159EC2798D60C4CC4F39B8B0AB6D4240B931C20
25:9CBAFAB66A0808573ED0FCA42376BC6B527AB39DEB89FFB344EE9381BD105936
26:1B669680A1555490BC271881362C443E43131ADCB94BA285F6BA1B3B71DC9853
27:2E0FCBB04D6C5DDE8D145206A786683164A4443E0AD214E79F69D79EFA1B1D0D
28:424B43C7E6CC80BEEE761F100217824CCEF201ECEA55D14B7F4B205FFA4FF630
29:75A61C0DF5A46BEEEEB28F18C956ACA8C696DED4CBD4C4FE31A54AE2CB26D077
30:D158F542C7001C55350C67A6B975B08C2B8D2A63A60EC32FEC1712D2206F5C32
31:D6A5E4F4B6EAFE3A22B82742280322695601F2C5C9DF8A8AD6C5067F04E1E139
32:8410960D885656669C1B4C4AA56E4339B171E9285A254A1863A80FF7F483A141

lakeman commented 8 years ago

mdp trace is a weird beast, designed to give some network diagnostic information even when the routing protocol is under active development and is completely broken. It doesn't work like you might expect ICMP to behave. It was written a long time ago and hasn't been touched since.

The packet grows as it travels. Each node searches the packet to see whether it is already listed there. If it is, it attempts to forward the packet back to the hop listed just before itself. Otherwise it forwards the packet onwards to the next hop in its routing table, unless it doesn't know which way to send it. This way, if a loop is discovered in the network, or part of the network graph only works one way, the packet might still make it back to the source.
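A minimal Python sketch of that forwarding rule on a simple chain, to show why a 16-hop trace prints 33 entries. This is an illustration only, not the code in overlay_mdp_services.c: the function name `trace`, the labels n1 ... n18, and the assumption that the destination turns the packet around on its first visit are all hypothetical.

```python
def trace(chain, dst):
    """Simulate a trace packet sent from chain[0] towards dst.

    chain is a list of node labels in a straight line. Each node's only
    route towards dst is its right-hand neighbour (an assumption of this
    sketch). Returns the list of labels recorded in the packet.
    """
    if dst not in chain:
        raise ValueError("destination is not part of the chain")
    packet = []    # hop list carried inside the packet; grows at every node
    current = 0    # index of the node currently holding the packet
    while True:
        node = chain[current]
        seen_at = packet.index(node) if node in packet else None
        packet.append(node)
        if seen_at is None and node != dst:
            # Not listed yet: forward onwards to the next hop.
            current += 1
        elif seen_at is None:
            # First visit at the destination: turn around.
            if len(packet) == 1:
                return packet    # source and destination are the same node
            current = chain.index(packet[-2])
        elif seen_at == 0:
            # The source sees itself again: the trace is complete.
            return packet
        else:
            # Already listed: send back to the hop recorded before us.
            current = chain.index(packet[seen_at - 1])


chain = ["n%d" % i for i in range(1, 19)]
path = trace(chain, "n17")
for index, label in enumerate(path):
    print("%d:%s" % (index, label))
```

Every node on the way out and on the way back appends itself, so a trace over h hops records 2h + 1 entries.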

We could massively increase the number of hops by making a couple of simple changes around here: https://github.com/servalproject/serval-dna/blob/development/overlay_mdp_services.c#L344

When we run out of buffer space: ignore, roll back, and make sure the packet is sent back to the previous hop instead of onwards. That should double the maximum hop count, but you won't see the full backwards path.

Allow for sending SID abbreviations, perhaps only when the packet is on the return path. Otherwise the protocol would be less useful for diagnosis.

Skip unknown SIDs. We have to be careful here though. Being unable to decode our own SID could lead to a packet storm. We might need the capability to distinguish between "no possible match", "ambiguous match", and "ambiguous but could be me".
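As a rough illustration of that distinction, here is a Python sketch that classifies an abbreviated SID against the SIDs a node knows about. It assumes an abbreviation is simply a hex prefix of the full SID, which is an assumption of this sketch and not necessarily how serval-dna encodes abbreviations. The names `Match` and `classify` are hypothetical, and a "unique match" case is added for completeness. The SIDs are taken from the trace output earlier in this thread.

```python
from enum import Enum


class Match(Enum):
    NONE = "no possible match"
    UNIQUE = "unique match"
    AMBIGUOUS = "ambiguous match"
    AMBIGUOUS_MAYBE_ME = "ambiguous but could be me"


def classify(prefix, my_sid, known_sids):
    """Classify a hex prefix against our own SID and the SIDs we know.

    Returns (Match, sorted list of candidate SIDs).
    """
    prefix = prefix.upper()
    everyone = {sid.upper() for sid in known_sids} | {my_sid.upper()}
    candidates = sorted(sid for sid in everyone if sid.startswith(prefix))
    if not candidates:
        return Match.NONE, candidates
    if len(candidates) == 1:
        return Match.UNIQUE, candidates
    if my_sid.upper() in candidates:
        # More than one candidate and we are one of them: treating this
        # as "not me" could mean we fail to recognise ourselves.
        return Match.AMBIGUOUS_MAYBE_ME, candidates
    return Match.AMBIGUOUS, candidates


ME = "8410960D885656669C1B4C4AA56E4339B171E9285A254A1863A80FF7F483A141"
KNOWN = {
    "D6A5E4F4B6EAFE3A22B82742280322695601F2C5C9DF8A8AD6C5067F04E1E139",
    "D158F542C7001C55350C67A6B975B08C2B8D2A63A60EC32FEC1712D2206F5C32",
    "8EF8016470B665CED119B3CB2F76406952567E2DC16C74AC352B831F1614DB3C",
}

for prefix in ("D6", "D", "8", "FF"):
    kind, _ = classify(prefix, ME, KNOWN)
    print(prefix, "->", kind.value)
```

The "ambiguous but could be me" case is the dangerous one: a node that cannot tell whether it is already listed cannot safely decide between forwarding onwards and sending back.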

Changing this is a low priority, but patches and test cases are welcome.
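To put rough numbers on the buffer-space suggestion, here is a small Python sketch of the arithmetic. It assumes full 32-byte SIDs (64 hex digits) are stored per entry and that the packet holds 33 entries. That capacity is only inferred from this thread, where a 33-entry trace to n17 succeeded and the trace to n18, which would need 35 entries, timed out. The real limit is set in the source and may differ. All function names are hypothetical.

```python
SID_BYTES = 32                 # a full SID is 64 hex digits
ASSUMED_CAPACITY = 33          # entries; inferred from the thread, not from the code


def entries_needed(hops):
    """Entries recorded by a complete out-and-back trace over `hops` hops."""
    return 2 * hops + 1


def max_hops_current(capacity):
    """Longest path whose full out-and-back record still fits."""
    return (capacity - 1) // 2


def max_hops_with_turnaround(capacity):
    """Longest path if only the outward record has to fit.

    Once the buffer is full, nodes stop appending but still send the
    packet back to the previous hop, so part of the return path is
    missing from the output.
    """
    return capacity - 1


print("n17:", entries_needed(16), "entries,", entries_needed(16) * SID_BYTES, "bytes")
print("n18:", entries_needed(17), "entries,", entries_needed(17) * SID_BYTES, "bytes")
print("current limit:", max_hops_current(ASSUMED_CAPACITY), "hops")
print("with turnaround:", max_hops_with_turnaround(ASSUMED_CAPACITY), "hops")
```

Under these assumptions the limit goes from 16 to 32 hops, which matches the "should double" estimate above.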

gh0st42 commented 8 years ago

Okay, thanks for the quick response. That explains the behaviour. I'm still not quite sure what the best fix would be, but I agree on the low priority, even though it makes debugging complex networks a bit more complicated. Since we are evaluating quite a few different network setups and have discovered a few more bugs, we might as well formalize these tests and provide test cases and network setups for them once we're done with our evaluation.

servalproject / serval-dna

hop limit when using mdp trace #95