gsliepen / tinc

a VPN daemon
http://tinc-vpn.org/

Tinc self-DDOS #195

Open jakubgs opened 6 years ago

jakubgs commented 6 years ago

Hey,

I've been trying to figure out an issue with Tinc running on my servers that I discovered yesterday. I have 63 hosts inter-connected via Tinc, and I'm getting 100% CPU usage and growing memory usage from Tinc after I start it on most of the hosts:

tincd[5418]: Metadata socket read error for host1 (1.2.3.4 port 655): Connection reset by peer
tincd[5418]: Metadata socket read error for host2 (1.2.3.5 port 655): Connection reset by peer
tincd[5418]: Metadata socket read error for host3 (1.2.3.6 port 655): Connection reset by peer
tincd[5418]: Metadata socket read error for host4 (1.2.3.7 port 655): Connection reset by peer
tincd[5418]: Metadata socket read error for <unknown> (1.2.3.8 port 38498): Connection reset by peer
tincd[5418]: Metadata socket read error for host5 (1.2.3.9 port 655): Connection reset by peer

And when I stop most of the Tinc services on the servers, the CPU load does not go down. I need to stop the processes myself. But when I start them back up again I get insanely high CPU usage again. I have no idea what is causing this, and it makes Tinc essentially unusable.

jakubgs commented 6 years ago

The strace output doesn't look strange:

sendto(9, "\363\223\263\202\322M\336\361\263\27\255\304\355L\365\257\34N\236\263\316\355\334\200\322\24B\331\"\240\245\352"..., 400, 0, NULL, 0) = 400
select(69, [4 5 6 7 8 9 10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59 68], [10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], NULL, {tv_sec=0, tv_usec=434335}) = 56 (in [9 10 13 14 15 19 24 25 27 32 36 37 38 39 40 41 42 53 54 55 56 57 58 59], out [10 12 13 14 15 17 19 20 21 23 24 25 26 27 30 32 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], left {tv_sec=0, tv_usec=434317})
recvfrom(9, "_a\r\21\22\206H*8\256\315\26\177\2\31NAV7H(}>\10\277v\257P\276\367\353`"..., 2169, 0, NULL, NULL) = 2169
select(69, [4 5 6 7 8 9 10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59 68], [9 10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], NULL, {tv_sec=0, tv_usec=428349}) = 57 (in [9 10 13 14 15 19 24 25 27 32 36 37 38 39 40 41 42 53 54 55 56 57 58 59], out [9 10 12 13 14 15 17 19 20 21 23 24 25 26 27 30 32 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], left {tv_sec=0, tv_usec=428328})
sendto(9, "@\23\267\3077\204\264\t\351;\300\305\3533\"\346\02110\357\254}\355\322\36\\\10\2346\3649p"..., 488, 0, NULL, 0) = 488
select(69, [4 5 6 7 8 9 10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59 68], [10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], NULL, {tv_sec=0, tv_usec=427833}) = 56 (in [9 10 13 14 15 19 24 25 27 32 36 37 38 39 40 41 42 53 54 55 56 57 58 59], out [10 12 13 14 15 17 19 20 21 23 24 25 26 27 30 32 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], left {tv_sec=0, tv_usec=427823})
recvfrom(9, "a\321\264\271 \344\366\234v\204\5\335<\316\213\204\5:\"qR\352\34\r\6\357O\257w[\35V"..., 2111, 0, NULL, NULL) = 2111
mremap(0x7f4df1ae9000, 12382208, 12386304, MREMAP_MAYMOVE) = 0x7f4df1ae9000
mremap(0x7f4dfafe7000, 12591104, 12595200, MREMAP_MAYMOVE) = 0x7f4dfafe7000
mremap(0x7f4dfa23c000, 12566528, 12570624, MREMAP_MAYMOVE) = 0x7f4dfa23c000
select(69, [4 5 6 7 8 9 10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59 68], [9 10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], NULL, {tv_sec=0, tv_usec=419109}) = 57 (in [9 10 13 14 15 19 24 25 27 32 36 37 38 39 40 41 42 53 54 55 56 57 58 59], out [9 10 12 13 14 15 17 19 20 21 23 24 25 26 27 30 32 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], left {tv_sec=0, tv_usec=419081})
sendto(9, "\240\312\251*\343\34r\210\247\251\200J\21G\n\223\207b\317\211}\22\313[\265.\210\v\226\2\3333"..., 265, 0, NULL, 0) = 265
select(69, [4 5 6 7 8 9 10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59 68], [10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], NULL, {tv_sec=0, tv_usec=418443}) = 56 (in [9 10 13 14 15 19 24 25 27 32 36 37 38 39 40 41 42 53 54 55 56 57 58 59], out [10 12 13 14 15 17 19 20 21 23 24 25 26 27 30 32 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], left {tv_sec=0, tv_usec=418430})
recvfrom(9, "\252\301X \356\36d\264{\0325\230B\233H\2\255X\215\34.\r\316F\211\262\3\247\26^[\r"..., 2118, 0, NULL, NULL) = 2118
select(69, [4 5 6 7 8 9 10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59 68], [9 10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], NULL, {tv_sec=0, tv_usec=411550}) = 57 (in [9 10 13 14 15 19 24 25 27 32 36 37 38 39 40 41 42 53 54 55 56 57 58 59], out [9 10 12 13 14 15 17 19 20 21 23 24 25 26 27 30 32 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], left {tv_sec=0, tv_usec=411527})
sendto(9, "\245\366\0300\350\313\32p\227\211\235\246\211$\214\244\242T\366ds\\\357)\2003\270b\17\352\232K"..., 133, 0, NULL, 0) = 133
select(69, [4 5 6 7 8 9 10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59 68], [10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], NULL, {tv_sec=0, tv_usec=411013}) = 56 (in [9 10 13 14 15 19 24 25 27 32 36 37 38 39 40 41 42 53 54 55 56 57 58 59], out [10 12 13 14 15 17 19 20 21 23 24 25 26 27 30 32 34 35 36 37 38 39 40 41 42 53 54 55 56 57 58 59], left {tv_sec=0, tv_usec=411003})
recvfrom(9, "\374C\372\246\371\370}q'\10\256\23\305\364\26\311\301\352s\254\242\32\260\245\212&\346)\245x\364\271"..., 2157, 0, NULL, NULL) = 2157
mremap(0x7f4df1ae9000, 12386304, 12390400, MREMAP_MAYMOVE) = 0x7f4df1ae9000

But I can't really tell.
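
For reference, a trace like this can be captured by attaching to the running daemon (just a sketch of one way to do it):

strace -f -tt -o tincd.strace -p "$(pidof tincd)"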

jakubgs commented 6 years ago

With --debug=5 I'm getting:

tincd[7131]: Got ADD_EDGE from host1 (1.2.3.4 port 54734): 12 6d433d1f host6 host2 2.3.4.5 655 f 268
tincd[7131]: Got ADD_EDGE from host1 (1.2.3.4 port 54734) which does not match existing entry
tincd[7131]: Forwarding ADD_EDGE from host1 (1.2.3.4 port 54734): 12 6d433d1f host6 host2 2.3.4.5 655 f 268
tincd[7131]: Sending 99 bytes of metadata to host5 (35.226.21.19 port 39934)
tincd[7131]: Got ADD_EDGE from host1 (1.2.3.4 port 54734): 12 6526e2de host4 host3 3.4.5.6 655 f 181
tincd[7131]: Got ADD_EDGE from host1 (1.2.3.4 port 54734) which does not match existing entry

Even though the hosts are clearly in the configuration.
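
For completeness, the foreground equivalent of that debug level would be something like this (a sketch; NETNAME is a placeholder for the actual netname):

tincd --net=NETNAME --no-detach --debug=5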

jakubgs commented 6 years ago

Okay, I think I've identified something that might be causing this. I can see a ridiculous amount of the following messages:

 Sending 105 bytes of metadata to

All over the place after the 100% CPU usage starts. And by a ridiculous amount I mean 10-30 thousand per second.

 ~$ grep '14:00:01' /var/log/syslog | grep 'bytes of metadata' | wc -l
2462
 ~$ grep '14:00:02' /var/log/syslog | grep 'bytes of metadata' | wc -l
14856
 ~$ grep '14:00:03' /var/log/syslog | grep 'bytes of metadata' | wc -l
14001
 ~$ grep '14:00:04' /var/log/syslog | grep 'bytes of metadata' | wc -l
17245
 ~$ grep '14:00:06' /var/log/syslog | grep 'bytes of metadata' | wc -l
22863
 ~$ grep '14:00:10' /var/log/syslog | grep 'bytes of metadata' | wc -l
30810
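
The per-second counts can also be pulled in one pass (a sketch, assuming the traditional syslog timestamp format, e.g. "Oct  1 14:00:02 host tincd[...]", where the time is the third field):

grep 'bytes of metadata' /var/log/syslog | awk '{ count[$3]++ } END { for (t in count) print t, count[t] }' | sort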

And this happened RIGHT after I saw:

Got KEY_CHANGED from

Twice for every single host in the network. All at the same time.

jakubgs commented 6 years ago

I can see there is a KeyExpire option in the config that defaults to 3600 seconds, so every hour. The documentation doesn't say anything about it, but if I set it to 0, would key expiry be disabled? Instead I'll set it to something very high and see if that staves off the symptoms. If it does, it would indicate that key expiry is at fault here.

jakubgs commented 6 years ago

I'm looking at the code and I don't get why they would all synchronize like that when sending the new keys: https://github.com/gsliepen/tinc/blob/master/src/net.c#L507 Although to be fair, I was testing the 1.1pre15 version on many hosts; maybe that's implemented differently there.

jakubgs commented 6 years ago

Nah, I went back to 1.0.33 (Ubuntu 18.04) on all hosts and it still happens.

jakubgs commented 6 years ago

Seems like the KEY_CHANGED thing was a red herring. I've tried using:

KeyExpire = 604800

on all hosts, but the CPU issue still happened after a while.

jakubgs commented 6 years ago

This is what it looks like to me (attached images: tinc_cpu_issue, tinc_mem_issue, tinc_net_issue, tinc_softirq_issue).

jakubgs commented 6 years ago

It seems to be doing an awful lot of writing:

1603047.059 ( 0.078 ms): write(fd: 2<socket:[960674]>, buf: 0x7f89a661e703, count: 1           ) ...
1603047.137 ( 0.005 ms): write(fd: 2<socket:[960674]>, buf: 0x7ffdc1dee340, count: 86          ) = 86
LOST 1 events!
1603047.153 ( 0.116 ms): write(fd: 2<socket:[960674]>, buf: 0x7f89a661e703, count: 1           ) ...
1603047.269 ( 0.009 ms): write(fd: 2<socket:[960674]>, buf: 0x7ffdc1df0450, count: 79          ) = 79
1603047.297 ( 0.005 ms): write(fd: 2<socket:[960674]>, buf: 0x7f89a661e703, count: 1           ) = 1
1603047.357 ( 0.006 ms): write(fd: 2<socket:[960674]>, buf: 0x7ffdc1dee370, count: 115         ) = 115
LOST 4 events!
1603047.376 ( 0.062 ms): write(fd: 2<socket:[960674]>, buf: 0x7f89a661e703, count: 1           ) = 1
1603047.567 ( 0.008 ms): write(fd: 2<socket:[960674]>, buf: 0x7ffdc1df0450, count: 79          ) = 79
1603047.591 ( 0.005 ms): write(fd: 2<socket:[960674]>, buf: 0x7f89a661e703, count: 1           ) = 1
1603047.650 ( 0.006 ms): write(fd: 2<socket:[960674]>, buf: 0x7ffdc1dee370, count: 115         ) = 115
1603047.672 ( 0.005 ms): write(fd: 2<socket:[960674]>, buf: 0x7f89a661e703, count: 1           ) = 1
LOST 4 events!
1603047.857 ( 0.008 ms): write(fd: 2<socket:[960674]>, buf: 0x7ffdc1df0450, count: 79          ) = 79
1603047.881 ( 0.006 ms): write(fd: 2<socket:[960674]>, buf: 0x7f89a661e703, count: 1           ) = 1
1603048.012 ( 0.006 ms): write(fd: 2<socket:[960674]>, buf: 0x7ffdc1dee370, count: 115         ) = 115
1603048.034 ( 0.005 ms): write(fd: 2<socket:[960674]>, buf: 0x7f89a661e703, count: 1           ) = 1
1603048.090 ( 0.006 ms): write(fd: 2<socket:[960674]>, buf: 0x7ffdc1dee340, count: 86          ) = 86
LOST 2 events!
1603048.251 ( 0.010 ms): write(fd: 2<socket:[960674]>, buf: 0x7ffdc1df0450, count: 79          ) = 79
1603048.279 ( 0.005 ms): write(fd: 2<socket:[960674]>, buf: 0x7f89a661e703, count: 1           ) = 1
1603048.341 ( 0.006 ms): write(fd: 2<socket:[960674]>, buf: 0x7ffdc1dee370, count: 115         ) = 115
LOST 4 events!
1603048.419 ( 0.005 ms): write(fd: 2<socket:[960674]>, buf: 0x7f89a661e703, count: 1           ) = 1
1603048.555 ( 0.007 ms): write(fd: 2<socket:[960674]>, buf: 0x7ffdc1df0450, count: 79          ) = 79
1603048.579 ( 0.005 ms): write(fd: 2<socket:[960674]>, buf: 0x7f89a661e703, count: 1           ) = 1
1603048.650 ( 0.006 ms): write(fd: 2<socket:[960674]>, buf: 0x7ffdc1df0450, count: 79          ) = 79
1603048.670 ( 0.005 ms): write(fd: 2<socket:[960674]>, buf: 0x7f89a661e703, count: 1           ) = 1
1603048.726 ( 0.007 ms): write(fd: 2<socket:[960674]>, buf: 0x7ffdc1dee370, count: 115         ) = 115
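
All of those writes go to fd 2, i.e. the daemon's stderr, which perf shows is connected to a socket (presumably the system logger), so this looks like logging overhead. A quick way to confirm what fd 2 points at (a sketch, assuming a single tincd process):

readlink /proc/"$(pidof tincd)"/fd/2   # e.g. prints "socket:[960674]"
ss -xp | grep 960674                   # look that socket inode up among the unix sockets
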
jakubgs commented 6 years ago

I think I've found the secret sauce. I'm currently using the following options:

Forwarding = off
DirectOnly = yes
TunnelServer = yes

And I haven't seen the CPU spike so far.
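
For anyone who wants to try the same workaround, a minimal sketch of rolling it out on one node (NETNAME is a placeholder; a HUP may be enough for tinc 1.0 to reread the config, restarting the daemon is the safe option):

cat >> /etc/tinc/NETNAME/tinc.conf <<'EOF'
Forwarding = off
DirectOnly = yes
TunnelServer = yes
EOF
# ask the running daemon to reread its configuration
tincd --net=NETNAME --kill=HUP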

tomberek commented 6 years ago

What is your theory of why those options help?

jakubgs commented 6 years ago

It's dark magic and I have no idea. My vague hypothesis is that the CPU issues coincided with a lot of "Metadata socket read error for", a metric shitload of "Sending X bytes of metadata", and a sprinkling of "Forwarding ADD_EDGE from" messages, which suggests to me that this issue is related to internal (meta) Tinc traffic rather than actual traffic generated by the hosts. Based on that I've decided to use these 3 options, because it seems to me like they would reduce meta traffic, especially across multiple nodes forwarding it (I was suspecting a loop):

Forwarding = off - Incoming packets that are not meant for the local node, but which should be forwarded to another node, are dropped.
DirectOnly = yes - When this option is enabled, packets that cannot be sent directly to the destination node, but which would have to be forwarded by an intermediate node, are dropped instead.  When combined with the IndirectData option, packets for nodes for which we do not have a meta connection with are also dropped.
TunnelServer = yes - When this option is enabled tinc will no longer forward information between other tinc daemons, and will only allow connections with nodes for which host config files are present in the local /etc/tinc/NETNAME/hosts/ directory. Setting this option also implicitly sets StrictSubnets.

I think one of those options, or some combination of them, has stopped some kind of metadata loop storm.

gsliepen commented 5 years ago

Were you already using TunnelServer = yes on one or more of the nodes when this issue first occurred?

jakubgs commented 5 years ago

I'm pretty sure I was not, but I can check my git history later to verify.

mark-stopka commented 5 years ago

@jakubgs can you check the repository? @gsliepen, what is your theory on the issue? I am planning a similarly large deployment and would like to know how likely it is to occur, and what it would take to remedy it. Were you able to reproduce the bug?

jakubgs commented 5 years ago

Were you already using TunnelServer = yes on one or more of the nodes when this issue first occurred?

No, adding it is what fixed it.

fangfufu commented 3 years ago

This issue needs more investigation, although judging from https://github.com/gsliepen/tinc/issues/202#issuecomment-867195496, jakubgs no longer uses tinc.

st31ny commented 2 years ago

I am also observing this issue in my network: sometimes there is very high physical traffic without much virtual traffic, high CPU load from tinc, and very high latency.

I tried setting TunnelServer = yes, but now hosts that cannot set up a direct connection cannot communicate with each other, even though all involved nodes now have all host information available in /etc/tinc/NETNAME/hosts. Only pairs of nodes where one has a ConnectTo statement pointing to the other can communicate over tinc.

Is this an expected effect of TunnelServer = yes? If I understand the manpage correctly, this option only disables the forwarding of metadata but still allows forwarding of packets not meant for the local host. Is a ConnectTo statement required to communicate with a host when TunnelServer = yes?

st31ny commented 2 years ago

Is there any way I can assist in identifying the root issue of this?

Right now, with ~500 nodes, I have to restart the entire network roughly once a week (usually around 4 am), and I really fear that things could get worse if I add more nodes in the future.

ptorrent commented 2 years ago

Do you have high CPU/MEM usage or just "Connection reset by peer" ?

st31ny commented 2 years ago

Do you have high CPU/MEM usage or just "Connection reset by peer" ?

I also have very high CPU usage and exploding ping times between hosts (normally, servers have a ping of 3 ms or less; during such events it goes up to many seconds). I didn't monitor the memory usage, though.

ptorrent commented 2 years ago

We found a workaround for "Connection reset by peer"... I don't think it's the best solution, but it works:

tcpdump -n -i eth0 'tcp port 655 and tcp[tcpflags] == tcp-syn'

That command will print every new connection from a host on port 655. If there are too many connections from a host, we just block the IP address with iptables (iptables -A INPUT -s ..... -p tcp --dport 655 -j REJECT) for 30 minutes...

Maybe that can help you?

st31ny commented 2 years ago

That command will print every new connection from a host on port 655. If there are too many connections from a host, we just block the IP address with iptables (iptables -A INPUT -s ..... -p tcp --dport 655 -j REJECT) for 30 minutes...

Oh, sounds like this is worth a try! May I ask how you automated this? With a custom python script?

ptorrent commented 2 years ago

We're using Node.js, but you can do it in Python. You need to unbuffer the tcpdump output, so the real command is:

stdbuf -i0 -o0 -e0  tcpdump  -n -i eth0 'tcp port 655 and tcp[tcpflags] == tcp-syn'

Then from each line of output (split by \n) you can extract the remote address. A line will look like:

17:58:44.476135 IP [REMOTE ADDRESS] > [YOUR PUBLIC ADDRESS].655: Flags [S], seq 858164035, win 29200, options [mss 1386,sackOK,TS val 1168206 ecr 0,nop,wscale 7], length 0

Then you create the iptables rule:

 iptables -A INPUT -i eth0 -s  [REMOTE ADDRESS]  -p tcp --dport 655 -j REJECT 
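
Putting the pieces together, a rough sketch of the whole loop in shell (the interface name, the 20-SYNs-per-minute threshold and the 30-minute ban duration are illustrative assumptions):

#!/bin/sh
IFACE=eth0
THRESHOLD=20     # SYNs per minute from one source before we ban it
BAN_SECONDS=1800 # how long the ban lasts

stdbuf -i0 -o0 -e0 tcpdump -l -n -i "$IFACE" 'tcp port 655 and tcp[tcpflags] == tcp-syn' |
awk -v thr="$THRESHOLD" '{
    minute = substr($1, 1, 5)            # "HH:MM" from the timestamp
    sub(/\.[0-9]+$/, "", $3); src = $3   # strip the trailing ".PORT" to get the source address
    if (minute != cur) { cur = minute; delete seen }
    if (++seen[src] == thr) { print src; fflush() }
}' |
while read -r src; do
    iptables -A INPUT -i "$IFACE" -s "$src" -p tcp --dport 655 -j REJECT
    # lift the ban again after BAN_SECONDS
    ( sleep "$BAN_SECONDS"; iptables -D INPUT -i "$IFACE" -s "$src" -p tcp --dport 655 -j REJECT ) &
done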

Hope it can help

st31ny commented 2 years ago

Awesome, thank you very much! I'll definitely try this.

ptorrent commented 2 years ago

Don't forget to remove the rule after some time ^^

st31ny commented 2 years ago

Don't forget to remove the rule after some time ^^

Haha, true. This would make some ppl angry otherwise.

st31ny commented 2 years ago

Just for reference, I am now using an iptables rule like that:

iptables -A INPUT -p tcp -m tcp --dport tinc -m hashlimit --hashlimit-name tinc_tcp --hashlimit-mode srcip --hashlimit-upto 5/min --hashlimit-burst 2 -j ACCEPT

This will only accept 5 requests per minute per source IP, with a burst of 2. So far, it seems to work.
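
Worth noting: a rule like that only limits anything if traffic above the rate is then rejected or dropped, either by a later rule or by a restrictive chain policy. A minimal sketch of such a pair (assuming an earlier rule already accepts ESTABLISHED,RELATED traffic, so only new connection attempts reach these lines):

# accept at most 5 packets per minute per source IP towards the tinc port (burst of 2)
iptables -A INPUT -p tcp -m tcp --dport tinc -m hashlimit --hashlimit-name tinc_tcp --hashlimit-mode srcip --hashlimit-upto 5/min --hashlimit-burst 2 -j ACCEPT
# anything above the limit falls through to this rule and gets dropped
iptables -A INPUT -p tcp -m tcp --dport tinc -j DROP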