weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works
Apache License 2.0

Large packet (UDP, ICMP) got dropped #3012

Closed enst closed 7 years ago

enst commented 7 years ago

Hi,

I've noticed that small UDP packets all work fine, but larger ones (around 1500 bytes) always get dropped. It looks like this is related to the MTU.

I also tried ping, which likewise fails with large packets such as "-s 2000". Everything works fine when I use eth0 directly: large UDP and ICMP packets both go through. The drops only happen when I try to reach another weave node.

I'm not sure what prevents weave from fragmenting.
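
Concretely, the kind of test that fails looks roughly like this (a sketch: the sizes and the other node's container address are just examples):

# works: small packet across the weave network
$ ping -s 100 10.32.0.9
# fails: payload larger than the weave MTU
$ ping -s 2000 10.32.0.9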

Thanks,

  Version: 1.8.2 (version 1.9.4 available - please upgrade!)

    Service: router
   Protocol: weave 1..2
       Name: 0e:9f:4f:49:d0:85(monitoring)
 Encryption: enabled

PeerDiscovery: enabled
    Targets: 1
Connections: 6 (6 established)
      Peers: 7 (with 42 established connections)
TrustedSubnets: none

    Service: ipam
     Status: ready
      Range: 10.32.0.0/12

DefaultSubnet: 10.32.0.0/12

    Service: dns
     Domain: internal.
   Upstream: 8.8.8.8
        TTL: 1
    Entries: 69

    Service: proxy
    Address: unix:///var/run/weave/weave.sock

    Service: plugin
 DriverName: weave

$ docker version
Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May 4 22:10:54 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May 4 22:10:54 2017
 OS/Arch:      linux/amd64
 Experimental: false

Linux monitoring 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Edit: removed unused parts of the template

enst commented 7 years ago

Seeing the same thing on 1.9.7.

$ weave status

    Version: 1.9.7 (up to date; next check at 2017/06/10 01:19:36)

    Service: router
   Protocol: weave 1..2
       Name: 0e:9f:4f:49:d0:85(monitoring)
 Encryption: enabled

PeerDiscovery: enabled
    Targets: 1
Connections: 6 (6 established)
      Peers: 7 (with 42 established connections)
TrustedSubnets: none

    Service: ipam
     Status: ready
      Range: 10.32.0.0/12

DefaultSubnet: 10.32.0.0/12

    Service: dns
     Domain: internal.
   Upstream: 8.8.8.8
        TTL: 1
    Entries: 69

    Service: proxy
    Address: unix:///var/run/weave/weave.sock

    Service: plugin
 DriverName: weave

bboreham commented 7 years ago

The Weave network sets a default MTU which will work without fragmentation on all the clouds we've tried. If your underlying network can support bigger packets, you can raise this value.

See https://www.weave.works/docs/net/latest/using-weave/fastdp/
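
For example, to launch with a larger MTU (a sketch: 8192 and the peer hostname are illustrative, the env-var form is the one used later in this thread, and a value this large is only valid if every link between your hosts can carry packets that big):

# on each host, before launching the router
$ weave reset
$ MTU=8192 weave launch <peer-hostname>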

marccarre commented 7 years ago

@enst, did changing the MTU as recommended by bboreham help?

enst commented 7 years ago

Thanks, @bboreham and @marccarre. I finally got it to work.

I tried issuing "weave reset" and "MTU=8192 weave launch..." on each of my nodes, one by one. I was confused because big packets worked fine between some of them but not others, and repeatedly resetting and re-launching didn't help.

Eventually, I launched a new, clean weave network, then disconnected the old nodes from the old network and joined them to the new one. It worked this time.

Thanks for your help.

That said, I don't think dropping big packets by default is a good idea. You can see the connection is there, but you're left puzzled when you notice odd TCP retransmissions and UDP packet drops.

bboreham commented 7 years ago

It's not us that's dropping big packets; it's Linux.

That said, I tried it and it worked for me:

# uname -a
Linux brya-1 4.8.0-45-generic #48~16.04.1-Ubuntu SMP Fri Mar 24 12:46:56 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
# ping -c 2 -s 2000 10.32.0.9
PING 10.32.0.9 (10.32.0.9) 2000(2028) bytes of data.
2008 bytes from 10.32.0.9: icmp_seq=1 ttl=64 time=1.13 ms
2008 bytes from 10.32.0.9: icmp_seq=2 ttl=64 time=0.528 ms

--- 10.32.0.9 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.528/0.833/1.139/0.306 ms

(on the other host) Here you can see each encapsulated packet is split into two:

# tcpdump -n -i ens4 port 6784
15:06:53.600445 IP 10.128.0.2.56774 > 10.128.0.3.6784: UDP, length 1394
15:06:53.600497 IP 10.128.0.2.56774 > 10.128.0.3.6784: UDP, length 698
15:06:53.600635 IP 10.128.0.3.44764 > 10.128.0.2.6784: UDP, length 1394
15:06:53.600645 IP 10.128.0.3.44764 > 10.128.0.2.6784: UDP, length 698
15:06:54.601400 IP 10.128.0.2.56774 > 10.128.0.3.6784: UDP, length 1394
15:06:54.601445 IP 10.128.0.2.56774 > 10.128.0.3.6784: UDP, length 698
15:06:54.601556 IP 10.128.0.3.44764 > 10.128.0.2.6784: UDP, length 1394
15:06:54.601565 IP 10.128.0.3.44764 > 10.128.0.2.6784: UDP, length 698

and reassembled at the higher layer:

# tcpdump -n -i weave icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on weave, link-type EN10MB (Ethernet), capture size 262144 bytes
15:08:15.836697 IP 10.32.4.8 > 10.32.0.9: ICMP echo request, id 31031, seq 1, length 2008
15:08:15.836779 IP 10.32.0.9 > 10.32.4.8: ICMP echo reply, id 31031, seq 1, length 2008
15:08:16.836549 IP 10.32.4.8 > 10.32.0.9: ICMP echo request, id 31031, seq 2, length 2008
15:08:16.836618 IP 10.32.0.9 > 10.32.4.8: ICMP echo reply, id 31031, seq 2, length 2008

If your ping command has a -M option, you can specify that it must not fragment, and you will get an error from Linux:

# ping -c 2 -M do -s 2000 10.32.0.9
PING 10.32.0.9 (10.32.0.9) 2000(2028) bytes of data.
ping: local error: Message too long, mtu=1376
ping: local error: Message too long, mtu=1376

--- 10.32.0.9 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1018ms
enst commented 7 years ago

I agree it's not a bug, but to an end user it's confusing that big packets get dropped by default.

Linux doesn't drop big packets on the eth0 interface; weave didn't work properly until a bigger MTU was set.

Maybe consider using a bigger MTU as the default? Just a suggestion.

This issue caused some downtime in our production environment, and I almost gave up on weave. I finally figured out the root cause (the MTU); thanks for your help.

bboreham commented 7 years ago

As I said, I got entirely different results from yours. If you could show a transcript of all the commands, exactly as used, with their parameters, that led to packets being dropped, I might be able to see why it's different for you.

We cannot set a large MTU by default; that would definitely lead to packet drops when the underlying network cannot cope.
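
One way to check whether your own underlay can cope before raising the MTU (a sketch: 10.128.0.3 is the other host's underlay address from the capture above, and the size is illustrative; it needs to cover the MTU you intend to set plus encapsulation overhead):

# from one host to the other's underlay address, with fragmentation disallowed;
# a "Message too long" error means the underlay cannot carry packets that big
$ ping -c 2 -M do -s 8192 10.128.0.3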

enst commented 7 years ago

I don't have the previous environment anymore. I've listed the versions in my first post.

I was using default settings at that time.

1st node:

weave launch --password ..... --dns-domain="internal."

Other nodes:

weave launch --password ..... --dns-domain="internal." "1st node hostname"

Some of my servers are on Digital Ocean, some are on Linode.

This time, I upgraded weave and set MTU=8192. Everything works fine now.
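
For anyone else landing here, a quick sanity check that the larger MTU actually took effect (a sketch; the container address is just an example): look at the weave bridge on each host and re-run the large ping.

# the weave bridge should report the MTU you launched with
$ ip link show weave | grep mtu
# a >1500-byte ping across the weave network should now go through
$ ping -c 2 -s 2000 10.32.0.9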