Closed: enst closed this issue 7 years ago
seeing the same thing on 1.9.7
$ weave status
Version: 1.9.7 (up to date; next check at 2017/06/10 01:19:36)
Service: router
Protocol: weave 1..2
Name: 0e:9f:4f:49:d0:85(monitoring)
Encryption: enabled
PeerDiscovery: enabled
Targets: 1
Connections: 6 (6 established)
Peers: 7 (with 42 established connections)
TrustedSubnets: none
Service: ipam
Status: ready
Range: 10.32.0.0/12
DefaultSubnet: 10.32.0.0/12
Service: dns
Domain: internal.
Upstream: 8.8.8.8
TTL: 1
Entries: 69
Service: proxy
Address: unix:///var/run/weave/weave.sock
Service: plugin
DriverName: weave
The Weave network sets a default MTU which will work without fragmentation on all the clouds we've tried. If your underlying network can support bigger packets, you can raise this value.
See https://www.weave.works/docs/net/latest/using-weave/fastdp/
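For example, something like this raises it at launch time (the MTU=... environment-variable form is the one used later in this thread; the value must be one your underlying network can genuinely carry end to end, and <peers...> stands for your usual launch arguments):

MTU=8192 weave launch <peers...>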
@enst, did changing the MTU as recommended by bboreham help?
Thanks, @bboreham and @marccarre. I finally got it to work.
I tried issuing "weave reset" and "MTU=8192 weave launch..." on each of my nodes one by one. I was confused that big packets worked fine between only some of them. Repeatedly resetting and re-launching didn't help.
Eventually, I launched a new, clean weave network, then disconnected the old nodes from the old network and joined them to the new one. It works this time.
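Roughly, the final sequence was something like this (password and hostnames elided as "....." / placeholders, same as in my launch commands further down):

# fresh first node for the new network
weave reset
MTU=8192 weave launch --password ..... --dns-domain="internal."

# then each old node, one at a time
weave reset
MTU=8192 weave launch --password ..... --dns-domain="internal." "new 1st node hostname"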
Thanks for your help.
But I don't think dropping big packets by default is a good idea. You see that the connection is there, but you'll be puzzled when you notice weird TCP retransmissions and UDP packet drops.
It's not us that's dropping big packets; it's Linux.
That said, I tried it and it worked for me:
# uname -a
Linux brya-1 4.8.0-45-generic #48~16.04.1-Ubuntu SMP Fri Mar 24 12:46:56 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
# ping -c 2 -s 2000 10.32.0.9
PING 10.32.0.9 (10.32.0.9) 2000(2028) bytes of data.
2008 bytes from 10.32.0.9: icmp_seq=1 ttl=64 time=1.13 ms
2008 bytes from 10.32.0.9: icmp_seq=2 ttl=64 time=0.528 ms
--- 10.32.0.9 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.528/0.833/1.139/0.306 ms
(on the other host) Here you can see the encapsulated packets are split into two:
# tcpdump -n -i ens4 port 6784
15:06:53.600445 IP 10.128.0.2.56774 > 10.128.0.3.6784: UDP, length 1394
15:06:53.600497 IP 10.128.0.2.56774 > 10.128.0.3.6784: UDP, length 698
15:06:53.600635 IP 10.128.0.3.44764 > 10.128.0.2.6784: UDP, length 1394
15:06:53.600645 IP 10.128.0.3.44764 > 10.128.0.2.6784: UDP, length 698
15:06:54.601400 IP 10.128.0.2.56774 > 10.128.0.3.6784: UDP, length 1394
15:06:54.601445 IP 10.128.0.2.56774 > 10.128.0.3.6784: UDP, length 698
15:06:54.601556 IP 10.128.0.3.44764 > 10.128.0.2.6784: UDP, length 1394
15:06:54.601565 IP 10.128.0.3.44764 > 10.128.0.2.6784: UDP, length 698
and reassembled at the higher layer:
# tcpdump -n -i weave icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on weave, link-type EN10MB (Ethernet), capture size 262144 bytes
15:08:15.836697 IP 10.32.4.8 > 10.32.0.9: ICMP echo request, id 31031, seq 1, length 2008
15:08:15.836779 IP 10.32.0.9 > 10.32.4.8: ICMP echo reply, id 31031, seq 1, length 2008
15:08:16.836549 IP 10.32.4.8 > 10.32.0.9: ICMP echo request, id 31031, seq 2, length 2008
15:08:16.836618 IP 10.32.0.9 > 10.32.4.8: ICMP echo reply, id 31031, seq 2, length 2008
If your ping command has a -M option, you can specify that it mustn't fragment, and it will get an error from Linux:
# ping -c 2 -M do -s 2000 10.32.0.9
PING 10.32.0.9 (10.32.0.9) 2000(2028) bytes of data.
ping: local error: Message too long, mtu=1376
ping: local error: Message too long, mtu=1376
--- 10.32.0.9 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1018ms
I agree it's not a bug. But to the end user, it's confusing that big packets get dropped by default.
Linux doesn't drop big packets on the eth0 interface. Weave didn't work properly until a bigger MTU was set.
Maybe think about using a bigger MTU as the default? Just a suggestion.
This issue caused some downtime in our production environment. I almost gave up on using weave. I finally figured out the root cause (MTU); thanks for your help.
As I say, I got entirely different results from you. If you could show a transcript of all the commands, exactly as used, with the parameters that led to packets being dropped, I might be able to see why it's different for you.
We cannot set a large MTU by default; that will definitely lead to packet drops when the underlying network cannot cope.
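If you want to check what your underlying network can actually carry before raising it, a don't-fragment ping between the hosts' own (non-weave) addresses will tell you; 1472 is the largest ICMP payload that fits in a standard 1500-byte MTU (1500 - 20 IP - 8 ICMP). The address below is a placeholder:

# raise -s until this starts failing; that's your path MTU limit
ping -c 2 -M do -s 1472 <other host's underlay IP>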
I don't have the previous environment anymore. I've listed the versions in my first post.
I was using default settings at that time.

1st node:
weave launch --password ..... --dns-domain="internal."

Other nodes:
weave launch --password ..... --dns-domain="internal." "1st node hostname"

Some of my servers are on Digital Ocean, some are on Linode.
This time, I upgraded weave and set MTU=8192. Everything works fine now.
Hi,
I noticed that small UDP packets all work fine, but some larger ones (around 1500 bytes) always get dropped. It looks like it's related to the MTU.
I also tried ping; it also failed with big packets like "-s 2000". But everything works fine when I use eth0 directly: large UDP and ICMP packets both get through. The problem only happens when I try to reach another weave node.
I'm not sure what prevents weave from fragmenting.
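For reference, this is the kind of comparison I was making (addresses are placeholders, not my real ones):

$ ping -c 2 -s 2000 <other node's eth0 IP>   # fine
$ ping -c 2 -s 2000 <other node's weave IP>  # large packets never arrive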
Thanks,
PeerDiscovery: enabled
Targets: 1
Connections: 6 (6 established)
Peers: 7 (with 42 established connections)
TrustedSubnets: none
DefaultSubnet: 10.32.0.0/12
$ docker version
Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May 4 22:10:54 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May 4 22:10:54 2017
 OS/Arch:      linux/amd64
 Experimental: false
Linux monitoring 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Edit: removed unused parts of the template