weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works
Apache License 2.0

Peers connect failed since host clock skew exceeds 900s limit #3053

Closed jacknlliu closed 7 years ago

jacknlliu commented 7 years ago

Peers fail to connect. The following output shows the error.

$ weave status connections
-> 192.168.1.101:6783    failed      host clock skew of 33166s exceeds 900s limit, retry: 2017-07-09 16:59:26.882288888 +0000 UTC 

What happened?

Two hosts use weave to connect the docker containers.

On host A,

$ weave launch 192.168.1.200

On host B,

$ weave launch 192.168.1.101

then check the status:

$ weave status

The connection fails, as the status output shows:

Version: git-d8222d76d957 (up to date; next check at 0001/01/01 00:00:00)

        Service: router
       Protocol: weave 1..2
           Name: 62:3a:88:f1:51:6e(arm)
     Encryption: disabled
  PeerDiscovery: enabled
        Targets: 1
    Connections: 1 (1 failed)
          Peers: 1
 TrustedSubnets: none
$ weave version
2.0.1

$ docker version
Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:22:33 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:23:51 2017
 OS/Arch:      linux/amd64
 Experimental: false

$ uname -a
Linux  4.11.8-200.fc25.x86_64 #1 SMP Thu Jun 29 16:13:56 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

bboreham commented 7 years ago

Can you check if the two hosts have the same time?

marccarre commented 7 years ago

Some background on this error: clock skew can be problematic for the gossip protocol Weave Net uses under the hood to exchange data between the peers that make up your cluster. See also here and here.

If the two hosts are set to different times (as asked by @bboreham above), then you may want to synchronise them using NTP.
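
To make the error message concrete, here is a minimal Go sketch of a skew check with a 900s limit. It is not Weave Net's actual code: `checkSkew` and `maxSkew` are hypothetical names, and the only things taken from this issue are the 900s limit and the 33166s skew quoted in the report.

```go
package main

import (
	"fmt"
	"time"
)

// maxSkew mirrors the 900s limit quoted in the error message (assumed constant name).
const maxSkew = 900 * time.Second

// checkSkew is a hypothetical helper, not Weave Net's implementation. It compares
// two Unix timestamps (in seconds) and returns an error when they differ by more
// than maxSkew in either direction.
func checkSkew(localUnix, remoteUnix int64) error {
	skew := time.Unix(remoteUnix, 0).Sub(time.Unix(localUnix, 0))
	if skew < 0 {
		skew = -skew
	}
	if skew > maxSkew {
		return fmt.Errorf("host clock skew of %ds exceeds %ds limit",
			int64(skew.Seconds()), int64(maxSkew.Seconds()))
	}
	return nil
}

func main() {
	local := time.Now().Unix()
	// Pretend the remote host is 33166s ahead, as in the report.
	fmt.Println(checkSkew(local, local+33166))
	// A one-minute difference is well within the limit.
	fmt.Println(checkSkew(local, local+60))
}
```

Once both hosts are synchronised, the difference should drop far below the limit and a check of this kind would pass.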

jacknlliu commented 7 years ago

@bboreham @marccarre the two hosts don't have the same time, but I wonder whether Weave could still work despite this. I tested etcd, and it works under the same conditions, although it prints some warnings.

marccarre commented 7 years ago

Some people claim etcd is resilient to clock skew, so yes, you may not experience issues on this specific front. However, in general,

so, IMHO, it definitely would be safer and preferable to synchronise these clocks, @jacknlliu. Is this something you have control over?


@bboreham, do you reckon this could be useful?

jacknlliu commented 7 years ago

@marccarre thank you very much for sharing your brilliant opinion.

bboreham commented 7 years ago

We use the time to remove "tombstones" which replace deleted DNS entries from the eventually-consistent data structure.

If two clocks are very out of sync then WeaveDNS would delete current data.

Without a mechanism to remove tombstone entries the data structure will grow bigger and bigger and consume more machine resources over time.
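
As a rough illustration of time-based tombstone removal, here is a minimal Go sketch. It is not WeaveDNS code: the `entry` type, the `expire` function, the hostnames and the 600s retention period are all assumptions made for the example. It shows one way a badly skewed clock can misjudge the age of a tombstone in such a scheme.

```go
package main

import "fmt"

// entry is a hypothetical DNS record. Tombstone is the Unix time (seconds)
// at which the record was deleted, or 0 while the record is live.
type entry struct {
	Hostname  string
	Tombstone int64
}

// expire drops entries whose tombstone is older than ttl seconds, judged by
// the local clock `now`. Live entries (Tombstone == 0) are always kept.
func expire(entries []entry, now, ttl int64) []entry {
	kept := make([]entry, 0, len(entries))
	for _, e := range entries {
		if e.Tombstone != 0 && now-e.Tombstone > ttl {
			continue // tombstone is old enough to garbage-collect
		}
		kept = append(kept, e)
	}
	return kept
}

func main() {
	const ttl = 600 // assumed retention period, in seconds
	entries := []entry{
		{Hostname: "web.weave.local", Tombstone: 0},
		{Hostname: "db.weave.local", Tombstone: 1000}, // deleted at t=1000
	}
	// A peer with an accurate clock (t=1060) keeps the fresh tombstone.
	fmt.Println(len(expire(entries, 1060, ttl)))
	// A peer whose clock runs 33166s ahead treats it as stale and drops it.
	fmt.Println(len(expire(entries, 1060+33166, ttl)))
}
```

With these numbers the first call keeps both entries and the second keeps only the live one, because the skewed peer considers the 60-second-old tombstone to be hours old.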

Etcd is completely different: it is a full-consensus system that will freeze under a network partition. Different trade-offs.

bboreham commented 7 years ago

Closing because this is working as designed.