travisjeffery / jocko

Kafka implemented in Golang with built-in coordination (No ZK dep, single binary install, Cloud Native)
https://twitter.com/travisjeffery
MIT License
4.94k stars 363 forks source link

Bootstrapping improvements #138

Open candlerb opened 6 years ago

candlerb commented 6 years ago

According to the consul bootstrapping documentation, I was expecting to be able to safely bootstrap a 3-node cluster by specifying only --bootstrap-expect=3 on all three nodes, without specifying --bootstrap on the first node. This ought to be safer:

Manual bootstrapping is not recommended as it is more error-prone than automatic bootstrapping with -bootstrap-expect

However without --bootstrap it doesn't come up: the first node loses its leadership.

First node shows:

2018/06/22 09:26:58 [WARN] raft: no known peers, aborting election
2018/06/22 09:27:05 [DEBUG] memberlist: Stream connection from=127.0.0.1:34296
2018/06/22 09:27:05 [INFO] serf: EventMemberJoin: jocko1 127.0.0.1
2018-06-22T09:27:05.321Z        INFO    jocko/serf.go:74        adding LAN server       {"id": 1, "broker addr"
: "127.0.0.1:9001", "serf addr": "127.0.0.1:9003", "raft addr": "127.0.0.1:9002", "id": 1, "raft addr": "127.0.
0.1:9002", "meta": {"ID":2,"Name":"","Bootstrap":false,"Expect":3,"NonVoter":false,"Status":1,"RaftAddr":"127.0
.0.1:9102","SerfLANAddr":"127.0.0.1:9103","BrokerAddr":"127.0.0.1:9101"}}
2018-06-22T09:27:05.321Z        DEBUG   jocko/serf.go:149       maybe bootstrap: need more brokers      {"id":
1, "broker addr": "127.0.0.1:9001", "serf addr": "127.0.0.1:9003", "raft addr": "127.0.0.1:9002", "id": 1, "raf
t addr": "127.0.0.1:9002", "brokers": 2, "bootstrap expect": 3}
2018/06/22 09:27:05 [DEBUG] serf: messageJoinType: jocko1
2018/06/22 09:27:05 [DEBUG] serf: messageJoinType: jocko1
2018/06/22 09:27:05 [DEBUG] serf: messageJoinType: jocko1
2018/06/22 09:27:06 [DEBUG] serf: messageJoinType: jocko1
2018/06/22 09:27:14 [DEBUG] memberlist: Stream connection from=127.0.0.1:34298
2018/06/22 09:27:14 [INFO] serf: EventMemberJoin: node3 127.0.0.1
2018-06-22T09:27:14.794Z        INFO    jocko/serf.go:74        adding LAN server       {"id": 1, "broker addr"
: "127.0.0.1:9001", "serf addr": "127.0.0.1:9003", "raft addr": "127.0.0.1:9002", "id": 1, "raft addr": "127.0.
0.1:9002", "meta": {"ID":3,"Name":"","Bootstrap":false,"Expect":3,"NonVoter":false,"Status":1,"RaftAddr":"127.0
.0.1:9202","SerfLANAddr":"127.0.0.1:9203","BrokerAddr":"127.0.0.1:9201"}}
2018-06-22T09:27:14.794Z        INFO    jocko/serf.go:165       found expected number of peers, attempting boot
strap   {"id": 1, "broker addr": "127.0.0.1:9001", "serf addr": "127.0.0.1:9003", "raft addr": "127.0.0.1:9002"
, "id": 1, "raft addr": "127.0.0.1:9002", "addrs": ["127.0.0.1:9002", "127.0.0.1:9102", "127.0.0.1:9202"]}
2018/06/22 09:27:14 [DEBUG] serf: messageJoinType: node3
2018/06/22 09:27:15 [DEBUG] serf: messageJoinType: node3
2018/06/22 09:27:15 [WARN] raft: Heartbeat timeout from "" reached, starting election
2018/06/22 09:27:15 [INFO] raft: Node at 127.0.0.1:9002 [Candidate] entering Candidate state in term 2
2018/06/22 09:27:15 [DEBUG] raft: Votes needed: 2
2018/06/22 09:27:15 [DEBUG] raft-net: 127.0.0.1:9002 accepted connection from: 127.0.0.1:35588
2018/06/22 09:27:15 [DEBUG] serf: messageJoinType: node3
2018/06/22 09:27:15 [DEBUG] raft: Vote granted from 1 in term 2. Tally: 1
2018/06/22 09:27:15 [DEBUG] raft: Vote granted from 3 in term 2. Tally: 2
2018/06/22 09:27:15 [INFO] raft: Election won. Tally: 2
2018/06/22 09:27:15 [INFO] raft: Node at 127.0.0.1:9002 [Leader] entering Leader state
2018/06/22 09:27:15 [INFO] raft: Added peer 1, starting replication
2018/06/22 09:27:15 [INFO] raft: Added peer 2, starting replication
2018/06/22 09:27:15 [INFO] raft: Added peer 3, starting replication
2018-06-22T09:27:15.232Z        INFO    jocko/leader.go:136     leader: cluster leadership acquired     {"id": 1, "broker addr": "127.0.0.1:9001", "serf addr": "127.0.0.1:9003", "raft addr": "127.0.0.1:9002", "id": 1, "raft addr": "127.0.0.1:9002"}
2018/06/22 09:27:15 [INFO] raft: pipelining replication to peer {Voter 3 127.0.0.1:9202}
2018/06/22 09:27:15 [INFO] raft: pipelining replication to peer {Voter 2 127.0.0.1:9102}
2018/06/22 09:27:15 [INFO] raft: Removed ourself, shutting down
2018-06-22T09:27:15.278Z        DEBUG   jocko/leader.go:143     leader: shutting down leader loop       {"id": 1, "broker addr": "127.0.0.1:9001", "serf addr": "127.0.0.1:9003", "raft addr": "127.0.0.1:9002", "id": 1, "raft addr": "127.0.0.1:9002"}
2018-06-22T09:27:15.278Z        ERROR   jocko/leader.go:175     leader: failed to wait for barrier      {"id": 1, "broker addr": "127.0.0.1:9001", "serf addr": "127.0.0.1:9003", "raft addr": "127.0.0.1:9002", "id": 1, "raft addr": "127.0.0.1:9002", "error": "leadership lost while committing log"}
github.com/travisjeffery/jocko/log.(*logger).Error
        /root/go/src/github.com/travisjeffery/jocko/log/logger.go:38
github.com/travisjeffery/jocko/jocko.(*Broker).leaderLoop
        /root/go/src/github.com/travisjeffery/jocko/jocko/leader.go:175
github.com/travisjeffery/jocko/jocko.(*Broker).monitorLeadership.func1
        /root/go/src/github.com/travisjeffery/jocko/jocko/leader.go:134
2018/06/22 09:27:15 [INFO] raft: aborting pipeline replication to peer {Voter 2 127.0.0.1:9102}
2018/06/22 09:27:15 [INFO] raft: aborting pipeline replication to peer {Voter 3 127.0.0.1:9202}
2018-06-22T09:27:15.279Z        INFO    jocko/leader.go:147     leader: cluster leadership lost {"id": 1, "broker addr": "127.0.0.1:9001", "serf addr": "127.0.0.1:9003", "raft addr": "127.0.0.1:9002", "id": 1, "raft addr": "127.0.0.1:9002"}
2018/06/22 09:27:15 [DEBUG] serf: messageJoinType: node3
2018/06/22 09:27:15 [DEBUG] raft-net: 127.0.0.1:9002 accepted connection from: 127.0.0.1:35596
2018/06/22 09:27:18 [ERR] serf: Rejected coordinate from node3: round trip time not in valid range, duration -163.169µs is not a positive value less than 10s
2018/06/22 09:27:22 [ERR] serf: Rejected coordinate from jocko1: round trip time not in valid range, duration -44.093µs is not a positive value less than 10s
2018/06/22 09:27:25 [ERR] raft: Failed to AppendEntries to {Voter 1 127.0.0.1:9002}: read tcp 127.0.0.1:35588->127.0.0.1:9002: i/o timeout
2018/06/22 09:27:25 [ERR] raft: Failed to heartbeat to 127.0.0.1:9002: read tcp 127.0.0.1:35596->127.0.0.1:9002: i/o timeout
2018/06/22 09:27:29 [ERR] serf: Rejected coordinate from jocko1: round trip time not in valid range, duration -151.162µs is not a positive value less than 10s
2018/06/22 09:27:36 [DEBUG] memberlist: Initiating push/pull sync with: 127.0.0.1:9103
2018/06/22 09:27:39 [ERR] serf: Rejected coordinate from jocko1: round trip time not in valid range, duration -1.65513ms is not a positive value less than 10s
2018/06/22 09:27:41 Reporting span 7ffaac6cfb5cee65:5858e871d897af09:7ffaac6cfb5cee65:1
2018/06/22 09:27:41 Reporting span 7ffaac6cfb5cee65:9d1520392bf4e62:7ffaac6cfb5cee65:1
2018/06/22 09:27:41 Reporting span 7ffaac6cfb5cee65:3b99e671ce3a640f:7ffaac6cfb5cee65:1
2018/06/22 09:27:41 Reporting span 7ffaac6cfb5cee65:2c0c4e2893b165cc:7ffaac6cfb5cee65:1
2018/06/22 09:27:41 Reporting span 7ffaac6cfb5cee65:7bf0f2d3eef9eb0:7ffaac6cfb5cee65:1
2018/06/22 09:27:41 Reporting span 7ffaac6cfb5cee65:7ffaac6cfb5cee65:0:1
2018/06/22 09:27:52 [DEBUG] memberlist: Stream connection from=127.0.0.1:34322
2018/06/22 09:27:53 [ERR] serf: Rejected coordinate from node3: round trip time not in valid range, duration -113.632µs is not a positive value less than 10s
2018/06/22 09:28:06 [DEBUG] memberlist: Initiating push/pull sync with: 127.0.0.1:9203
2018/06/22 09:28:07 [ERR] serf: Rejected coordinate from node3: round trip time not in valid range, duration -91.874µs is not a positive value less than 10s
2018/06/22 09:28:09 [DEBUG] memberlist: Stream connection from=127.0.0.1:34330

Second node says:

2018-06-22T09:27:05.275Z        INFO    jocko/broker.go:109     hello   {"id": 2, "broker addr": "127.0.0.1:9101", "serf addr": "127.0.0.1:9103", "raft addr": "127.0.0.1:9102", "id": 2, "raft addr": "127.0.0.1:9102"}
2018/06/22 09:27:05 [INFO] raft: Initial configuration (index=0): []
2018/06/22 09:27:05 [INFO] raft: Node at 127.0.0.1:9102 [Follower] entering Follower state (Leader: "")
2018/06/22 09:27:05 [INFO] serf: EventMemberJoin: jocko1 127.0.0.1
2018/06/22 09:27:05 [DEBUG] memberlist: Initiating push/pull sync with: 127.0.0.1:9003
2018-06-22T09:27:05.319Z        INFO    jocko/serf.go:74        adding LAN server       {"id": 2, "broker addr": "127.0.0.1:9101", "serf addr": "127.0.0.1:9103", "raft addr": "127.0.0.1:9102", "id": 2, "raft addr": "127.0.0.1:9102", "meta": {"ID":2,"Name":"","Bootstrap":false,"Expect":3,"NonVoter":false,"Status":1,"RaftAddr":"127.0.0.1:9102","SerfLANAddr":"127.0.0.1:9103","BrokerAddr":"127.0.0.1:9101"}}
2018-06-22T09:27:05.319Z        DEBUG   jocko/serf.go:149       maybe bootstrap: need more brokers      {"id": 2, "broker addr": "127.0.0.1:9101", "serf addr": "127.0.0.1:9103", "raft addr": "127.0.0.1:9102", "id": 2, "raft addr": "127.0.0.1:9102", "brokers": 1, "bootstrap expect": 3}
2018/06/22 09:27:05 [INFO] serf: EventMemberJoin: jocko0 127.0.0.1
2018-06-22T09:27:05.321Z        INFO    jocko/serf.go:74        adding LAN server       {"id": 2, "broker addr": "127.0.0.1:9101", "serf addr": "127.0.0.1:9103", "raft addr": "127.0.0.1:9102", "id": 2, "raft addr": "127.0.0.1:9102", "meta": {"ID":1,"Name":"","Bootstrap":false,"Expect":3,"NonVoter":false,"Status":1,"RaftAddr":"127.0.0.1:9002","SerfLANAddr":"127.0.0.1:9003","BrokerAddr":"127.0.0.1:9001"}}
2018-06-22T09:27:05.322Z        DEBUG   jocko/serf.go:149       maybe bootstrap: need more brokers      {"id": 2, "broker addr": "127.0.0.1:9101", "serf addr": "127.0.0.1:9103", "raft addr": "127.0.0.1:9102", "id": 2, "raft addr": "127.0.0.1:9102", "brokers": 2, "bootstrap expect": 3}
2018-06-22T09:27:05.322Z        INFO    jocko/server.go:71      hello   {"id": 2, "broker addr": "127.0.0.1:9101", "serf addr": "127.0.0.1:9103", "raft addr": "127.0.0.1:9102", "server id": 2, "addr": "127.0.0.1:9101"}
2018/06/22 09:27:05 [DEBUG] serf: messageJoinType: jocko1
2018/06/22 09:27:05 [DEBUG] serf: messageJoinType: jocko1
2018/06/22 09:27:06 [DEBUG] serf: messageJoinType: jocko1
2018/06/22 09:27:06 [DEBUG] serf: messageJoinType: jocko1
2018/06/22 09:27:06 [WARN] raft: no known peers, aborting election
2018/06/22 09:27:12 [ERR] serf: Rejected coordinate from jocko0: round trip time not in valid range, duration -149.077µs is not a positive value less than 10s
2018/06/22 09:27:14 [INFO] serf: EventMemberJoin: node3 127.0.0.1
2018-06-22T09:27:14.868Z        INFO    jocko/serf.go:74        adding LAN server       {"id": 2, "broker addr": "127.0.0.1:9101", "serf addr": "127.0.0.1:9103", "raft addr": "127.0.0.1:9102", "id": 2, "raft addr": "127.0.0.1:9102", "meta": {"ID":3,"Name":"","Bootstrap":false,"Expect":3,"NonVoter":false,"Status":1,"RaftAddr":"127.0.0.1:9202","SerfLANAddr":"127.0.0.1:9203","BrokerAddr":"127.0.0.1:9201"}}
2018-06-22T09:27:14.868Z        INFO    jocko/serf.go:165       found expected number of peers, attempting bootstrap   {"id": 2, "broker addr": "127.0.0.1:9101", "serf addr": "127.0.0.1:9103", "raft addr": "127.0.0.1:9102", "id": 2, "raft addr": "127.0.0.1:9102", "addrs": ["127.0.0.1:9202", "127.0.0.1:9102", "127.0.0.1:9002"]}
2018/06/22 09:27:14 [DEBUG] serf: messageJoinType: node3
2018/06/22 09:27:15 [DEBUG] serf: messageJoinType: node3
2018/06/22 09:27:15 [DEBUG] raft-net: 127.0.0.1:9102 accepted connection from: 127.0.0.1:40722
2018/06/22 09:27:15 [DEBUG] serf: messageJoinType: node3
2018/06/22 09:27:15 [DEBUG] raft-net: 127.0.0.1:9102 accepted connection from: 127.0.0.1:40724
2018/06/22 09:27:15 [DEBUG] serf: messageJoinType: node3
2018/06/22 09:27:36 [DEBUG] memberlist: Stream connection from=127.0.0.1:54524
2018/06/22 09:27:43 Reporting span 6b158e4e74f2e7b8:6a8775256a497d5e:6b158e4e74f2e7b8:1
2018/06/22 09:27:43 Reporting span 6b158e4e74f2e7b8:cc4b4c9007f8b05:6b158e4e74f2e7b8:1
2018/06/22 09:27:43 Reporting span 6b158e4e74f2e7b8:49d7f1a7ae7c9411:6b158e4e74f2e7b8:1
2018/06/22 09:27:43 Reporting span 6b158e4e74f2e7b8:4b3fe84e65c405ce:6b158e4e74f2e7b8:1
2018/06/22 09:27:43 Reporting span 6b158e4e74f2e7b8:69e2af3aa130bede:6b158e4e74f2e7b8:1
2018/06/22 09:27:43 Reporting span 6b158e4e74f2e7b8:6b158e4e74f2e7b8:0:1
2018/06/22 09:27:52 [DEBUG] memberlist: Initiating push/pull sync with: 127.0.0.1:9003
2018/06/22 09:27:55 [ERR] serf: Rejected coordinate from jocko0: round trip time not in valid range, duration -69.532µs is not a positive value less than 10s

Third node says

2018-06-22T09:27:14.750Z        INFO    jocko/broker.go:109     hello   {"id": 3, "broker addr": "127.0.0.1:920
1", "serf addr": "127.0.0.1:9203", "raft addr": "127.0.0.1:9202", "id": 3, "raft addr": "127.0.0.1:9202"}
2018/06/22 09:27:14 [INFO] raft: Initial configuration (index=0): []
2018/06/22 09:27:14 [INFO] raft: Node at 127.0.0.1:9202 [Follower] entering Follower state (Leader: "")
2018/06/22 09:27:14 [INFO] serf: EventMemberJoin: node3 127.0.0.1
2018/06/22 09:27:14 [DEBUG] memberlist: Initiating push/pull sync with: 127.0.0.1:9003
2018-06-22T09:27:14.792Z        INFO    jocko/serf.go:74        adding LAN server       {"id": 3, "broker addr"
: "127.0.0.1:9201", "serf addr": "127.0.0.1:9203", "raft addr": "127.0.0.1:9202", "id": 3, "raft addr": "127.0.0.1:9202", "meta": {"ID":3,"Name":"","Bootstrap":false,"Expect":3,"NonVoter":false,"Status":1,"RaftAddr":"127.0.0.1:9202","SerfLANAddr":"127.0.0.1:9203","BrokerAddr":"127.0.0.1:9201"}}
2018-06-22T09:27:14.793Z        DEBUG   jocko/serf.go:149       maybe bootstrap: need more brokers      {"id": 3, "broker addr": "127.0.0.1:9201", "serf addr": "127.0.0.1:9203", "raft addr": "127.0.0.1:9202", "id": 3, "raft addr": "127.0.0.1:9202", "brokers": 1, "bootstrap expect": 3}
2018/06/22 09:27:14 [INFO] serf: EventMemberJoin: jocko0 127.0.0.1
2018/06/22 09:27:14 [INFO] serf: EventMemberJoin: jocko1 127.0.0.1
2018-06-22T09:27:14.795Z        INFO    jocko/server.go:71      hello   {"id": 3, "broker addr": "127.0.0.1:9201", "serf addr": "127.0.0.1:9203", "raft addr": "127.0.0.1:9202", "server id": 3, "addr": "127.0.0.1:9201"}
2018-06-22T09:27:14.796Z        INFO    jocko/serf.go:74        adding LAN server       {"id": 3, "broker addr": "127.0.0.1:9201", "serf addr": "127.0.0.1:9203", "raft addr": "127.0.0.1:9202", "id": 3, "raft addr": "127.0.0.1:9202", "meta": {"ID":1,"Name":"","Bootstrap":false,"Expect":3,"NonVoter":false,"Status":1,"RaftAddr":"127.0.0.1:9002","SerfLANAddr":"127.0.0.1:9003","BrokerAddr":"127.0.0.1:9001"}}
2018-06-22T09:27:14.796Z        INFO    jocko/serf.go:165       found expected number of peers, attempting bootstrap   {"id": 3, "broker addr": "127.0.0.1:9201", "serf addr": "127.0.0.1:9203", "raft addr": "127.0.0.1:9202", "id": 3, "raft addr": "127.0.0.1:9202", "addrs": ["127.0.0.1:9002", "127.0.0.1:9102", "127.0.0.1:9202"]}
2018-06-22T09:27:14.850Z        INFO    jocko/serf.go:74        adding LAN server       {"id": 3, "broker addr": "127.0.0.1:9201", "serf addr": "127.0.0.1:9203", "raft addr": "127.0.0.1:9202", "id": 3, "raft addr": "127.0.0.1:9202", "meta": {"ID":2,"Name":"","Bootstrap":false,"Expect":3,"NonVoter":false,"Status":1,"RaftAddr":"127.0.0.1:9102","SerfLANAddr":"127.0.0.1:9103","BrokerAddr":"127.0.0.1:9101"}}
2018/06/22 09:27:15 [DEBUG] serf: messageJoinType: node3
2018/06/22 09:27:15 [DEBUG] serf: messageJoinType: node3
2018/06/22 09:27:15 [DEBUG] raft-net: 127.0.0.1:9202 accepted connection from: 127.0.0.1:55074
2018/06/22 09:27:15 [DEBUG] serf: messageJoinType: node3
2018/06/22 09:27:15 [DEBUG] serf: messageJoinType: node3
2018/06/22 09:27:16 [WARN] raft: not part of stable configuration, aborting election
2018/06/22 09:27:16 [ERR] serf: Rejected coordinate from jocko1: round trip time not in valid range, duration -1.748µs is not a positive value less than 10s
2018/06/22 09:27:20 [ERR] serf: Rejected coordinate from jocko1: round trip time not in valid range, duration -72.135µs is not a positive value less than 10s