Open candlerb opened 6 years ago
Just tried it again. Unfortunately I cannot get either a single-node or multi-node setup running.
# ./jocko broker
2018/06/16 21:06:36 Initializing logging reporter
2018-06-16T21:06:36.095Z INFO jocko/broker.go:109 hello {"id": 0, "broker addr": "0.0.0.0:9092", "serf addr": "0.0.0.0:9094", "raft addr": "127.0.0.1:9093", "id": 0, "raft addr": "127.0.0.1:9093"}
2018/06/16 21:06:36 [INFO] raft: Initial configuration (index=0): []
2018/06/16 21:06:36 [INFO] raft: Node at 127.0.0.1:9093 [Follower] entering Follower state (Leader: "")
2018/06/16 21:06:36 [INFO] serf: EventMemberJoin: jocko ::
2018-06-16T21:06:36.138Z INFO jocko/server.go:71 hello {"id": 0, "broker addr": "0.0.0.0:9092", "serf addr": "0.0.0.0:9094", "raft addr": "127.0.0.1:9093", "server id": 0, "addr": "0.0.0.0:9092"}
2018-06-16T21:06:36.139Z INFO jocko/serf.go:74 adding LAN server {"id": 0, "broker addr": "0.0.0.0:9092", "serf addr": "0.0.0.0:9094", "raft addr": "127.0.0.1:9093", "id": 0, "raft addr": "127.0.0.1:9093", "meta": {"ID":0,"Name":"","Bootstrap":false,"Expect":0,"NonVoter":false,"Status":1,"RaftAddr":"127.0.0.1:9093","SerfLANAddr":"0.0.0.0:9094:8301","BrokerAddr":"0.0.0.0:9092"}}
2018/06/16 21:06:37 [WARN] raft: no known peers, aborting election
I spy something dubious there: "SerfLANAddr":"0.0.0.0:9094:8301" (8301 exists in the source code as DefaultLANSerfPort).
In another screen I try to create a topic:
# ./jocko topic create --topic test
error code: not controller
Back in the broker screen I see:
2018/06/16 21:06:56 Reporting span 58a4a97d0a5b8bfe:152681c6a74b2a88:58a4a97d0a5b8bfe:1
2018/06/16 21:06:56 Reporting span 58a4a97d0a5b8bfe:29f53d9eec662474:58a4a97d0a5b8bfe:1
2018/06/16 21:06:56 Reporting span 58a4a97d0a5b8bfe:3ab651b5a79db080:58a4a97d0a5b8bfe:1
2018/06/16 21:06:56 Reporting span 58a4a97d0a5b8bfe:6215d9d3872fa0c8:58a4a97d0a5b8bfe:1
2018/06/16 21:06:56 Reporting span 58a4a97d0a5b8bfe:6cd74c36fd162c6:58a4a97d0a5b8bfe:1
2018/06/16 21:06:56 Reporting span 58a4a97d0a5b8bfe:58a4a97d0a5b8bfe:0:1
2018/06/16 21:07:12 Reporting span 8304d1cf33b493f:48fb487a39f54e3:8304d1cf33b493f:1
2018/06/16 21:07:12 Reporting span 8304d1cf33b493f:3119779192068843:8304d1cf33b493f:1
2018/06/16 21:07:12 Reporting span 8304d1cf33b493f:14c0ac963bda4e32:8304d1cf33b493f:1
2018/06/16 21:07:12 Reporting span 8304d1cf33b493f:61ef62b06b1e53f3:8304d1cf33b493f:1
2018/06/16 21:07:12 Reporting span 8304d1cf33b493f:41950f95b4c41d52:8304d1cf33b493f:1
2018/06/16 21:07:12 Reporting span 8304d1cf33b493f:8304d1cf33b493f:0:1
2018/06/16 21:07:13 ERROR: error when flushing the buffer: write udp 127.0.0.1:60767->127.0.0.1:6831: write: connection refused
I don't know what's supposed to be listening on port 6831; this number doesn't appear in the Jocko source code anywhere. And indeed nothing is listening on this port, although jocko has a connected UDP socket to send to 6831:
# netstat -naup
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 0 0 127.0.0.1:60767 127.0.0.1:6831 ESTABLISHED 15769/jocko
udp 0 0 0.0.0.0:68 0.0.0.0:* 256/dhclient
udp6 0 0 :::8301 :::* 15769/jocko
The instructions in _examples/cluster/README.md have invalid flags. I changed them to:
# ./jocko broker \
--data-dir="/tmp/jocko0" \
--broker-addr=127.0.0.1:9001 \
--raft-addr=127.0.0.1:9002 \
--serf-addr=127.0.0.1:9003 \
--id=1
2018/06/16 21:23:41 Initializing logging reporter
2018-06-16T21:23:41.179Z INFO jocko/broker.go:109 hello {"id": 1, "broker addr": "127.0.0.1:9001", "serf addr": "127.0.0.1:9003", "raft addr": "127.0.0.1:9002", "id": 1, "raft addr": "127.0.0.1:9002"}
2018/06/16 21:23:41 [INFO] raft: Initial configuration (index=0): []
2018/06/16 21:23:41 [INFO] raft: Node at 127.0.0.1:9002 [Follower] entering Follower state (Leader: "")
2018/06/16 21:23:41 [INFO] serf: EventMemberJoin: jocko ::
2018-06-16T21:23:41.223Z INFO jocko/server.go:71 hello {"id": 1, "broker addr": "127.0.0.1:9001", "serf addr": "127.0.0.1:9003", "raft addr": "127.0.0.1:9002", "server id": 1, "addr": "127.0.0.1:9001"}
2018-06-16T21:23:41.226Z INFO jocko/serf.go:74 adding LAN server {"id": 1, "broker addr": "127.0.0.1:9001", "serf addr": "127.0.0.1:9003", "raft addr": "127.0.0.1:9002", "id": 1, "raft addr": "127.0.0.1:9002", "meta": {"ID":1,"Name":"","Bootstrap":false,"Expect":0,"NonVoter":false,"Status":1,"RaftAddr":"127.0.0.1:9002","SerfLANAddr":"127.0.0.1:9003:8301","BrokerAddr":"127.0.0.1:9001"}}
2018/06/16 21:23:43 [WARN] raft: no known peers, aborting election
(Note: same problem with SerfLANAddr having two ports)
In another screen, trying to add a second broker:
# ./jocko broker \
--data-dir="/tmp/jocko1" \
--broker-addr=127.0.0.1:9101 \
--raft-addr=127.0.0.1:9102 \
--serf-addr=127.0.0.1:9103 \
--join=127.0.0.1:9003 \
--id=2
2018/06/16 21:24:31 Initializing logging reporter
2018-06-16T21:24:31.488Z INFO jocko/broker.go:109 hello {"id": 2, "broker addr": "127.0.0.1:9101", "serf addr": "127.0.0.1:9103", "raft addr": "127.0.0.1:9102", "id": 2, "raft addr": "127.0.0.1:9102"}
2018/06/16 21:24:31 [INFO] raft: Initial configuration (index=0): []
2018/06/16 21:24:31 [INFO] raft: Node at 127.0.0.1:9102 [Follower] entering Follower state (Leader: "")
error starting broker: Failed to create memberlist: Could not set up network transport: Failed to start TCP listener on "127.0.0.1:9103" port 8301: listen tcp :8301: bind: address already in use
#
This one exits because it tries to bind to 8301; that port is already in use by the first process.
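The bind failure itself is easy to reproduce in isolation. Below is a minimal sketch, not jocko code: listenTwice is a name made up for this illustration, and it uses port 0 (any free port) rather than 8301 so it does not depend on what is already running on the machine.

```go
package main

import (
	"fmt"
	"net"
)

// listenTwice binds a TCP listener on addr, then tries to bind a second
// listener on the exact address the first one ended up on. It returns the
// error from the second attempt (nil if, unexpectedly, it succeeded).
func listenTwice(addr string) error {
	first, err := net.Listen("tcp", addr)
	if err != nil {
		return fmt.Errorf("first listen failed: %w", err)
	}
	defer first.Close()

	// Same host and port as the first listener: this is the situation the
	// second broker is in when both processes end up on port 8301.
	second, err := net.Listen("tcp", first.Addr().String())
	if err != nil {
		return err
	}
	second.Close()
	return nil
}

func main() {
	// Port 0 asks the kernel for any free port.
	fmt.Println(listenTwice("127.0.0.1:0"))
}
```

The second net.Listen should fail with an "address already in use" style error, which is what the second broker reports for port 8301.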
It seems that meta.SerfLANAddr / serf_lan_addr is assembled from SerfLANConfig.MemberlistConfig.BindAddr and SerfLANConfig.MemberlistConfig.BindPort:
$ grep -R serf_lan_addr .
./jocko/leader.go: "serf_lan_addr": meta.SerfLANAddr,
./jocko/metadata/metadata.go: SerfLANAddr: m.Tags["serf_lan_addr"],
./jocko/serf.go: config.Tags["serf_lan_addr"] = fmt.Sprintf("%s:%d", b.config.SerfLANConfig.MemberlistConfig.BindAddr, b.config.SerfLANConfig.MemberlistConfig.BindPort)
However, BindAddr defaults to both address and port:
./cmd/jocko/main.go: brokerCmd.Flags().StringVar(&brokerCfg.SerfLANConfig.MemberlistConfig.BindAddr, "serf-addr", "0.0.0.0:9094", "Address for Serf to bind on") // TODO: can set addr alone or need to set bind port separately?
And BindPort defaults to 8301, and AFAICS cannot be overridden.
$ grep -R DefaultLANSerfPort .
./jocko/config/config.go: DefaultLANSerfPort = 8301
./jocko/config/config.go: conf.SerfLANConfig.MemberlistConfig.BindPort = DefaultLANSerfPort
... although in the test suite, it is set explicitly:
./jocko/testing.go: config.SerfLANConfig.MemberlistConfig.BindPort = ports[2]
./jocko/testing.go: s1.config.SerfLANConfig.MemberlistConfig.BindPort)
./testutil/testutil.go: config.SerfLANConfig.MemberlistConfig.BindPort = ports[1]
I can't see how this can possibly work outside the test suite.
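To make the double-port symptom concrete, here is a minimal sketch that reproduces only the string formatting. memberlistConfig and serfLANAddrTag are stand-in names invented for this illustration; the fmt.Sprintf format and the two default values are the ones shown in the grep output.

```go
package main

import "fmt"

// memberlistConfig is a stand-in holding only the two fields discussed here;
// it is not the real memberlist configuration type.
type memberlistConfig struct {
	BindAddr string
	BindPort int
}

// serfLANAddrTag uses the same "%s:%d" format as the line in jocko/serf.go.
func serfLANAddrTag(c memberlistConfig) string {
	return fmt.Sprintf("%s:%d", c.BindAddr, c.BindPort)
}

func main() {
	// Default --serf-addr value combined with the default BindPort.
	fmt.Println(serfLANAddrTag(memberlistConfig{BindAddr: "0.0.0.0:9094", BindPort: 8301}))
	// prints "0.0.0.0:9094:8301"

	// Address-only value, as when passing --serf-addr=127.0.0.1.
	fmt.Println(serfLANAddrTag(memberlistConfig{BindAddr: "127.0.0.1", BindPort: 8301}))
	// prints "127.0.0.1:8301"
}
```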
What I can do is force --serf-addr=127.0.0.1, at which point at least we don't have duplicate ports in SerfLANAddr:
# ./jocko broker --serf-addr=127.0.0.1
2018/06/16 21:31:42 Initializing logging reporter
2018-06-16T21:31:42.136Z INFO jocko/broker.go:109 hello {"id": 0, "broker addr": "0.0.0.0:9092", "serf addr": "127.0.0.1", "raft addr": "127.0.0.1:9093", "id": 0, "raft addr": "127.0.0.1:9093"}
2018/06/16 21:31:42 [INFO] raft: Initial configuration (index=0): []
2018/06/16 21:31:42 [INFO] raft: Node at 127.0.0.1:9093 [Follower] entering Follower state (Leader: "")
2018/06/16 21:31:42 [INFO] serf: EventMemberJoin: jocko 127.0.0.1
2018-06-16T21:31:42.170Z INFO jocko/server.go:71 hello {"id": 0, "broker addr": "0.0.0.0:9092", "serf addr": "127.0.0.1", "raft addr": "127.0.0.1:9093", "server id": 0, "addr": "0.0.0.0:9092"}
2018-06-16T21:31:42.175Z INFO jocko/serf.go:74 adding LAN server {"id": 0, "broker addr": "0.0.0.0:9092", "serf addr": "127.0.0.1", "raft addr": "127.0.0.1:9093", "id": 0, "raft addr": "127.0.0.1:9093", "meta": {"ID":0,"Name":"","Bootstrap":false,"Expect":0,"NonVoter":false,"Status":1,"RaftAddr":"127.0.0.1:9093","SerfLANAddr":"127.0.0.1:8301","BrokerAddr":"0.0.0.0:9092"}}
2018/06/16 21:31:42 [WARN] serf: Failed to re-join any previously known node
2018/06/16 21:31:43 [WARN] raft: no known peers, aborting election
However, it still fails in the same way as the single-node cluster (client says error code: not controller; broker fails writing to UDP port 6831).
P.S. Looking in the source code of serf itself, it uses a helper to split addr:port into the separate components of MemberlistConfig:
func (c *Command) setupAgent(config *Config, logOutput io.Writer) *Agent {
bindIP, bindPort, err := config.AddrParts(config.BindAddr)
...
serfConfig.MemberlistConfig.BindAddr = bindIP
serfConfig.MemberlistConfig.BindPort = bindPort
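A similar split could be done for jocko's --serf-addr value. The following is only a sketch under assumptions: splitBindAddr is a hypothetical helper (it is not serf's AddrParts and not part of jocko), built on net.SplitHostPort from the Go standard library, and it falls back to a default port when the value carries no port.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// splitBindAddr (hypothetical) splits "host:port" into its parts. If addr has
// no colon at all it is treated as a bare host and defaultPort is used.
// Note: an unbracketed IPv6 address is rejected by net.SplitHostPort.
func splitBindAddr(addr string, defaultPort int) (string, int, error) {
	if !strings.Contains(addr, ":") {
		return addr, defaultPort, nil
	}
	host, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		return "", 0, err
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return "", 0, fmt.Errorf("invalid port %q: %w", portStr, err)
	}
	return host, port, nil
}

func main() {
	for _, a := range []string{"0.0.0.0:9094", "127.0.0.1", "0.0.0.0:9094:8301"} {
		host, port, err := splitBindAddr(a, 8301)
		fmt.Println(a, "->", host, port, err)
	}
}
```

With the parts separated, BindAddr would hold only the host and BindPort the chosen port, so two brokers given different --serf-addr ports would no longer both end up on 8301.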
And I found port 6831 in jaeger-client-go. Since this is for OpenTracing, the failure to send to this UDP port may not matter. It would of course be nice to be able to turn it off when not needed.
./vendor/github.com/uber/jaeger-client-go/transport_udp.go:const defaultUDPSpanServerHostPort = "localhost:6831"
Working on fixes in #133 / #134
Thanks for working on this. I ran into the same issue with the ports conflicting.
Let me know if I can help with testing or code review.
Current status: you can start a one-node cluster with jocko broker --bootstrap --bootstrap-expect=1, and create a topic with jocko topic create --topic <name>.
When I try to publish a message with confluent-kafka-python, it fails with the following error:
%3|1529616101.104|PROTOERR|rdkafka#producer-1| [thrd:main]: localhost:9092/bootstrap: Protocol parse failure at 31/70 (rd_kafka_parse_Metadata:306) (incorrect broker.version.fallback?)
%3|1529616101.104|PROTOERR|rdkafka#producer-1| [thrd:main]: localhost:9092/bootstrap: 65536 topics: tmpabuf memory shortage
%4|1529616101.104|METADATA|rdkafka#producer-1| [thrd:main]: localhost:9092/bootstrap: Metadata request failed: connected: Local: Bad message format (1ms): Permanent
You can start multiple nodes with e.g. --bootstrap-expect=3, but the cluster won't come up because the --join option currently does nothing. (I still haven't worked out why jocko needs both raft and serf. Maybe it's to allow a cluster where only a subset of nodes store the raft commit log?)
Cluster startup now kind-of working: serf needs to have a unique node name, so I added a JOCKONODENAME environment variable to override it. This is something which should rarely be used, so I didn't make it a command line flag.
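For illustration, an environment override of this kind could look roughly like the sketch below. This is not the code from the PR: nodeNameFrom and the "jocko" fallback are assumptions made for the example; only the variable name JOCKONODENAME comes from the comment above.

```go
package main

import (
	"fmt"
	"os"
)

// nodeNameFrom (hypothetical) returns the value of JOCKONODENAME when the
// lookup function reports it as set and non-empty, otherwise the fallback.
// Taking the lookup as a parameter keeps the function easy to exercise
// without touching the real process environment.
func nodeNameFrom(lookup func(string) (string, bool), fallback string) string {
	if v, ok := lookup("JOCKONODENAME"); ok && v != "" {
		return v
	}
	return fallback
}

func main() {
	// os.LookupEnv has the signature func(string) (string, bool).
	// "jocko" is a placeholder fallback, not necessarily jocko's default.
	fmt.Println(nodeNameFrom(os.LookupEnv, "jocko"))
}
```

Running three brokers on one host would then mean starting each with a different value, e.g. JOCKONODENAME=jocko0, jocko1, jocko2.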
There seems to be a problem with negative message transit times (!)
2018/06/22 08:04:21 [DEBUG] serf: messageJoinType: jocko1
2018/06/22 08:04:21 [DEBUG] serf: messageJoinType: jocko1
2018/06/22 08:04:21 [DEBUG] serf: messageJoinType: jocko1
2018/06/22 08:04:21 [ERR] serf: Rejected coordinate from jocko0: round trip time not in valid range, duration -7.035µs is not a positive value less than 10s
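Going only by the wording of that error message, the check being applied looks like "positive and less than 10s". The sketch below encodes that reading; it is an assumption based on the log text, not a copy of serf's code, and validRTT and maxRTT are names chosen for the example.

```go
package main

import (
	"fmt"
	"time"
)

// maxRTT is the upper bound quoted in the log message.
const maxRTT = 10 * time.Second

// validRTT reports whether rtt is a positive value less than maxRTT,
// following the wording of the error message.
func validRTT(rtt time.Duration) bool {
	return rtt > 0 && rtt < maxRTT
}

func main() {
	rtt := -7035 * time.Nanosecond // the duration from the log line
	fmt.Println(rtt, validRTT(rtt))
}
```

Under this reading the coordinate from jocko0 is rejected simply because the measured duration came out negative.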
And the client still has to know which node to connect to:
# cmd/jocko/jocko topic create --topic weeble --broker-addr 127.0.0.1:9201
error code: not controller
# cmd/jocko/jocko topic create --topic weeble --broker-addr 127.0.0.1:9101
error code: not controller
# cmd/jocko/jocko topic create --topic weeble --broker-addr 127.0.0.1:9001
created topic: weeble
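Until the client can find the controller on its own, the manual procedure above amounts to a loop over broker addresses. A minimal sketch of that loop follows; createOnController, createFunc and errNotController are hypothetical names for this example, and the fake cluster in main stands in for real requests to jocko.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotController stands in for the "not controller" error code.
var errNotController = errors.New("not controller")

// createFunc is a stand-in for whatever sends a create-topic request to a
// single broker address.
type createFunc func(brokerAddr, topic string) error

// createOnController tries each address in turn and returns the first one
// that accepts the request. A "not controller" reply moves on to the next
// address; any other error stops the search.
func createOnController(addrs []string, topic string, create createFunc) (string, error) {
	for _, addr := range addrs {
		err := create(addr, topic)
		if err == nil {
			return addr, nil
		}
		if !errors.Is(err, errNotController) {
			return "", fmt.Errorf("broker %s: %w", addr, err)
		}
	}
	return "", errNotController
}

func main() {
	// Fake cluster in which only 127.0.0.1:9001 is the controller,
	// matching the transcript above.
	fake := func(addr, topic string) error {
		if addr == "127.0.0.1:9001" {
			return nil
		}
		return errNotController
	}
	addrs := []string{"127.0.0.1:9201", "127.0.0.1:9101", "127.0.0.1:9001"}
	addr, err := createOnController(addrs, "weeble", fake)
	fmt.Println(addr, err)
}
```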
@candlerb thanks for the PRs, merged them. you need both serf and raft cause they do different things, serf does discovery and raft does consensus. right
By "discovery" do you mean discovery of which nodes are members of the raft cluster, to avoid having to statically configure peers? I wasn't sure that a gossip protocol was suitable for that.
UPDATE: I have moved this discussion to #140
After latest push on branch candlerb/serfaddr (pull request #136), metadata response now works. Next problem is when publishing to a topic:
2018-06-23T21:17:07.518Z ERROR jocko/broker.go:427 produce to partition failed {"id": 0, "broker addr": "0.0.0.0:9092", "serf addr": "0.0.0.0:9094", "raft addr": "127.0.0.1:9093", "id": 0, "raft addr": "127.0.0.1:9093", "error": "no replica for topic mytopic partition 0"}
github.com/travisjeffery/jocko/log.(*logger).Error
/root/go/src/github.com/travisjeffery/jocko/log/logger.go:38
github.com/travisjeffery/jocko/jocko.(*Broker).handleProduce
/root/go/src/github.com/travisjeffery/jocko/jocko/broker.go:427
github.com/travisjeffery/jocko/jocko.(*Broker).Run
/root/go/src/github.com/travisjeffery/jocko/jocko/broker.go:146
The cluster example uses command line flags which are no longer valid:
--debug
--log-dir (should be --data-dir?)
--prometheus-addr
--serf-members (should be --join or --join-wan?)
So I tried running it like this:
These options are accepted; but no broker is listening on ports 9001, 9101 or 9201, nor is serf listening on 9003, 9103 or 9203.
Captured output:
I tried running the first process under strace. Here are all the lines matching htons:
(I don't see any attempt to open ports 9001 or 9003?)
Here are the lines matching = -1:
The EPERM issues are a bit worrying. Maybe this is a symptom of running within an lxd container (but then again, running in a docker container is supposed to work).