Closed laszbalo closed 5 years ago
UPDATE:
I just tried running the Liftbridge cluster on ports other than 9292, 9293, and 9294, and got a different error when calling client.Subscribe():
panic: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 0.0.0.0:9292: connect: connection refused"
The stream got created and replicated successfully just as before.
Previously I wrote that the client tries to subscribe to the first address on the list. But it might have tried to connect to 0.0.0.0:9292, which in my case was the same as dev.lan:9292, which happened to be the first address on the list.
UPDATE:
It seems that the IP address that client.Subscribe() tries to use comes from the value of the listen property in the liftbridge.conf file. At least, when I changed it from 0.0.0.0 to localhost, client.Subscribe() tried to connect to the IP of my machine. Also, when I omitted it entirely to rely on the default value, client.Subscribe() tried to connect to :9292.
This sounds like unexpected behavior. What Subscribe should do is check the client's local metadata cache to see if it knows the broker address for the stream leader. If it doesn't have it, it will fetch the metadata. If it does have it, and it sends the request but the broker is no longer the leader, it should refresh the metadata and retry.
Basically, the address Subscribe uses comes from the metadata, which is fetched from the cluster and refreshed/retried on failures.
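The lookup, refresh, and retry flow described above can be sketched roughly as follows. This is a minimal illustration and not the go-liftbridge implementation: metadataCache, errNotLeader, subscribe, the fetch and dial callbacks, and the stream name are all hypothetical names made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotLeader is a hypothetical error a broker returns when it no longer
// leads the stream. It is not part of the go-liftbridge API.
var errNotLeader = errors.New("not the stream leader")

// metadataCache maps a stream name to the leader address the client last saw.
type metadataCache struct {
	leaders map[string]string
	fetch   func(stream string) (string, error) // asks the cluster for the current leader
}

// leaderFor returns the cached leader address, fetching metadata on a cache miss.
func (c *metadataCache) leaderFor(stream string) (string, error) {
	if addr, ok := c.leaders[stream]; ok {
		return addr, nil
	}
	return c.refresh(stream)
}

// refresh fetches the leader address from the cluster and stores it in the cache.
func (c *metadataCache) refresh(stream string) (string, error) {
	addr, err := c.fetch(stream)
	if err != nil {
		return "", err
	}
	c.leaders[stream] = addr
	return addr, nil
}

// subscribe dials the cached leader first. If that broker reports it is not
// the leader, it refreshes the metadata and retries, up to maxRetries times.
func subscribe(c *metadataCache, stream string, dial func(addr string) error, maxRetries int) (string, error) {
	addr, err := c.leaderFor(stream)
	if err != nil {
		return "", err
	}
	for attempt := 0; ; attempt++ {
		err = dial(addr)
		if err == nil {
			return addr, nil
		}
		if !errors.Is(err, errNotLeader) || attempt >= maxRetries {
			return "", err
		}
		if addr, err = c.refresh(stream); err != nil {
			return "", err
		}
	}
}

func main() {
	cache := &metadataCache{
		// Stale cache entry: the client still believes the first broker leads.
		leaders: map[string]string{"foo-stream": "dev.lan:9292"},
		fetch:   func(string) (string, error) { return "dev.lan:9293", nil },
	}
	dial := func(addr string) error {
		if addr != "dev.lan:9293" {
			return errNotLeader
		}
		return nil
	}
	addr, err := subscribe(cache, "foo-stream", dial, 3)
	fmt.Println(addr, err)
}
```

Note that in this sketch the address dialed is always whatever the cluster reports, so a broker that advertises an address the client cannot reach fails no matter how often the metadata is refreshed.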
I will see if I can reproduce your issue.
Thanks for your detailed explanation.
Based on what you just described, it seems that Subscribe is trying to connect to my remote servers using the syntax for announcing local network addresses.
By default, Docker containers have IP addresses from the 172.17.0.x/16 range, but my client code is running on the 192.168.0.x IP range.
I think the issue is that the metadata returned by the cluster uses the host and port as specified in the server config file for each broker:
https://github.com/liftbridge-io/liftbridge/blob/cd28b8e6e731c1706af7f0e8551fcf4101b58381/server/metadata.go#L181-L182
https://github.com/liftbridge-io/liftbridge/blob/cd28b8e6e731c1706af7f0e8551fcf4101b58381/server/server.go#L613-L614
I think this issue should be addressed with #98 which separates the listen address (what the server binds to) and host/port (what the server advertises to clients through the metadata API). You just need to make sure host/port is set to the actual external-facing address you want to connect to. Closing for now, but feel free to re-open if the issue isn't resolved.
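To illustrate why the bind address and the advertised address need to differ, here is a small sketch using only the Go standard library. The broker type and the dialable helper are hypothetical and do not come from Liftbridge. The check only rejects addresses with an empty or unspecified host, and it assumes any hostname resolves on the client side.

```go
package main

import (
	"fmt"
	"net"
)

// broker separates the address a server binds to from the address it
// advertises to clients. Both field names are illustrative only.
type broker struct {
	listen    string // bind address, e.g. "0.0.0.0:9292"
	advertise string // address returned to clients in metadata
}

// dialable reports whether addr names a concrete host that a client could
// try to dial. An empty host (":9292") or an unspecified IP ("0.0.0.0") is
// meaningful for binding but not as a connection target.
func dialable(addr string) bool {
	host, port, err := net.SplitHostPort(addr)
	if err != nil || host == "" || port == "" {
		return false
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return true // a hostname; assumed to resolve for the client
	}
	return !ip.IsUnspecified()
}

func main() {
	b := broker{listen: "0.0.0.0:9292", advertise: "dev.lan:9292"}
	fmt.Println(dialable(b.listen))    // the bind address is not a usable target
	fmt.Println(dialable(b.advertise)) // the advertised address is
}
```

A check like this does not catch addresses such as localhost or a Docker-internal 172.17.0.x IP, which are concrete but still unreachable from a client on another network.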
I am trying to run the basic usage example code from go-liftbridge. However, if the stream leader of a newly created stream is not the first one in the address list, then I get the following error when calling client.Subscribe():
Is this a bug, or did I make a mistake while setting up my cluster?
In depth:
In my network, dev.lan resolves to the IP of my development PC. I have a three-node Liftbridge cluster.
Starting the NATS server:
My liftbridge.conf file:
The peers of the Liftbridge cluster are running inside Docker containers. I built the Docker image from the latest version of Liftbridge, 261249172984c54af4079588866a370469150764, with the following command:
Starting the first peer:
Second peer, also this is the (Raft) leader:
Third peer:
go-liftbridge's version from go.mod:
I only changed the addrs variable from the example code, which now looks as follows:
When I run the example, I get the error above, and the following messages are logged by Liftbridge: