hashicorp / memberlist

Golang package for gossip based membership and failure detection
Mozilla Public License 2.0

Failed UDP Ping #93

Open rogeralsing opened 8 years ago

rogeralsing commented 8 years ago

I have two nodes running locally on my machine: one uses the default memberlist port, and the other is assigned the first free port.

I'm seeing this in the logs:

2016/11/11 21:54:50 [DEBUG] memberlist: Initiating push/pull sync with: 127.0.0.1:7946
2016/11/11 21:54:50 Member: seed_se-lpt-0002:7946 169.254.87.13
2016/11/11 21:54:50 Member: member_se-lpt-0002:28210 169.254.87.13
Running
2016/11/11 21:54:51 [DEBUG] memberlist: Failed UDP ping: seed_se-lpt-0002:7946 (timeout reached)
2016/11/11 21:54:52 [INFO] memberlist: Suspect seed_se-lpt-0002:7946 has failed, no acks received
2016/11/11 21:54:53 [DEBUG] memberlist: Failed UDP ping: seed_se-lpt-0002:7946 (timeout reached)
2016/11/11 21:54:55 [INFO] memberlist: Suspect seed_se-lpt-0002:7946 has failed, no acks received
2016/11/11 21:54:55 [INFO] memberlist: Marking seed_se-lpt-0002:7946 as failed, suspect timeout reached (0 peer confirmations)

Why are UDP pings failing even though both nodes run on the same machine? The nodes can clearly see each other early on, but then they drop contact due to missing acks.

Any ideas why?

abouteiller commented 8 years ago

Same issue on my side. It appears that the members' Node.addr values are set to 127.0.0.1 even though the members are located on remote nodes. This could be because writeto does not propagate the correct "from" address when the UDP socket is bound to 0.0.0.0.

I tried changing the bindaddr in the config, and that solved the issue. However, a lot of code out there binds to 0.0.0.0 and breaks depending on the kernel version (Linux 4.8.1 here).
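
To make the suspected mechanism easier to poke at, here is a minimal sketch that uses only the Go standard library and is not memberlist's own code. The helper name `replySource` is hypothetical. It binds a UDP echo listener to a given address, sends one datagram to it from a loopback client, and reports the source IP that the reply arrives from. With an explicit bind address, the reply should come from that same address. With a 0.0.0.0 bind, the kernel picks the source address, which may not be the address the ping was sent to.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// replySource is a hypothetical helper for experimentation. It binds a UDP
// echo listener to bindAddr on an OS-chosen port, sends one datagram to
// dialAddr on that port, and returns the source IP observed on the reply.
func replySource(bindAddr, dialAddr string) (string, error) {
	bindIP, dialIP := net.ParseIP(bindAddr), net.ParseIP(dialAddr)
	if bindIP == nil || dialIP == nil {
		return "", fmt.Errorf("invalid IP: bind=%q dial=%q", bindAddr, dialAddr)
	}

	// Listener bound to the address under test; port 0 lets the OS choose.
	srv, err := net.ListenUDP("udp4", &net.UDPAddr{IP: bindIP, Port: 0})
	if err != nil {
		return "", fmt.Errorf("listen on %s: %w", bindAddr, err)
	}
	defer srv.Close()
	port := srv.LocalAddr().(*net.UDPAddr).Port

	// Echo a single datagram back to whoever sent it.
	go func() {
		buf := make([]byte, 64)
		n, from, err := srv.ReadFromUDP(buf)
		if err != nil {
			return
		}
		srv.WriteToUDP(buf[:n], from)
	}()

	// Unconnected client socket, so a reply from any source address is
	// still delivered and its source can be inspected.
	cli, err := net.ListenUDP("udp4", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1), Port: 0})
	if err != nil {
		return "", fmt.Errorf("client listen: %w", err)
	}
	defer cli.Close()
	if err := cli.SetDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return "", err
	}

	if _, err := cli.WriteToUDP([]byte("ping"), &net.UDPAddr{IP: dialIP, Port: port}); err != nil {
		return "", fmt.Errorf("send: %w", err)
	}
	buf := make([]byte, 64)
	_, from, err := cli.ReadFromUDP(buf)
	if err != nil {
		return "", fmt.Errorf("no reply: %w", err)
	}
	return from.IP.String(), nil
}
```

Calling `replySource("127.0.0.1", "127.0.0.1")` exercises the explicit-bind case. Passing "0.0.0.0" as the first argument, together with a dial address other than the machine's primary one, is the case to compare against. The result there depends on the operating system and kernel version, so this sketch makes no claim about what it returns.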