Open JChrist opened 4 years ago
I managed to write a (failing) test for this case (a ServerSocket listening on 5701 is enough). The test is assumed to live inside TcpIpJoinTest:
```java
@Test
public void test_whenNotRespondingNodeExists() throws Exception {
    Config config = new Config();
    config.setProperty(ClusterProperty.WAIT_SECONDS_BEFORE_JOIN.getName(), "0");
    config.setProperty(ClusterProperty.MAX_JOIN_SECONDS.getName(), "3");
    config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
    config.getNetworkConfig().getInterfaces().setEnabled(false).clear();
    config.getNetworkConfig().getJoin().getTcpIpConfig().setEnabled(true)
            .setMembers(Collections.singletonList("127.0.0.1")).setConnectionTimeoutSeconds(1);
    // start listening on 127.0.0.1:5701 without ever responding to any packet
    final ServerSocket ss = new ServerSocket(5701);
    final AtomicBoolean stop = new AtomicBoolean(false);
    final Thread socketWatcher = new Thread(() -> {
        while (!stop.get()) {
            try {
                ss.accept();
            } catch (Exception ignored) {
            }
        }
    });
    socketWatcher.start();
    try {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        assertClusterSize(1, hz);
    } finally {
        stop.set(true);
        socketWatcher.interrupt();
        IOUtil.close(ss);
    }
}
```
I seem to have fixed this issue by adding an else block in TcpServerConnectionManager:204, similar to the following. Essentially, when a requested connection is already marked as in-progress, TcpServerConnectionManager now checks whether it has actually received any bytes recently.
```java
else {
    connections.stream().filter(c -> c.getRemoteAddress().equals(address)).findFirst().ifPresent(c -> {
        final long maxReadWaitTimeMillis = 1_000L; // TODO: somehow make it configurable?
        if (Clock.currentTimeMillis() - c.lastReadTimeMillis() > maxReadWaitTimeMillis) {
            c.close("nothing received for the last: " + maxReadWaitTimeMillis + "ms", null);
            // this will not be called when closing the connection due to null cause,
            // but null cause is needed to avoid the reconnection mechanism kicking in
            serverContext.onFailedConnection(address);
        }
    });
}
```
Hi @JChrist! Without going too much into the details here, I can say that historically we had similar doubts about whether we should fail fast when something goes wrong during the join process or whether the instance should start standalone. In some previous versions, an instance would start standalone, while with this change it started to fail to start, and we decided that was OK. In some cases it does make sense for an instance to start standalone, but in others it might be confusing: users might inadvertently have a typo in the member list and end up with instances starting standalone instead of forming a cluster.
So, I've added this to the current milestone to take a closer look, but as it stands now, I'd say this is intended behaviour and we prefer fail-fast.
There should at least be some kind of configuration option to control this behaviour, for example a maximum number of milliseconds to wait for a join response from a cluster member before blacklisting it.
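To make the suggestion concrete, here is a minimal, purely illustrative sketch. Nothing like this exists in Hazelcast today; the `JoinBlacklist` class, its method names, and the property name in the comment are all made up. The idea is simply: if a member has not answered a join request within a configurable window, blacklist it so the joiner can move on and claim mastership.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a "blacklist a member after a silent join window" rule.
public class JoinBlacklist {
    // hypothetical knob, e.g. -Dhazelcast.join.response.timeout.millis=1000
    private final long maxJoinResponseMillis;
    private final Set<String> blacklist = new HashSet<>();

    JoinBlacklist(long maxJoinResponseMillis) {
        this.maxJoinResponseMillis = maxJoinResponseMillis;
    }

    /**
     * Called periodically by the joiner for each candidate member.
     * A lastResponseMillis earlier than sentAtMillis means "no response yet".
     * Returns true if the member was blacklisted on this check.
     */
    boolean checkMember(String address, long sentAtMillis, long lastResponseMillis, long nowMillis) {
        boolean silent = lastResponseMillis < sentAtMillis;
        if (silent && nowMillis - sentAtMillis > maxJoinResponseMillis) {
            blacklist.add(address); // the joiner skips blacklisted members
            return true;
        }
        return false;
    }

    boolean isBlacklisted(String address) {
        return blacklist.contains(address);
    }
}
```

With such a rule, the scenario in the test above (a peer that accepts the socket but never answers) would be blacklisted after the window elapses instead of blocking the join forever.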
We are having similar issues when tunnelling Hazelcast through stunnel. A remote "connection refused" becomes "connection reset by peer", which is not forwarded to the manager, leading to the peer never being blacklisted (preventing server startup).
This issue is going a bit further by also detecting nodes that just don't send anything...
On top of that, why is the exception not being passed to the life-cycle method here: https://github.com/hazelcast/hazelcast/blob/master/hazelcast/src/main/java/com/hazelcast/internal/server/tcp/TcpServerConnection.java#L250
In fact, I haven't found a call to that life-cycle method passing anything but null as the exception...
Hi, have you found a workaround for this? We have the same problem (Hazelcast+stunnel). However, it is difficult for us to intervene in the implementation as we use a ready-made software solution that itself uses Hazelcast.
No. We chose to migrate off of Hazelcast instead.
I tested this with 5.2 just now and I'm still seeing the same behavior. This happens if something accepts the socket connection but does not respond. If there is nothing listening on the socket at all then of course the cluster doesn't wait. There is an ICMP based failure detector that may work for you: https://docs.hazelcast.com/hazelcast/5.2/clusters/ping-detector . Since it does not rely on sockets, it at least won't have this exact problem.
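For anyone wanting to try that workaround: per the linked docs, the ICMP ping failure detector can be enabled declaratively. A sketch of the relevant hazelcast.xml fragment follows; the values are illustrative, not recommendations, and ICMP typically needs raw-socket privileges on the host, so check the linked docs for defaults and OS requirements:

```xml
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
    <network>
        <failure-detector>
            <icmp enabled="true">
                <!-- how long to wait for an echo reply before counting an attempt as failed -->
                <timeout-milliseconds>1000</timeout-milliseconds>
                <!-- how often to ping -->
                <interval-milliseconds>1000</interval-milliseconds>
                <!-- consecutive failed attempts before a member is considered unreachable -->
                <max-attempts>3</max-attempts>
                <ttl>255</ttl>
                <!-- run pings in a dedicated thread, independently of heartbeats -->
                <parallel-mode>true</parallel-mode>
                <!-- abort startup if ICMP cannot be used on this system -->
                <fail-fast-on-startup>true</fail-fast-on-startup>
            </icmp>
        </failure-detector>
    </network>
</hazelcast>
```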
Describe the bug
When trying to start a new Hazelcast node while the existing cluster node(s) are unresponsive (e.g. sending absolutely no responses whatsoever), the new node will fail to start instead of claiming mastership and starting.
Expected behavior
The new node succeeds in claiming mastership and starting.
To Reproduce
Steps to reproduce the behavior:
```shell
nc -l 0.0.0.0 5701
```
(thank you @vbekiaris !!!)
Additional context
This is reproducible with both 4.0.1 and 3.12.7. I tried to get to the bottom of this, but I'm missing some parts. The most I managed to find out is that in class TcpIpJoiner (line 209), inside the method claimMastership(Collection<Address> possibleAddresses), the operation fails because TcpIpEndpointManager always returns a null TcpIpConnection from its public TcpIpConnection getOrConnect(final Address address, final boolean silent) method (line 193), since the connection will eternally be considered as in-progress (it will exist inside its connectionsInProgress set). Also, the new node will never declare itself as the new master through TcpIpJoiner's clusterJoinManager.setThisMemberAsMaster(); (line 151), because it will never blacklist the misbehaving node, since this is only done in case the socket is not connected.
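The in-progress deadlock described above can be modelled in isolation. The following is a deliberately simplified, hypothetical sketch, not the actual Hazelcast code (only the connectionsInProgress name mirrors the report): once an address is parked in the in-progress set and the silent peer never triggers a success or failure callback, every subsequent call returns null forever.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of why the joiner sees null from getOrConnect on every retry.
public class InProgressModel {
    private final Set<String> connectionsInProgress = ConcurrentHashMap.newKeySet();

    Object getOrConnect(String address) {
        if (connectionsInProgress.contains(address)) {
            return null; // still "connecting"; the caller is expected to retry
        }
        connectionsInProgress.add(address);
        // The real code would start an async connect here. If the peer accepts
        // the TCP connection but never sends a protocol response, no
        // success/failure callback ever fires, so the address is never removed
        // from the set and never blacklisted.
        return null;
    }

    public static void main(String[] args) {
        InProgressModel m = new InProgressModel();
        for (int i = 0; i < 3; i++) {
            System.out.println(m.getOrConnect("127.0.0.1:5701")); // null every time
        }
    }
}
```

This is why both a read-timeout on in-progress connections (the patch above) or blacklisting after a silent window would unblock the joiner: each gives the loop a way to terminate.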