Open HolyPrapor opened 3 years ago
According to Stack Overflow, the most reliable way to check whether a socket is connected is to call the socket's `Poll` method and check whether there are any bytes available to read.
This PR resolves the issue with sockets.
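For readers who have not seen that idiom, here is a minimal sketch of the `Poll` plus `Available` check. It is not the actual diff in this PR; the class name `SocketProbe`, the method name `LooksConnected`, and the 1000 microsecond poll timeout are assumptions made for illustration.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class SocketProbe
{
    // Hypothetical helper, not part of the ZooKeeper client.
    public static bool LooksConnected(Socket socket)
    {
        try
        {
            // Poll(SelectRead) returns true when data is readable OR when the
            // connection has been closed, reset, or terminated.
            bool readableOrClosed = socket.Poll(1000, SelectMode.SelectRead);

            // Readable with zero bytes pending means the peer closed the connection.
            bool nothingToRead = socket.Available == 0;

            return !(readableOrClosed && nothingToRead);
        }
        catch (SocketException)
        {
            return false;
        }
        catch (ObjectDisposedException)
        {
            return false;
        }
    }
}
```

The result is only a snapshot: the connection can still drop immediately after the call returns, so a failed send or receive remains the authoritative signal.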
I've been having the same problem: `org.apache.zookeeper.KeeperException.ConnectionLossException`
started to appear more frequently on .NET Core 3, and since trying to upgrade to .NET 5 I get it all the time.
I'm on macOS Big Sur.
I'm afraid we have also started facing the same issue. It would be nice if the PR were reviewed and released, if it solves the issue.
We upgraded from .NET Core 3.1 to .NET 6 and now see this very frequently when running inside a Linux Docker container.
Can the code be changed to follow the guidance from the MSFT docs for checking whether a socket is connected?
```csharp
// .Connect throws an exception if unsuccessful
client.Connect(anEndPoint);

// This is how you can determine whether a socket is still connected.
bool blockingState = client.Blocking;
try
{
    byte[] tmp = new byte[1];

    client.Blocking = false;
    client.Send(tmp, 0, 0);
    Console.WriteLine("Connected!");
}
catch (SocketException e)
{
    // 10035 == WSAEWOULDBLOCK
    if (e.NativeErrorCode.Equals(10035))
    {
        Console.WriteLine("Still Connected, but the Send would block");
    }
    else
    {
        Console.WriteLine("Disconnected: error code {0}!", e.NativeErrorCode);
    }
}
finally
{
    client.Blocking = blockingState;
}

Console.WriteLine("Connected: {0}", client.Connected);
```
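Since 10035 is the Winsock error number and this report concerns Linux containers, a possible adaptation of the docs sample compares `SocketErrorCode` against `SocketError.WouldBlock` instead of the native code. This is only a sketch: the names `ConnectionCheck` and `IsStillConnected` are made up for illustration, it assumes the socket has not been disposed, and it has not been tried against the ZooKeeper client.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class ConnectionCheck
{
    // Hypothetical helper name; mirrors the docs sample above.
    public static bool IsStillConnected(Socket client)
    {
        bool blockingState = client.Blocking;
        try
        {
            client.Blocking = false;
            // Zero-byte, non-blocking send: no payload reaches the peer.
            client.Send(Array.Empty<byte>(), 0, SocketFlags.None);
            return true;
        }
        catch (SocketException e)
        {
            // WouldBlock means the socket is alive but the send could not
            // complete immediately; any other error is treated as a disconnect.
            return e.SocketErrorCode == SocketError.WouldBlock;
        }
        finally
        {
            // Restore whatever blocking mode the caller had configured.
            client.Blocking = blockingState;
        }
    }
}
```

Like the original sample, this only reports the state as of the most recent send, so it may not notice a peer that has gone away silently.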
This client was used on Windows for a long time without any issues. A couple of months ago we tried to use it on .NET Core and tested it on both Linux and Windows.
In our project we use ZooKeeperClient heavily to read nodes and set watchers.
The Windows version works flawlessly. However, the Linux version throws a
`Connection reset by peer`
exception. I investigated this problem and read the ZooKeeper logs, and found that ZooKeeper did not reset its connection. I didn't capture any TCP dumps, but I'm pretty sure there are no TCP RST packets. Upgrading to .NET 5 makes the situation even worse (ConnectionLossExceptions appear more often).
I decided to dig deeper into the ZooKeeperClient code and found a check that falsely reports connection loss.
Unfortunately, I was not able to determine what causes this behavior or how to reproduce it. It looks like a problem with sockets on Linux.
Removing this check solves the problem.
Also, this client sends KeepAlive pings anyway, so if there IS a real connection loss, we will find out soon (either the next time we try to send something or on the next ping).