Open dsparkplug opened 6 years ago
Alright, that's a fair question.
Do you have any repro scenario? It would be easier to check (or even to write a unit test) when we have a MCVE.
I still didn't understood what thread exactly deadlock here, so I'm not sure how to proceed.
I have an idea to make more things async / threadsafe by default in our code, but first I need to better understand this particular case.
I agree that the code base isn't in the best state now. Hopefully I'll finish some big things in my life in a couple of months and will be able to come back to this library.
In our project we have encountered with similar issues on simultaneous XmppStream.Write() attempts. At the least application waits a response from XmppStream.Write() from several threads. It seems to be a deadlock.
Currently I'll try to update our code by adding simply lock
clause around this command, hope it will help.
Unfortunately, locking doesn't help. Currently we're captured XmppStream.Write()
hanging again, with lock state. Find below a stack trace (this thread was stopped on XmppStream.Write()
, another one on lock
state):
[In a sleep, wait, or join]
JabberNet.dll!JabberNet.jabber.connection.XmppStream.JabberNet.jabber.connection.IStanzaEventListener.BytesWritten(byte[] buf = {byte[1437]}, int offset = 0, int count = 1437) + 0x3f bytes
JabberNet.dll!JabberNet.bedrock.net.AsyncSocket.WroteData(System.IAsyncResult ar) + 0xbf bytes
System.dll!System.Net.LazyAsyncResult.Complete(System.IntPtr userToken) + 0x52 bytes
System.dll!System.Net.ContextAwareResult.CaptureOrComplete(ref System.Threading.ExecutionContext cachedContext, bool returnContext) + 0xfb bytes
System.dll!System.Net.ContextAwareResult.FinishPostingAsyncOp(ref System.Net.CallbackClosure closure = {System.Net.CallbackClosure}) + 0x77 bytes
System.dll!System.Net.Sockets.Socket.BeginSend(byte[] buffer, int offset, int size, System.Net.Sockets.SocketFlags socketFlags, out System.Net.Sockets.SocketError errorCode, System.AsyncCallback callback, object state) + 0x11c bytes
System.dll!System.Net.Sockets.Socket.BeginSend(byte[] buffer, int offset, int size, System.Net.Sockets.SocketFlags socketFlags, System.AsyncCallback callback, object state) + 0x41 bytes
System.dll!System.Net.Sockets.NetworkStream.BeginWrite(byte[] buffer, int offset, int size, System.AsyncCallback callback, object state) + 0xa4 bytes
JabberNet.dll!JabberNet.bedrock.net.AsyncSocket.Write(byte[] buf, int offset, int len) + 0x162 bytes
JabberNet.dll!JabberNet.jabber.connection.SocketStanzaStream.Write(string str) + 0x92 bytes
[our code].Utilities.Jabber.PracticeAreaChannel.RespondToQuery(JabberNet.jabber.protocol.client.IQ response = Cannot evaluate expression because the current thread is in a sleep, wait, or join) Line 914 + 0x22 bytes C#
[our stack here, removed by security reasons]
System.dll!System.Net.LazyAsyncResult.Complete(System.IntPtr userToken) + 0x52 bytes
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) + 0x15e bytes
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) + 0x17 bytes
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) + 0x52 bytes
System.dll!System.Net.ContextAwareResult.Complete(System.IntPtr userToken) + 0x98 bytes
System.dll!System.Net.LazyAsyncResult.ProtectedInvokeCallback(object result, System.IntPtr userToken) + 0xcc bytes
System.dll!System.Net.HttpWebRequest.ProcessResponse() + 0x13c bytes
System.dll!System.Net.HttpWebRequest.SetResponse(System.Net.CoreResponseData coreResponseData) + 0x1b2 bytes
System.dll!System.Net.ConnectionReturnResult.SetResponses(System.Net.ConnectionReturnResult returnResult = {System.Net.ConnectionReturnResult}) + 0xbb bytes
System.dll!System.Net.Connection.ReadComplete(int bytesRead, System.Net.WebExceptionStatus errorStatus) + 0x450 bytes
System.dll!System.Net.Connection.PostReceive() + 0x109 bytes
System.dll!System.Net.Connection.ReadComplete(int bytesRead, System.Net.WebExceptionStatus errorStatus) + 0x2d1 bytes
System.dll!System.Net.LazyAsyncResult.Complete(System.IntPtr userToken) + 0x52 bytes
System.dll!System.Net.LazyAsyncResult.ProtectedInvokeCallback(object result, System.IntPtr userToken) + 0xcc bytes
System.dll!System.Net.Security._SslStream.ProcessFrameBody(int readBytes, byte[] buffer, int offset, int count, System.Net.AsyncProtocolRequest asyncRequest) + 0x14b bytes
System.dll!System.Net.Security._SslStream.ReadFrameCallback(System.Net.AsyncProtocolRequest asyncRequest = {System.Net.AsyncProtocolRequest}) + 0xab bytes
System.dll!System.Net.AsyncProtocolRequest.CompleteRequest(int result) + 0x3d bytes
System.dll!System.Net.FixedSizeReader.CheckCompletionBeforeNextRead(int bytes) + 0x29 bytes
System.dll!System.Net.FixedSizeReader.ReadCallback(System.IAsyncResult transportResult) + 0x78 bytes
System.dll!System.Net.LazyAsyncResult.Complete(System.IntPtr userToken) + 0x52 bytes
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) + 0x15e bytes
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) + 0x17 bytes
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) + 0x52 bytes
System.dll!System.Net.ContextAwareResult.Complete(System.IntPtr userToken) + 0x98 bytes
System.dll!System.Net.LazyAsyncResult.ProtectedInvokeCallback(object result, System.IntPtr userToken) + 0xcc bytes
System.dll!System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* nativeOverlapped) + 0x13a bytes
mscorlib.dll!System.Threading._IOCompletionCallback.PerformIOCompletionCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* pOVERLAP) + 0x7d bytes
[Native to Managed Transition]
Alright, this is one thread. Could you please show stack for other thread (the one holding the lock)?
I was not aware that second stack trace being useful taking into account that it was stopped on our lock, but I forgot about preceding code... So, please find it below, too:
[In a sleep, wait, or join]
[our code].Utilities.Jabber.PracticeAreaChannel.RespondToQuery(JabberNet.jabber.protocol.client.IQ response = Cannot evaluate expression because the current thread is in a sleep, wait, or join) Line 912 + 0x19 bytes C#
[our stack here, removed by security reasons]
JabberNet.dll!JabberNet.jabber.client.JabberClient.FireOnIQ(object sender, JabberNet.jabber.protocol.client.IQ iq = Cannot evaluate expression because the current thread is in a sleep, wait, or join) + 0x2f bytes
JabberNet.dll!JabberNet.jabber.connection.XmppStream.JabberNet.jabber.connection.IStanzaEventListener.StanzaReceived(System.Xml.XmlElement elem) + 0x48 bytes
JabberNet.dll!JabberNet.jabber.protocol.ElementStream.FireOnElement(System.Xml.XmlElement elem) + 0x2b bytes
JabberNet.dll!JabberNet.jabber.protocol.AsynchElementStream.EndTag(byte[] buf, int offset, JabberNet.xpnet.ContentToken ct, JabberNet.xpnet.TOK tok) + 0x134 bytes
JabberNet.dll!JabberNet.jabber.protocol.AsynchElementStream.Push(byte[] buf = {byte[4096]}, int offset = 0, int length = 1275) + 0x19c bytes
JabberNet.dll!JabberNet.jabber.connection.SocketStanzaStream.JabberNet.bedrock.net.ISocketEventListener.OnRead(JabberNet.bedrock.net.BaseSocket sock = {JabberNet.bedrock.net.AsyncSocket}, byte[] buf, int offset, int length) + 0xbc bytes
JabberNet.dll!JabberNet.bedrock.net.AsyncSocket.GotData(System.IAsyncResult ar) + 0x9e bytes
System.dll!System.Net.LazyAsyncResult.Complete(System.IntPtr userToken) + 0x52 bytes
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) + 0x15e bytes
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) + 0x17 bytes
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) + 0x52 bytes
System.dll!System.Net.ContextAwareResult.Complete(System.IntPtr userToken) + 0x98 bytes
System.dll!System.Net.LazyAsyncResult.ProtectedInvokeCallback(object result, System.IntPtr userToken) + 0xcc bytes
System.dll!System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* nativeOverlapped) + 0x13a bytes
mscorlib.dll!System.Threading._IOCompletionCallback.PerformIOCompletionCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* pOVERLAP) + 0x7d bytes
[Native to Managed Transition]
Sorry for the delay - other priorities etc.
Here's a MCVE which sends 10000 test messages. The deadlock will generally occur after a few thousand messages with several threads stuck at AsyncSocket.Write and one stuck at XmppStream.BytesWritten.
Note that this is only appears to happen with AutoPresence = true . Also, I wouldn't normally create a new client and authenticate for each message as in this example, but the authentication seems to involved in causing the deadlock.
XMPP server is OpenFire 4.2.3
using JabberNet.bedrock;
using JabberNet.jabber;
using JabberNet.jabber.client;
using JabberNet.jabber.connection;
using JabberNet.jabber.protocol;
using System;
using System.Threading;
using System.Xml;
namespace JabberTest
{
public class Program
{
public static void Main(string[] args)
{
string server = "servername";
string xmppID = "sparky";
int port = 5222;
string password = "#######";
XmppSender xmppSender = new XmppSender();
int sent = 0;
for (var i = 0; i < 10000; i++)
{
if (!xmppSender.SendMessage(server, xmppID, port, password))
{
Console.WriteLine($"Failed - sent {sent} messages");
return;
}
sent++;
}
Console.WriteLine($"Success - sent {sent} messages"); ;
}
}
public class XmppSender
{
private ManualResetEvent _authenticatedEvent;
public XmppSender()
{
_authenticatedEvent = new ManualResetEvent(false);
}
private void client_OnAuthenticate(object sender)
{
Console.WriteLine("Authenticated");
_authenticatedEvent.Set();
}
private void client_OnConnect(object sender, StanzaStream stream)
{
Console.WriteLine("Connected");
}
private void client_OnDisconnect(object sender)
{
Console.WriteLine("Disconnected");
_authenticatedEvent.Reset();
}
private void client_OnError(object sender, Exception ex)
{
Console.WriteLine($"The following error occurred while publishing an XMPP notification:\n{ex}");
_authenticatedEvent.Set();
}
private void client_OnStreamError(object sender, XmlElement rp)
{
string info = rp == null ? "No further details" : rp.OuterXml;
Console.WriteLine($"The following stream error occurred while publishing an XMPP notification:\n{info}");
_authenticatedEvent.Set();
}
private void client_OnAuthError(object sender, XmlElement elem)
{
string info = elem == null ? "No further details" : elem.OuterXml;
Console.WriteLine($"The following authentication error occurred while publishing an XMPP notification:\n{info}");
_authenticatedEvent.Set();
}
public bool SendMessage(string server, string xmppID, int port, string password)
{
var jId = JID.Escape(xmppID, server, "Test");
JabberClient client = new JabberClient();
client.AutoReconnect = 3f;
client.User = jId.User;
client.Server = jId.Server;
client.NetworkHost = jId.Server;
client.Port = port;
client.Resource = jId.Resource;
client.Password = password;
client.AutoPresence = true; //note that a deadlock can occur if this is set to true
client.AutoRoster = false;
client.SSL = false;
client.AutoReconnect = 3;
client.PlaintextAuth = true;
client.AutoStartCompression = true;
client.AutoStartTLS = false;
client.AutoLogin = true;
client.OnError += new ExceptionHandler(client_OnError);
client.OnStreamError += new ProtocolHandler(client_OnStreamError);
client.OnAuthenticate += new ObjectHandler(client_OnAuthenticate);
client.OnDisconnect += new ObjectHandler(client_OnDisconnect);
client.OnAuthError += new ProtocolHandler(client_OnAuthError);
client.OnConnect += new StanzaStreamHandler(client_OnConnect);
try
{
if (!client.IsAuthenticated)
{
_authenticatedEvent.Reset();
if (!_authenticatedEvent.WaitOne(0))
client.Connect();
if (!_authenticatedEvent.WaitOne(10000))
return false;
}
if (!client.IsAuthenticated)
return false;
jId.Resource = client.Resource;
Console.WriteLine($"Sending message"); ;
client.Message(jId.BareJID, "test message");
}
finally
{
client.Close();
client.OnError -= client_OnError;
client.OnStreamError -= client_OnStreamError;
client.OnAuthenticate -= client_OnAuthenticate;
client.OnConnect -= client_OnConnect;
client.OnDisconnect -= client_OnDisconnect;
client.OnAuthError -= client_OnAuthError;
}
return true;
}
}
}
Any progress here?
And how dangerous to use AutoPresence = false
?
Sorry, I was preoccupied with other great deeds, so wasn't able to participate in this case. Honestly I think that you, guys, have better opportunity to fix the issue if you have a repro case.
Although, I will now do my best to research the case myself and fix it. I have some spare time and I'm on it right now.
@KotM, AutoPresence = false
is pretty safe (at least if you don't need this functionality actually), and it seems like it reduces the amount of deadlocks in the code.
I was able to reproduce the issue, although, sometimes it happens even with AutoPresence = false
. Looks like to terminate the issue completely we'll need pretty heavy rearchitecture of the JabberClient
part of the library :(
For now, I'm thinking about replacing the JabberClient
internals with some actor system such as Akka.NET that should solve the problems (partially, at least). Event dispatching should be asynchronous and without manual locking; that way we'll probably defeat the deadlocks.
I've found an circular locking issue with the IStanzaEventListener methods in the XmppStream class .
Basically the IStanzaEventListener.StanzaReceived which acquires a [MethodImpl(MethodImplOptions.Synchronized)] lock can call IStanzaEventListener.BytesWritten which also needs to acquire a [MethodImpl(MethodImplOptions.Synchronized)] lock and therefore deadlocks.
The chain of calls is as follows:
XmppStream.StanzaReceived acquires a MethodImpl(MethodImplOptions.Synchronized) lock --> calls XmppStream.OnElement if (State is not ServerFeaturesState.Instance or SASLState.Instance or StartTLSState.Instance or CompressionState.Instance or SASLAuthedState.Instance) --> invokes XmppStream.OnProtocol event In IQTracker OnProtocol event is set to OnIQ --> calls IQTracker.Call which calls the 'cb' callback which is set to JabberClient.GotSession in JabberClient.GotResource --> sets JabberClient.IsAuthenticated --> calls JabberClient.Presence --> calls JabberClient.Write --> calls SocketStanzaStream.JabberClient.Write --> calls AsyncSocket.Write --> calls Stream.BeginWrite with callback WroteData --> WroteData calls SocketStanzaStream.OnWrite --> calls XmppStream.BytesWritten which waits for a MethodImpl(MethodImplOptions.Synchronized) lock
Any ideas how to go about fixing this? The way this code uses callbacks is so convoluted its hard to know where to start without breaking something else.
The chain of events and subsequent deadlock is reproducible when sending a large number of XMPP messages.