jcurl / RJCP.DLL.SerialPortStream

SerialPortStream is an independent implementation of System.IO.Ports.SerialPort and SerialStream for better reliability and maintainability. Default branch is 2.x and now has support for Mono with help of a C library.
Microsoft Public License
624 stars 197 forks source link

Unable to use COM ports after some times #50

Closed ChristopheCuddlup closed 6 years ago

ChristopheCuddlup commented 6 years ago

Hi,

Thanks a lot for your great lib.

I am trying to use it to communicate with 3 systems at the same time:

We can also see 4 other threads related to COM but that I do not manage. I believe these are coming from your lib. More details in the screnshots below.

By the way, I am running Windows Server 2012 R2 with 3 USB to RS232 cables (PL23030 chipset).

Do you have an idea why this could occur? Deadlock?

Thanks a lot for your help Christophe

allthreads com_1 com_2

ChristopheCuddlup commented 6 years ago

Hi,

Looking more deeply, we found something that heavily smells like a deadlock.

You will find below the different stack traces we have. Please search for "------->" string to find the locks taken by 2 of these threads.

We are trying to find a way to fix this issue.

By the way, the chipset we are using is the PL23030.

Any advice?

The stacks:

RJCP.SerialPortStream.dll!RJCP.IO.Ports.Native.SerialBuffer.RJCP.IO.Ports.Native.ISerialBufferStreamData.BytesToRead.get Normal RJCP.SerialPortStream.dll!RJCP.IO.Ports.Native.SerialBuffer.RJCP.IO.Ports.Native.ISerialBufferStreamData.BytesToRead.get() --------> SerialBuffer.m_ReadLock RJCP.SerialPortStream.dll!RJCP.IO.Ports.SerialPortStream.HandleEvent(object state) --------> SerialPortStream.m_EventLock mscorlib.dll!System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(object state)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
mscorlib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch()
mscorlib.dll!System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
[Native to Managed Transition]

SerialPortStream_COM5 RJCP.SerialPortStream.dll!RJCP.IO.Ports.SerialPortStream.NativeSerial_ErrorReceived Normal RJCP.SerialPortStream.dll!RJCP.IO.Ports.SerialPortStream.NativeSerial_ErrorReceived(object sender, RJCP.IO.Ports.SerialErrorReceivedEventArgs e) --------> SerialPortStream.m_EventLock
RJCP.SerialPortStream.dll!RJCP.IO.Ports.Native.WinNativeSerial.OnCommError(object sender, RJCP.IO.Ports.SerialErrorReceivedEventArgs args)
RJCP.SerialPortStream.dll!RJCP.IO.Ports.Native.WinNativeSerial.CommOverlappedIo_CommErrorEvent(object sender, RJCP.IO.Ports.Native.Windows.CommErrorEventArgs e)
RJCP.SerialPortStream.dll!RJCP.IO.Ports.Native.Windows.CommOverlappedIo.OnCommErrorEvent(RJCP.IO.Ports.Native.Windows.CommErrorEventArgs args)
RJCP.SerialPortStream.dll!RJCP.IO.Ports.Native.Windows.CommOverlappedIo.GetReceiveStats(out uint bytesInRecvQueue, out bool eofReceived) --------> m_Buffer.ReadLock RJCP.SerialPortStream.dll!RJCP.IO.Ports.Native.Windows.CommOverlappedIo.OverlappedIoThreadMainLoop()
RJCP.SerialPortStream.dll!RJCP.IO.Ports.Native.Windows.CommOverlappedIo.OverlappedIoThread()
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart_Context(object state)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart()
[Native to Managed Transition]

ChristopheCuddlup commented 6 years ago

We believed we fixed it by updating the SerialPortStream class as follows (still in testing phase).

Added: private object m_SerialErrorLock = new object(); private object m_EventProcessingLock = new object();

And changed methods as follows:

    private void HandleEvent(object state)
    {
        SerialData serialDataFlags = SerialData.NoData;
        SerialError serialErrorFlags = SerialError.NoError;
        SerialPinChange serialPinChange = SerialPinChange.NoChange;

        bool handleEvent;
        do {
            lock (m_EventLock) {
                if (IsDisposed) return;
                lock (m_SerialErrorLock)
                {
                    handleEvent = (m_SerialDataFlags != 0 || m_SerialErrorFlags != 0 || m_SerialPinChange != 0);
                    serialDataFlags = m_SerialDataFlags; m_SerialDataFlags = SerialData.NoData;
                    serialErrorFlags = m_SerialErrorFlags; m_SerialErrorFlags = SerialError.NoError;
                }
                serialPinChange = m_SerialPinChange; m_SerialPinChange = SerialPinChange.NoChange;
            }

            if (handleEvent) {
                SerialTrace.TraceSer.TraceEvent(TraceEventType.Verbose, 0, "{0}: HandleEvent: {1}; {2}; {3};", m_NativeSerial.PortName, serialDataFlags, serialErrorFlags, serialPinChange);
                // Received Data
                bool aboveThreshold;
                lock (m_EventLock) {
                    aboveThreshold = m_Buffer.Stream.BytesToRead >= m_RxThreshold;
                }
                if (aboveThreshold) {
                    OnDataReceived(this, new SerialDataReceivedEventArgs(serialDataFlags));
                } else if (serialDataFlags.HasFlag(SerialData.Eof)) {
                    OnDataReceived(this, new SerialDataReceivedEventArgs(SerialData.Eof));
                }

                // Modem Pin States
                if (serialPinChange.HasFlag(SerialPinChange.CtsChanged)) {
                    OnPinChanged(this, new SerialPinChangedEventArgs(SerialPinChange.CtsChanged));
                }
                if (serialPinChange.HasFlag(SerialPinChange.Ring)) {
                    OnPinChanged(this, new SerialPinChangedEventArgs(SerialPinChange.Ring));
                }
                if (serialPinChange.HasFlag(SerialPinChange.CDChanged)) {
                    OnPinChanged(this, new SerialPinChangedEventArgs(SerialPinChange.CDChanged));
                }
                if (serialPinChange.HasFlag(SerialPinChange.DsrChanged)) {
                    OnPinChanged(this, new SerialPinChangedEventArgs(SerialPinChange.DsrChanged));
                }
                if (serialPinChange.HasFlag(SerialPinChange.Break)) {
                    OnPinChanged(this, new SerialPinChangedEventArgs(SerialPinChange.Break));
                }

                // Error States
                if (serialErrorFlags.HasFlag(SerialError.TXFull)) {
                    OnCommError(this, new SerialErrorReceivedEventArgs(SerialError.TXFull));
                }
                if (serialErrorFlags.HasFlag(SerialError.Frame)) {
                    OnCommError(this, new SerialErrorReceivedEventArgs(SerialError.Frame));
                }
                if (serialErrorFlags.HasFlag(SerialError.RXParity)) {
                    OnCommError(this, new SerialErrorReceivedEventArgs(SerialError.RXParity));
                }
                if (serialErrorFlags.HasFlag(SerialError.Overrun)) {
                    OnCommError(this, new SerialErrorReceivedEventArgs(SerialError.Overrun));
                }
                if (serialErrorFlags.HasFlag(SerialError.RXOver)) {
                    OnCommError(this, new SerialErrorReceivedEventArgs(SerialError.RXOver));
                }
            }
        } while (handleEvent);

        lock (m_EventProcessingLock) {
            if (IsDisposed) return;
            m_EventProcessing.Reset();
        };
    }       

    private void NativeSerial_ErrorReceived(object sender, SerialErrorReceivedEventArgs e)
    {
        lock (m_SerialErrorLock) {
            m_SerialErrorFlags |= e.EventType;
        }
        CallEvent();
    }

    private void CallEvent()
    {
        lock (m_EventProcessingLock) {
            if (IsDisposed || m_EventProcessing.WaitOne(0)) return;
            m_EventProcessing.Set();
        }

        ThreadPool.QueueUserWorkItem(HandleEvent);
    }

    /// <summary>
    /// Clean up all resources managed by this object.
    /// </summary>
    /// <param name="disposing"><b>true</b> if the user is disposing this object,
    /// <b>false</b> if being cleaned up by the finalizer.</param>
    protected override void Dispose(bool disposing)
    {
        if (IsDisposed) return;

        if (disposing) {
            // Wait for events to finish before we dispose
            IsDisposed = true;
            bool eventRunning = false;
            lock (m_EventProcessingLock) {
                if (m_EventProcessing.WaitOne(0)) {
                    eventRunning = true;
                }
            }
            if (eventRunning) m_EventProcessing.WaitOne();
            m_EventProcessing.Dispose();

            if (m_Buffer != null) m_Buffer.Stream.AbortWait();

            m_NativeSerial.Dispose();
            m_NativeSerial = null;

            if (m_Buffer != null) {
                m_Buffer.Dispose();
                m_Buffer = null;
                m_ReadTo = null;
            }

            SerialTrace.Close();
        }
        base.Dispose(disposing);
    }

If this suits you, consider updating the last nuget version.

Hope this helps. Benoit & Christophe

jcurl commented 6 years ago

Sorry I haven't been able to answer in some time. Thank you very much for the detailed analysis. When I get more time, I'll go over this and provide an update. I think this would be an important fix overall.

ChristopheCuddlup commented 6 years ago

Hi, Any news on this? I'm asking cause we've been facing an other deadlock. I'll open another thread for it. Thanks Benoit & Christophe

jcurl commented 6 years ago

I've looked at the code, and my conclusion is this is a typical deadlock situation when locks aren't taken always in the same order. The solution you've provided is not optimal, it adds more locks which could likely lead to more problems (although I didn't analyse it in detail).

The m_EventLock is basically that what is trying to be done with the additional two locks. It just sets variables, so it can check.

The problem is this line:

private void HandleEvent(object state) {
  ...
  lock (m_EventLock) {
    aboveThreshold = m_Buffer.Stream.BytesToRead >= m_RxThreshold;
  }
  ...
}

I don't believe this lock is required and should be removed. It can only try to protect the variable m_RxThreshold from being modified simultaneously, but in public int ReceivedBytesThreshold there is no synchronisation there, so this lock effectively does nothing.

For the BytesToRead, you've seen in your own analysis that already takes the lock m_ReadLock when getting the bytes, and that has to remain to ensure that this value is consistent in case of a read occurring simultaneously with the I/O thread. The line I've highlighted above should contain the number of bytes in the buffer as it is called after data is already pushed into the buffer.

So I think the solution is just to remove the lock I've highlighted. I'm going to push this to a branch and ask you to retest if you will.

Thank you for your detailed report! It helped a lot in the analysis.

jcurl commented 6 years ago

Would you please test the code on the branch bugfix/issue-50? I've done some preliminary testing, but testing here is unlikely to find the same problem due to it being difficult to create error frames.

ChristopheCuddlup commented 6 years ago

Hi, Thanks for coming back to this issue. Testing will be difficult as this external devices we don't have in our premises. We'll try and let you know. Thanks!

jcurl commented 6 years ago

Based on the stack traces you provided, I was able to recreate the issue. If I modify the code so that GetReceiveStats always raises a Comm Error, I can recreate the problem with a specific test case within 5 seconds.

private bool GetReceiveStats(out uint bytesInRecvQueue, out bool eofReceived) {
  ...
  //if (cErr != 0) OnCommErrorEvent(new CommErrorEventArgs(cErr));
  OnCommErrorEvent(new CommErrorEventArgs(NativeMethods.ComStatErrors.CE_FRAME));

Removing the lock from HandleEvent resolves that and the test case now passes.

I will mark this as fixed when I make a new NuGet package. For now, I'll merge my changes into v2.x

ChristopheCuddlup commented 6 years ago

This is great, thank you!