mycroes / Sally7

C# implementation of Siemens S7 connections with a focus on performance
MIT License
56 stars 22 forks source link

Stuck requests when receiveSignal is released after Dispose() #52

Closed mycroes closed 6 months ago

mycroes commented 6 months ago

Consider following stack trace:

Sally7.Sally7Exception: Couldn't signal receive done.
   at Sally7.Sally7Exception.ThrowFailedToSignalReceiveDone()
   at Sally7.RequestExecutor.ConcurrentRequestExecutor.PerformRequest(ReadOnlyMemory`1 request, Memory`1 response, CancellationToken cancellationToken)
   at Sally7.S7Connection.ReadAsync(IDataItem[] dataItems, CancellationToken cancellationToken)
   at VLC.PLC.Sally7.Sally7PlcConnection.ReadIntermediateAsync(IEnumerable`1 readContainers)
   at VLC.PLC.PlcConnection.ReadAsync(IEnumerable`1 readContainers, Boolean triggerChanges)
   at VLC.PLC.Polling.PlcPoller.TimerCallback(Object state)

The following must have happened:

  1. Request has entered PerformRequest
  2. Request was sent
  3. Response was received
  4. ConcurrentRequestExecutor is disposed
  5. receiveSignal.TryRelease() returns false
  6. Sally7Exception.ThrowFailedToSignalReceiveDone() is invoked

If there was only one request, this results in the caller receiving the exception from point 6.

Now assume there was already a caller waiting:

  1. Request A was sent
  2. Request B was sent
  3. Caller A receives response B
  4. rec.Complete is called for Request B
  5. Caller A awaits Request A (no cancellation)
  6. Caller B receives response A
  7. Caller B exceptions due to receiveSignal.Dispose()
  8. Caller B receives the exception
  9. Request A is never completed
  10. Caller A is stuck

Possible solution:

  1. On JobPool.Dispose()
  2. Dispose() all Requests a. Request.Dispose() { _disposed = true; _continuation.Invoke(); } b. Request.GetResult() { ObjectDisposedException.ThrowIf(_disposed, this); return ... }
  3. Add ObjectDisposedException.ThrowIf to ConcurrentRequestExecutor throw paths?

Disadvantages?

mycroes commented 6 months ago

@gfoidl Would you care to reflect on the possible solution? I think I might as well need to flush the JobPool on Dispose() (do Reader.TryRead() until it returns false) and perhaps check for disposed/throw in RentJobIdAsync as well.

The most annoying issue here is that even though there's a default request timeout of 5s in read/write, that's implemented using CancellationToken and the final await jobPool.GetRequest(jobId) isn't affected by the cancellation, so I'm actually hitting this bug in production.

gfoidl commented 6 months ago

Would it help to remove Dispose, but introduce DisposeAsync where the async dispose can await outstanding request (w/ timeout).

I think the synchronous dispose makes a lot of troubles (as we see through various issues) -- and it gets sync-over-async here. My thinking is that async disposal could get rid of these troubles.

For the async disposal when a timout is hit, then proceed as you suggested above (and drain the JobPool as well).

BTW: are you able to produce a minimal repro of the problem or is it timing-relavant (race condition)?

mycroes commented 6 months ago

Would it help to remove Dispose, but introduce DisposeAsync where the async dispose can await outstanding request (w/ timeout).

I think this will move complexity, not sure if it really helps though. For one it would complicate my use of the library, because right now I don't have an async path to call DisposeAsync.

I think the synchronous dispose makes a lot of troubles (as we see through various issues) -- and it gets sync-over-async here. My thinking is that async disposal could get rid of these troubles.

I actually think it's fine. Async is nothing more than postponed Actions as far as we're concerned here, which means I can just invoke the postponed Actions on Dispose as suggested above.

For the async disposal when a timout is hit, then proceed as you suggested above (and drain the JobPool as well).

As you say, this would still require the sync Dispose after the timeout. That does mean we could implement both though, the DisposeAsync() with timeout and Dispose() that just kills any requests right away. I don't really care about waiting, I'm mostly only closing the connections on errors anyway, as was the case when this bug showed up (but apparently there was at least data in the buffer to process a single request).

BTW: are you able to produce a minimal repro of the problem or is it timing-relavant (race condition)?

In context to ConcurrentRequestExecutor it's timing relevant, but a test around JobPool could expose the underlying problem.

gfoidl commented 6 months ago

because right now I don't have an async path

Ah, then that doesn't make sense to have DisposeAsync. The sync Dispose I had in mind to just use internally, and don't expose it.


Then your suggestions from above seem fine.

mycroes commented 6 months ago

Thanks for reviewing! I think it would be fine to expose both Dispose and DisposeAsync, the latter needs some thought to implement though. I guess I'll be working on this tonight.