rabbitmq / rabbitmq-dotnet-client

RabbitMQ .NET client for .NET Standard 2.0+ and .NET 4.6.2+
https://www.rabbitmq.com/dotnet.html
Other
2.05k stars 575 forks source link

Add cancellation to initial socket connection #1428

Closed lukebakken closed 6 months ago

lukebakken commented 7 months ago

Part of the fix to #1420

lukebakken commented 7 months ago

Thanks @danielmarbach!

cc @paulomorgado @bording (in case y'all have an idea)

I'm busy trying to figure out the following exception (from some of the recent workflow runs). The exception normally would indicate that RabbitMQ closed the connection abruptly, but the RabbitMQ logs state the following:

2023-12-06 22:12:01.703000+00:00 [warning] <0.54213.0> closing AMQP connection <0.54213.0> (127.0.0.1:56730 -> 127.0.0.1:5672 - I.TestInitialConnection.TestBasicConnectionRecoveryWithHostnameList:2023-12-06T22:11:27.4855406Z:16, vhost: '/', user: 'guest'):
2023-12-06 22:12:01.703000+00:00 [warning] <0.54213.0> client unexpectedly closed TCP connection

This error never happens on my local workstation, and only happens with .NET Framework 4.7.2.

This error (socket error 10054) has cropped up over the years with this client library and RabbitMQ with no clear reason why.

Exception Stacktrace ``` Failed Test.Integration.TestInitialConnection.TestBasicConnectionRecoveryWithHostnameList [22 s] Error Message: RabbitMQ.Client.Exceptions.BrokerUnreachableException : None of the specified endpoints were reachable ---- System.IO.IOException : Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host. -------- System.Net.Sockets.SocketException : An existing connection was forcibly closed by the remote host Stack Trace: at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\api\ConnectionFactory.cs:line 692 at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEnumerable`1 hostnames, String clientProvidedName) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\api\ConnectionFactory.cs:line 537 at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEnumerable`1 hostnames) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\api\ConnectionFactory.cs:line 490 at Test.IntegrationFixtureBase.CreateAutorecoveringConnection(IEnumerable`1 hostnames, TimeSpan requestedConnectionTimeout, TimeSpan networkRecoveryInterval) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\Test\Common\IntegrationFixtureBase.cs:line 214 at Test.IntegrationFixtureBase.CreateAutorecoveringConnection(IList`1 hostnames) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\Test\Common\IntegrationFixtureBase.cs:line 202 at Test.Integration.TestInitialConnection.TestBasicConnectionRecoveryWithHostnameList() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\Test\Integration\TestInitialConnection.cs:line 49 ----- Inner Stack Trace ----- at System.Net.Sockets.NetworkStream.BeginWrite(Byte[] buffer, Int32 offset, Int32 size, AsyncCallback callback, Object state) at System.IO.Stream.<>c.b__53_0(Stream stream, ReadWriteParameters args, AsyncCallback callback, Object state) at System.Threading.Tasks.TaskFactory`1.FromAsyncTrim[TInstance,TArgs](TInstance thisRef, TArgs args, Func`5 beginMethod, Func`3 endMethod) at System.IO.Stream.BeginEndWriteAsync(Byte[] buffer, Int32 offset, Int32 count) at System.IO.Stream.WriteAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken) at System.IO.Pipelines.StreamExtensions.WriteAsync(Stream stream, ReadOnlyMemory`1 buffer, CancellationToken cancellationToken) at System.IO.Pipelines.StreamPipeWriter.d__39.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1.ConfiguredValueTaskAwaiter.GetResult() at RabbitMQ.Client.Impl.SocketFrameHandler.d__37.MoveNext() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\impl\SocketFrameHandler.cs:line 293 --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult() at RabbitMQ.Client.Framing.Impl.Connection.d__4.MoveNext() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\impl\Connection.Commands.cs:line 90 --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable.ConfiguredValueTaskAwaiter.GetResult() at RabbitMQ.Client.Framing.Impl.Connection.d__89.MoveNext() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\impl\Connection.cs:line 233 --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at RabbitMQ.Client.Framing.Impl.Connection.d__89.MoveNext() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\impl\Connection.cs:line 252 --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Threading.Tasks.ValueTask`1.get_Result() at RabbitMQ.Client.TaskExtensions.EnsureCompleted[T](ValueTask`1 task) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\TaskExtensions.cs:line 158 at RabbitMQ.Client.Framing.Impl.Connection.Open() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\impl\Connection.cs:line 217 at RabbitMQ.Client.Framing.Impl.AutorecoveringConnection.Open() in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\impl\AutorecoveringConnection.cs:line 72 at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName) in D:\a\rabbitmq-dotnet-client\rabbitmq-dotnet-client\projects\RabbitMQ.Client\client\api\ConnectionFactory.cs:line 680 ----- Inner Stack Trace ----- at System.Net.Sockets.Socket.BeginSend(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, AsyncCallback callback, Object state) at System.Net.Sockets.NetworkStream.BeginWrite(Byte[] buffer, Int32 offset, Int32 size, AsyncCallback callback, Object state) ```
paulomorgado commented 7 months ago

Sorry. Networking is not really one of my expertises.

This might be related to some networking equipment/software between the client and the server.

I don't know if there are some ETW events you can collect to investigate.

You may find interesting tools here: https://www.youtube.com/watch?v=wf2M0GNQpWI

lukebakken commented 7 months ago

@paulomorgado thanks for taking a look.

This might be related to some networking equipment/software between the client and the server

Yep, normally I would tell a RabbitMQ user "your firewall or load balancer is acting up" but in the case of GitHub Actions and Windows, RabbitMQ and the tests are running on the same machine, communicating via the loopback adapter.

bording commented 7 months ago

I'm pretty sure I have observed the same problem occasionally, but I also don't have any insights to offer as to the cause.

paulomorgado commented 7 months ago

Is this something that can be forced to happen?

lukebakken commented 7 months ago

Is this something that can be forced to happen?

I sure wish it were!

lukebakken commented 7 months ago

@stebet @danielmarbach thanks again for taking a look. I realize this isn't a complete use of cancellation tokens, but is a good start. I'd like to merge this soon and produce another alpha release. Community users are using and testing out version 7 alpha releases already.

lukebakken commented 6 months ago

Oh, and magically it seems like that socket error is less likely to happen 🤷‍♂️

lukebakken commented 6 months ago

No comments in a couple days, merging and producing a new 7.0 alpha!

danielmarbach commented 6 months ago

I'm happy to give feedback but sometimes I just can't do that in a matter of days and especially not before holiday season

lukebakken commented 6 months ago

I'm happy to give feedback but sometimes I just can't do that in a matter of days and especially not before holiday season

No worries @danielmarbach. All your feedback is appreciated. I just want to keep the ball rolling on version 7 with new alpha releases.