Closed mbaillargeon-ubi closed 5 months ago
Hi @mbaillargeon-ubi, The clients receive the timeout because the server is handling a lot of connections/streams at the same time. You should also check the server logs to see if there is some timeout or error.
In this case, you should add some random sleep before connecting all the clients. It will prevent all the clients from connecting at the same time.
Btw pr with you change is welcome 😀!
@mbaillargeon-ubi Can you please check https://github.com/rabbitmq/rabbitmq-stream-dotnet-client/pull/385 ?
Hi @Gsantomaggio . Thanks for #385, I was going to create a pull request today :) . I like that you also exposed a way to configure the timeout. It looks good to me. I will be waiting for an official release with this fix to try it out. Until then, I will try your suggestion to limit the number of clients connecting back to the stream at the same time to see if it reduces the occurences of timeouts or of that actual crash.
@mbaillargeon-ubi FYI: https://www.nuget.org/packages/RabbitMQ.Stream.Client/1.8.7 Please let me know
I tested with the package 1.8.7 and I could not repro the crash. I will go ahead and update our production servers with this version. Thanks for the quick release
Describe the bug
Hi, We are experiencing crashes on our servers (windows / net6) which internally use RabbitMQ.Stream.Client. It occurs when multiple clients are connecting at the same time. Here is the callstack we get in the windows event logs
This crash is with version 1.8.6 but we initially had it with 1.7.4. I tried updating the package, but no luck it still crashes.
From what I could debug, the following is occurring. In the following code (from client.cs), the callback that gets called on timeout to do the
SetException
is called by a system timer (as we can see in the callstack of the crash). When it gets called, the task seems to sometime be in the completed state (I guess the task just completed). TheSetException
call is internally callingSetException
on aManualResetValueTaskSourceCore
which throws anInvalidOperationException
if the task is completed. Since the exception is not catched it makes our process crash.I tried the following code in
ManualResetValueTaskSource
to validate the status before doing the call toSetException
and I no longer reproduce the crash. If you want I could do a pull request with that code.Reproduction steps
Expected behavior
The exception should either not occur or be handled since it occurs in a different thread and the users of the library cannot catch
Additional context
To give some context we have 2 servers load balanced. Whenever a client application connects to our server we create a client to stream rabbitmq messages exchanged on a stream. Whenever we shutdown one of the servers (to perform an update for instance), all the clients switch to the other server which will create a client and stream back to the last received message from the stream. This crash occurs when we shutdown one of our server that has lots of clients (about 100). So we get about 100 clients connecting to the other server at the same time and connecting to the stream.