minghuaw / azeventhubs

Unofficial Azure Event Hubs SDK over AMQP 1.0 for rust

Hanging on close EventStream #2

Closed jackgerrits closed 8 months ago

jackgerrits commented 8 months ago

I am not sure how best to debug this, but I have 4 tokio tasks (one for each of 4 partitions) and a cancellation token that is used to stop reading from each partition. As part of cleanup, close is called on the EventStream. With 4 partitions, though, one of the close calls pretty reliably hangs and never completes.

Each partition is using an independent EventHubConsumerClient to start its partition stream.
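
One way to narrow down which close call hangs is to bound each call with a timeout and record the ones that do not finish. The following is only a minimal, std-only sketch that uses threads in place of tokio tasks. The names `close_with_timeout`, `simulated_close`, and `hung_partitions` are hypothetical, and `simulated_close` merely stands in for the real close call on the EventStream. In async code, `tokio::time::timeout` would play the same role.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `close` on a helper thread and waits at most `limit` for it to finish.
/// Returns `true` if it completed in time, `false` if it is presumed hung.
/// Note: a truly hung helper thread is left running in the background.
fn close_with_timeout<F>(close: F, limit: Duration) -> bool
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        close();
        // The receiver may already have given up; ignore the send error.
        let _ = done_tx.send(());
    });
    done_rx.recv_timeout(limit).is_ok()
}

/// Hypothetical stand-in for closing one partition's stream.
/// Partition 2 simulates a close that does not return in time.
fn simulated_close(partition: usize) {
    if partition == 2 {
        thread::sleep(Duration::from_secs(5));
    }
}

/// Closes each partition in turn and collects the ones that timed out.
fn hung_partitions(count: usize, limit: Duration) -> Vec<usize> {
    (0..count)
        .filter(|&p| !close_with_timeout(move || simulated_close(p), limit))
        .collect()
}
```

Calling `hung_partitions(4, Duration::from_millis(500))` reports which of the simulated partitions failed to close within the limit, which is the kind of information that would help distill a repro.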

minghuaw commented 8 months ago

Interesting. Let me see if I can recreate this behavior.

minghuaw commented 8 months ago

My initial guess is that there's something wrong with my state machine implementation. A temporary workaround would be to just drop the EventStream. This will still perform the graceful shutdown, just in a non-async manner.
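
The general pattern behind the workaround, cleanup that runs from `Drop` rather than from an explicit async close, can be sketched as follows. This is not the azeventhubs implementation: `FakeStream`, its `shutdown` method, and `drop_triggers_shutdown` are hypothetical names used only to illustrate that dropping a value runs its synchronous cleanup.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a stream handle; not the azeventhubs type.
struct FakeStream {
    shut_down: Arc<AtomicBool>,
}

impl FakeStream {
    fn new(shut_down: Arc<AtomicBool>) -> Self {
        FakeStream { shut_down }
    }

    /// Synchronous cleanup; here it only records that shutdown happened.
    fn shutdown(&self) {
        self.shut_down.store(true, Ordering::SeqCst);
    }
}

impl Drop for FakeStream {
    fn drop(&mut self) {
        // `Drop::drop` cannot be async, so cleanup here must be synchronous.
        self.shutdown();
    }
}

/// Creates a stream, drops it, and reports whether cleanup ran.
fn drop_triggers_shutdown() -> bool {
    let flag = Arc::new(AtomicBool::new(false));
    let stream = FakeStream::new(Arc::clone(&flag));
    assert!(!flag.load(Ordering::SeqCst));
    drop(stream); // no explicit close call; cleanup happens in Drop
    flag.load(Ordering::SeqCst)
}
```

The trade-off of this design is that the caller cannot await the cleanup or observe an error from it, which is why an explicit async close is usually offered alongside it.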

jackgerrits commented 8 months ago

Thanks! That's a great workaround.

minghuaw commented 8 months ago

Hi @jackgerrits, I tried to recreate the bug but haven't been able to. I have added a new example (https://github.com/minghuaw/azeventhubs/blob/main/examples/spawn_multiple_consumer.rs) which does essentially the same thing, except that there are 3 partitions instead of 4 (this is simply because of how my test Event Hubs instance is set up).

This example has been working fine so far on both my local machine (Mac) and my testing Azure VM instance. The underlying AMQP implementation may log an error that says IllegalConnectionState (this only appears in the logs and is not returned as an error). This is usually caused by the server side closing the underlying TCP connection before the client side completes the handshake, and it should not cause the client to hang.

jackgerrits commented 8 months ago

Totally understand. I wasn't able to distill a repro, so I am sorry about that! This isn't actionable, so I am going to go ahead and close this, especially since simply dropping the connection seems to suffice for me.