minghuaw / azeventhubs

Unofficial Azure Event Hubs SDK over AMQP 1.0 for Rust

Error while recovering connection #4

Open ondrowan opened 10 months ago

ondrowan commented 10 months ago

I've recently started using this library, and after a couple of days of consuming messages from Event Hubs I got hit by the following errors:

[azeventhubs::amqp::amqp_connection_scope] [ERROR] Error closing connection during recovering: IllegalState
[azeventhubs::amqp::amqp_connection_scope] [ERROR] Error ending CBS session during recovering: IllegalState
[azeventhubs::amqp::amqp_cbs_link] [ERROR] CBS authorization refresh failed: Local error: ExpectImmediateDetach
[message_processor] [ERROR] Link closed by remote
[message_processor] [ERROR] Link closed by remote
[message_processor] [ERROR] Link closed by remote
[azeventhubs::amqp::amqp_connection_scope] [ERROR] Error closing connection during recovering: IllegalState
[azeventhubs::amqp::amqp_connection_scope] [ERROR] Error ending CBS session during recovering: IllegalState
[message_processor] [ERROR] Link closed by remote
[message_processor] [ERROR] Link closed by remote

I've looked into the code that seems to cause this and found these lines:

https://github.com/minghuaw/azeventhubs/blob/7b3e7f31f237492820b81a2a4971cfde18b09e7a/src/amqp/amqp_connection_scope.rs#L704-L717

https://github.com/minghuaw/azeventhubs/blob/7b3e7f31f237492820b81a2a4971cfde18b09e7a/src/amqp/amqp_connection_scope.rs#L667-L689

If the connection is already closed, should the recover_connection function still try to close it? There's a similar situation with the CBS handle a couple of lines below the first snippet. However, this doesn't seem to be what causes the error; it's just something odd I noticed while investigating the problem.
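A minimal sketch of the guard being suggested, assuming a hypothetical `Connection` type with an `is_closed()` check (these are illustrative stand-ins, not the azeventhubs or fe2o3-amqp types): recovery skips `close()` on a connection that is already closed, and only logs, rather than propagates, a close error before opening a replacement.

```rust
/// Hypothetical connection states; illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConnState {
    Open,
    ClosedByRemote,
    Closed,
}

#[derive(Debug, PartialEq, Eq)]
enum CloseError {
    IllegalState,
}

/// Hypothetical stand-in for an AMQP connection handle.
#[derive(Debug)]
struct Connection {
    state: ConnState,
    close_calls: u32, // counts close() attempts so the guard is observable
}

impl Connection {
    fn open() -> Self {
        Connection { state: ConnState::Open, close_calls: 0 }
    }

    fn is_closed(&self) -> bool {
        self.state != ConnState::Open
    }

    /// Closing anything but an open connection fails with IllegalState.
    fn close(&mut self) -> Result<(), CloseError> {
        self.close_calls += 1;
        match self.state {
            ConnState::Open => {
                self.state = ConnState::Closed;
                Ok(())
            }
            _ => Err(CloseError::IllegalState),
        }
    }
}

/// Tear down `old` only if it is still open, then return a fresh connection.
fn recover_connection(old: &mut Connection) -> Connection {
    if !old.is_closed() {
        if let Err(err) = old.close() {
            // Log and continue: a failed close should not block recovery.
            eprintln!("Error closing connection during recovering: {err:?}");
        }
    }
    Connection::open()
}

fn main() {
    let mut dropped = Connection { state: ConnState::ClosedByRemote, close_calls: 0 };
    let fresh = recover_connection(&mut dropped);
    println!("close() calls on old: {}, new state: {:?}", dropped.close_calls, fresh.state);
}
```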

Do you have any idea why this could be happening? Or is it perhaps something users should be handling manually?

minghuaw commented 9 months ago

@ondrowan I have implemented a quick patch in PR #28 and released it as "0.18.3". I have run over the limit for my test instance, so I haven't tested it thoroughly with an IoT Hub instance. Would you mind giving this a try?

ondrowan commented 9 months ago

> Do you get the same error when you use any of the official SDKs? I haven't been keeping track of the changes to the official SDKs.

Unfortunately, I haven't had time to look into that yet.

> Were you able to receive messages if you manually started a completely new consumer client after all the retries failed? After looking through the latest logs, the current problem looks quite similar to the problem with the management link that we mentioned before.

I haven't tried that yet. Since we were trying to collect as many debug logs as possible, I haven't changed the code to create a new client when reading a message fails.
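For reference, the kind of user-side fallback discussed here could look like the following sketch. The `Consumer` trait, `MockConsumer`, and `recv_with_rebuild` are hypothetical names for illustration, not part of the azeventhubs API; the idea is simply to drop a consumer whose own retries are exhausted and build a fresh one, up to a bounded number of times.

```rust
use std::fmt;

/// Hypothetical receive error; stands in for whatever the client returns.
#[derive(Debug)]
struct ReceiveError(String);

impl fmt::Display for ReceiveError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical minimal consumer interface (not the azeventhubs API).
trait Consumer {
    fn recv(&mut self) -> Result<String, ReceiveError>;
}

/// Receive one event. When the current consumer fails (after whatever
/// internal retries it does), replace it with a brand-new one built by
/// `make_consumer`, at most `max_rebuilds` times.
fn recv_with_rebuild<C, F>(
    consumer: &mut C,
    make_consumer: F,
    max_rebuilds: u32,
) -> Result<String, ReceiveError>
where
    C: Consumer,
    F: Fn() -> C,
{
    let mut rebuilds = 0;
    loop {
        match consumer.recv() {
            Ok(event) => return Ok(event),
            Err(err) if rebuilds < max_rebuilds => {
                eprintln!("receive failed ({err}), creating a new consumer");
                *consumer = make_consumer(); // the old consumer is dropped here
                rebuilds += 1;
            }
            Err(err) => return Err(err),
        }
    }
}

/// Test double: a broken consumer always fails, a healthy one yields an event.
struct MockConsumer {
    healthy: bool,
}

impl Consumer for MockConsumer {
    fn recv(&mut self) -> Result<String, ReceiveError> {
        if self.healthy {
            Ok("event".to_string())
        } else {
            Err(ReceiveError("Session has dropped".to_string()))
        }
    }
}

fn main() {
    let mut consumer = MockConsumer { healthy: false };
    let result = recv_with_rebuild(&mut consumer, || MockConsumer { healthy: true }, 3);
    println!("{result:?}");
}
```

Bounding the number of rebuilds keeps a persistent outage visible as an error instead of looping forever.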

> I have implemented a quick patch in PR #28 and released it as "0.18.3". I have run over the limit for my test instance, so I haven't tested it thoroughly with an IoT Hub instance. Would you mind giving this a try?

It is now deployed, and I will let you know if anything interesting happens.

minghuaw commented 9 months ago

@ondrowan how's "0.18.3" doing so far?

ondrowan commented 9 months ago

I swear it only crashes on weekends. I have both versions, 0.18.3 and 0.17.0 (I think), running in parallel, and neither of them has crashed since the 17th of December.

ondrowan commented 8 months ago

I'm back from vacation, and since the 17th of December neither 0.17.0 nor 0.18.3 has crashed. The only thing that has changed since then is that we're sending slightly more messages through both of these queues (10+ instead of ~5).

ondrowan commented 8 months ago

It started crashing again a couple of days ago. While 0.17.0 did not recover and stopped receiving messages, 0.18.3 recovered successfully in at least two different scenarios:

[2024-01-14 06:00:14] [message_processor] [INFO] 32 messages were processed in the past minute.
[2024-01-14 06:01:14] [message_processor] [INFO] 35 messages were processed in the past minute.
[2024-01-14 06:02:14] [message_processor] [INFO] 33 messages were processed in the past minute.
[2024-01-14 06:02:34] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: Receive(LinkStateError(IllegalSessionState))
[2024-01-14 06:02:35] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-14 06:02:35] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] Recovering connection
[2024-01-14 06:02:35] [azeventhubs::amqp::amqp_connection_scope] [ERROR] Error closing connection during recovering: TransportError(Io(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }))
[2024-01-14 06:02:35] [azeventhubs::amqp::amqp_connection_scope] [ERROR] Error ending CBS session during recovering: IllegalConnectionState
[2024-01-14 06:02:35] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-14 06:02:35] [azeventhubs::amqp::amqp_management_link] [ERROR] Found error closing old management client during recovery: IllegalSessionState
[2024-01-14 06:02:35] [azeventhubs::amqp::amqp_management_link] [ERROR] Found error closing old management session during recovery: IllegalConnectionState
[2024-01-14 06:02:36] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-14 06:02:36] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-14 06:02:36] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-14 06:02:36] [azeventhubs::amqp::amqp_client] [DEBUG] Consumer recovered
[2024-01-14 06:03:14] [message_processor] [INFO] 32 messages were processed in the past minute.
[2024-01-14 06:04:14] [message_processor] [INFO] 34 messages were processed in the past minute.
[2024-01-14 06:05:14] [message_processor] [INFO] 34 messages were processed in the past minute.
[2024-01-14 06:06:14] [message_processor] [INFO] 33 messages were processed in the past minute.

[2024-01-15 10:42:14] [message_processor] [INFO] 32 messages were processed in the past minute.
[2024-01-15 10:43:14] [message_processor] [INFO] 32 messages were processed in the past minute.
[2024-01-15 10:44:06] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: Receive(LinkStateError(RemoteClosed))
[2024-01-15 10:44:08] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-15 10:44:08] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-15 10:44:08] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-15 10:44:08] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-15 10:44:08] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-15 10:44:08] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Recovering consumer by creating new consumer
[2024-01-15 10:44:14] [message_processor] [INFO] 32 messages were processed in the past minute.
[2024-01-15 10:45:14] [message_processor] [INFO] 32 messages were processed in the past minute.

So far everything seems to work well. I'll keep monitoring it for a while before we close this issue.

Good job and thanks for all the support! It's really appreciated.

minghuaw commented 8 months ago

@ondrowan Would you mind checking, if possible, whether the messages received before and after the disruption are ordered correctly (i.e., no duplicated or lost messages)? This is something I was not able to test on my test instance, and I would like to see whether my implementation is correct :)
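One way to run this check is to log each event's per-partition sequence number and scan the log afterwards. The sketch below assumes that sequence numbers within a partition normally increase by exactly one between consecutive events; `check_sequence` and `OrderingReport` are hypothetical helpers, not library functions.

```rust
use std::collections::HashSet;

/// Summary of problems found in one partition's received sequence numbers.
#[derive(Debug, Default, PartialEq, Eq)]
struct OrderingReport {
    duplicates: Vec<i64>,
    out_of_order: Vec<i64>,
    /// (first missing sequence number, next one actually received)
    gaps: Vec<(i64, i64)>,
}

/// Scan sequence numbers in the order they were received.
fn check_sequence(received: &[i64]) -> OrderingReport {
    let mut report = OrderingReport::default();
    let mut seen = HashSet::new();
    let mut prev: Option<i64> = None;

    for &n in received {
        if !seen.insert(n) {
            report.duplicates.push(n); // delivered twice, e.g. across a recovery
            continue;
        }
        if let Some(p) = prev {
            if n < p {
                report.out_of_order.push(n);
                continue;
            }
            if n > p + 1 {
                report.gaps.push((p + 1, n)); // events p+1 ..= n-1 were skipped
            }
        }
        prev = Some(n);
    }
    report
}

fn main() {
    // e.g. sequence numbers logged before and after a reconnect
    let report = check_sequence(&[10, 11, 12, 12, 13, 15, 16]);
    println!("{report:?}");
}
```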

ondrowan commented 8 months ago

It seems I've jinxed it 😓

This happened yesterday after one of the partitions most likely stopped receiving messages:

[2024-01-15 22:24:18] [rustls::client::common] [DEBUG] Client auth requested but no cert/sigscheme available
[2024-01-15 22:25:14] [message_processor] [INFO] 36 messages were processed in the past minute.
[2024-01-15 22:26:14] [message_processor] [INFO] 39 messages were processed in the past minute.
[2024-01-15 22:27:14] [message_processor] [INFO] 38 messages were processed in the past minute.
[2024-01-15 22:28:14] [message_processor] [INFO] 37 messages were processed in the past minute.
[2024-01-15 22:29:14] [message_processor] [INFO] 37 messages were processed in the past minute.
[2024-01-15 22:30:14] [message_processor] [INFO] 38 messages were processed in the past minute.
[2024-01-15 22:31:14] [message_processor] [INFO] 39 messages were processed in the past minute.
[2024-01-15 22:32:14] [message_processor] [INFO] 36 messages were processed in the past minute.
[2024-01-15 22:32:51] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: Receive(LinkStateError(IllegalSessionState))
[2024-01-15 22:32:53] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-15 22:32:53] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] Recovering connection
[2024-01-15 22:32:53] [azeventhubs::amqp::amqp_connection_scope] [ERROR] Error closing connection during recovering: RemoteClosed
[2024-01-15 22:32:53] [azeventhubs::amqp::amqp_connection_scope] [ERROR] Error ending CBS session during recovering: IllegalConnectionState
[2024-01-15 22:32:53] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-15 22:32:53] [azeventhubs::amqp::amqp_management_link] [ERROR] Found error closing old management client during recovery: IllegalSessionState
[2024-01-15 22:32:53] [azeventhubs::amqp::amqp_management_link] [ERROR] Found error closing old management session during recovery: IllegalConnectionState
[2024-01-15 22:32:53] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-15 22:32:53] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-15 22:32:53] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-15 22:32:54] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-15 22:32:57] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-15 22:32:57] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-15 22:32:57] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-15 22:32:57] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-15 22:32:57] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-15 22:32:57] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-15 22:32:57] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-15 22:33:03] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-15 22:33:03] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-15 22:33:03] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-15 22:33:04] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-15 22:33:04] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-15 22:33:04] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-15 22:33:04] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-15 22:33:04] [message_processor] [ERROR] Error while reading the message from partition 1.
Caused by:
    0: Session has dropped
    1: Session has dropped
[2024-01-15 22:33:04] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: Receive(LinkStateError(IllegalSessionState))
[2024-01-15 22:33:05] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-15 22:33:05] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-15 22:33:05] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-15 22:33:06] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-15 22:33:06] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-15 22:33:06] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-15 22:33:06] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-15 22:33:09] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-15 22:33:09] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-15 22:33:09] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
...
[2024-01-15 22:33:52] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: Receive(LinkStateError(IllegalSessionState))
[2024-01-15 22:33:54] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-15 22:33:54] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] Recovering connection
[2024-01-15 22:33:54] [azeventhubs::amqp::amqp_connection_scope] [ERROR] Error closing connection during recovering: RemoteClosedWithError(Error { condition: ConnectionError(ConnectionForced), description: Some("The connection was inactive for more than the allowed 60000 milliseconds and is closed by container 'LinkTracker'. TrackingId:d3bd2693c5eb49f7827496307cb8f3c8_G26, SystemTracker:gateway5, Timestamp:2024-01-15T22:33:53"), info: None })
[2024-01-15 22:33:54] [azeventhubs::amqp::amqp_connection_scope] [ERROR] Error ending CBS session during recovering: IllegalConnectionState
[2024-01-15 22:33:54] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-15 22:33:54] [azeventhubs::amqp::amqp_management_link] [ERROR] Found error closing old management client during recovery: IllegalSessionState
[2024-01-15 22:33:54] [azeventhubs::amqp::amqp_management_link] [ERROR] Found error closing old management session during recovery: IllegalConnectionState
[2024-01-15 22:33:54] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-15 22:33:54] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-15 22:33:54] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-15 22:33:54] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-15 22:33:58] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-15 22:33:58] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-15 22:33:58] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-15 22:33:58] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-15 22:33:58] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-15 22:33:58] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-15 22:33:58] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-15 22:34:04] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-15 22:34:04] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-15 22:34:04] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-15 22:34:05] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-15 22:34:05] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-15 22:34:05] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-15 22:34:05] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-15 22:34:05] [message_processor] [ERROR] Error while reading the message from partition 1.
Caused by:
    0: Session has dropped
    1: Session has dropped

> @ondrowan Would you mind checking, if possible, whether the messages received before and after the disruption are ordered correctly (i.e., no duplicated or lost messages)? This is something I was not able to test on my test instance, and I would like to see whether my implementation is correct :)

I'll add some more debug logs and let you know afterwards.

minghuaw commented 8 months ago

I wonder if something went wrong during CBS recovery. I have made a quick patch release (0.18.4) that only adds more debug logs. Hopefully we can find something in the new logs.

ondrowan commented 8 months ago

I got some logs from yesterday evening:

[2024-01-24 20:13:47] [message_processor] [INFO] 7 messages were processed in the past minute.
[2024-01-24 20:14:24] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: Receive(LinkStateError(IllegalSessionState))
[2024-01-24 20:14:25] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-24 20:14:25] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] Recovering connection
[2024-01-24 20:14:25] [azeventhubs::amqp::amqp_connection_scope] [ERROR] Error closing connection during recovering: RemoteClosed
[2024-01-24 20:14:26] [azeventhubs::amqp::amqp_connection_scope] [ERROR] Error ending CBS session during recovering: RemoteEnded
[2024-01-24 20:14:26] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] CBS session and link recovered
[2024-01-24 20:14:26] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-24 20:14:26] [azeventhubs::amqp::amqp_management_link] [ERROR] Found error closing old management client during recovery: IllegalSessionState
[2024-01-24 20:14:26] [azeventhubs::amqp::amqp_management_link] [ERROR] Found error closing old management session during recovery: RemoteEnded
[2024-01-24 20:14:26] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-24 20:14:26] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-24 20:14:26] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-24 20:14:26] [azeventhubs::amqp::amqp_cbs_link] [DEBUG] Requesting CBS authorization.
[2024-01-24 20:14:26] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-24 20:14:29] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-24 20:14:29] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] CBS session and link recovered
[2024-01-24 20:14:29] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-24 20:14:29] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-24 20:14:29] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-24 20:14:29] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-24 20:14:29] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-24 20:14:29] [azeventhubs::amqp::amqp_cbs_link] [DEBUG] Requesting CBS authorization.
[2024-01-24 20:14:29] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-24 20:14:36] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-24 20:14:36] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] CBS session and link recovered
[2024-01-24 20:14:36] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-24 20:14:36] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-24 20:14:36] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-24 20:14:36] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-24 20:14:36] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-24 20:14:36] [azeventhubs::amqp::amqp_cbs_link] [DEBUG] Requesting CBS authorization.
[2024-01-24 20:14:36] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-24 20:14:36] [message_processor] [ERROR] Error while reading the message from partition 1.
Caused by:
    0: Session has dropped
    1: Session has dropped
[2024-01-24 20:14:36] [reqwest::connect] [DEBUG] starting new connection: https://o1221905.ingest.sentry.io/
[2024-01-24 20:14:36] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: Receive(LinkStateError(IllegalSessionState))
[2024-01-24 20:14:38] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-24 20:14:38] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] CBS session and link recovered
[2024-01-24 20:14:38] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-24 20:14:38] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-24 20:14:38] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-24 20:14:38] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-24 20:14:38] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-24 20:14:38] [azeventhubs::amqp::amqp_cbs_link] [DEBUG] Requesting CBS authorization.
[2024-01-24 20:14:38] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-24 20:14:41] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-24 20:14:41] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] CBS session and link recovered
[2024-01-24 20:14:41] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-24 20:14:41] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-24 20:14:41] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-24 20:14:41] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-24 20:14:41] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-24 20:14:41] [azeventhubs::amqp::amqp_cbs_link] [DEBUG] Requesting CBS authorization.
[2024-01-24 20:14:42] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
...
[2024-01-24 20:15:14] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-24 20:15:14] [azeventhubs::amqp::amqp_cbs_link] [DEBUG] Requesting CBS authorization.
[2024-01-24 20:15:14] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-24 20:15:18] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-24 20:15:18] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] CBS session and link recovered
[2024-01-24 20:15:18] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-24 20:15:18] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-24 20:15:18] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-24 20:15:18] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-24 20:15:18] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-24 20:15:18] [azeventhubs::amqp::amqp_cbs_link] [DEBUG] Requesting CBS authorization.
[2024-01-24 20:15:18] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-24 20:15:24] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-24 20:15:24] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] CBS session and link recovered
[2024-01-24 20:15:24] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-24 20:15:24] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-24 20:15:24] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-24 20:15:24] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-24 20:15:24] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-24 20:15:24] [azeventhubs::amqp::amqp_cbs_link] [DEBUG] Requesting CBS authorization.
[2024-01-24 20:15:25] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-24 20:15:25] [message_processor] [ERROR] Error while reading the message from partition 1.
Caused by:
    0: Session has dropped
    1: Session has dropped
[2024-01-24 20:15:25] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: Receive(LinkStateError(IllegalSessionState))
[2024-01-24 20:15:26] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-24 20:15:26] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] Recovering connection
[2024-01-24 20:15:26] [azeventhubs::amqp::amqp_connection_scope] [ERROR] Error closing connection during recovering: RemoteClosedWithError(Err description: Some("The connection was inactive for more than the allowed 60000 milliseconds and is closed by container 'LinkTracker'. Trackin acker:gateway5, Timestamp:2024-01-24T20:15:25"), info: None })
[2024-01-24 20:15:26] [azeventhubs::amqp::amqp_connection_scope] [ERROR] Error ending CBS session during recovering: IllegalConnectionState
[2024-01-24 20:15:26] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] CBS session and link recovered
[2024-01-24 20:15:26] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-24 20:15:26] [azeventhubs::amqp::amqp_management_link] [ERROR] Found error closing old management client during recovery: IllegalSessionState
[2024-01-24 20:15:26] [azeventhubs::amqp::amqp_management_link] [ERROR] Found error closing old management session during recovery: IllegalConnectionState
[2024-01-24 20:15:27] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-24 20:15:27] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-24 20:15:27] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-24 20:15:27] [azeventhubs::amqp::amqp_cbs_link] [DEBUG] Requesting CBS authorization.
[2024-01-24 20:15:27] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-24 20:15:30] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-24 20:15:30] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] CBS session and link recovered
[2024-01-24 20:15:30] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-24 20:15:30] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-24 20:15:30] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-24 20:15:30] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-24 20:15:30] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-24 20:15:30] [azeventhubs::amqp::amqp_cbs_link] [DEBUG] Requesting CBS authorization.
[2024-01-24 20:15:30] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-24 20:15:37] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering client
[2024-01-24 20:15:37] [azeventhubs::amqp::amqp_connection_scope] [DEBUG] CBS session and link recovered
[2024-01-24 20:15:37] [azeventhubs::amqp::amqp_management_link] [DEBUG] Recovering management link
[2024-01-24 20:15:37] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link session is still open, performing recovery anyway
[2024-01-24 20:15:37] [azeventhubs::amqp::amqp_management_link] [DEBUG] Management link recovered
[2024-01-24 20:15:37] [azeventhubs::amqp::amqp_client] [DEBUG] Client recovered
[2024-01-24 20:15:37] [azeventhubs::amqp::amqp_client] [DEBUG] Recovering consumer
[2024-01-24 20:15:37] [azeventhubs::amqp::amqp_cbs_link] [DEBUG] Requesting CBS authorization.
[2024-01-24 20:15:37] [azeventhubs::amqp::amqp_consumer::single] [DEBUG] Failed to receive event: LinkDetach(IllegalSessionState)
[2024-01-24 20:15:37] [message_processor] [ERROR] Error while reading the message from partition 1.
Caused by:
    0: Session has dropped
    1: Session has dropped

minghuaw commented 8 months ago

It seems like the CBS authorization succeeded, but recovering the consumer afterwards failed. I wonder if Event Hubs rejects AMQP link resumption on a new session entirely. I will look into how the .NET SDK handles recovery.

minghuaw commented 8 months ago

In the meantime, if you don't mind, you could try enabling trace-level logging, which will show all of the underlying AMQP traffic. I am not sure whether sensitive information such as the auth token is logged, but if you are worried about exposing it, you could send the log by email.

minghuaw commented 8 months ago

After taking a quick look at how the .NET SDK handles recovery, it seems that instead of resuming the link on a new AMQP connection/session, it creates an entirely new link, which means the built-in recovery mechanism provided by the AMQP protocol is not used at all.

This is probably fine. AMQP's built-in mechanism is aimed at preventing message loss and duplication during network disruptions, whereas Event Hubs targets use cases where high throughput is desired, and its use of the AMQP protocol deliberately forgoes that built-in recovery mechanism. Plus, all incoming messages are accepted immediately, whereas in Service Bus the user decides whether a message is accepted.

Fortunately, we already have the code for recovering a consumer by creating a new one from our previous attempt to solve this issue, so this shouldn't take long to implement.
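A minimal sketch of the "create a new link" style of recovery, assuming the client remembers the sequence number of the last event it handed to the user and opens the replacement link just after it. `StartPosition`, `PartitionCursor`, `open_link`, and `simulate` are hypothetical stand-ins (the in-memory `open_link` only simulates a partition); this is not the azeventhubs or .NET SDK code.

```rust
/// Hypothetical start position for a new receiving link (not the azeventhubs API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StartPosition {
    Earliest,
    /// First event whose sequence number is strictly greater than this one.
    AfterSequence(i64),
}

#[derive(Debug)]
struct Event {
    sequence_number: i64,
}

/// Remembers how far the consumer got, independently of any AMQP link state.
struct PartitionCursor {
    initial: StartPosition,
    last_sequence: Option<i64>,
}

impl PartitionCursor {
    fn new(initial: StartPosition) -> Self {
        PartitionCursor { initial, last_sequence: None }
    }

    /// Call for every event handed to the application.
    fn record(&mut self, event: &Event) {
        self.last_sequence = Some(event.sequence_number);
    }

    /// Where a brand-new link should start after the old one is lost.
    fn recovery_position(&self) -> StartPosition {
        match self.last_sequence {
            Some(seq) => StartPosition::AfterSequence(seq),
            None => self.initial,
        }
    }
}

/// In-memory simulation of opening a new link on a partition at `pos`.
fn open_link<'a>(partition: &'a [Event], pos: StartPosition) -> impl Iterator<Item = &'a Event> + 'a {
    partition.iter().filter(move |e| match pos {
        StartPosition::Earliest => true,
        StartPosition::AfterSequence(s) => e.sequence_number > s,
    })
}

/// Read the whole partition, pretending the first link is detached after
/// `drop_after` events and recovering by opening a new link.
fn simulate(partition: &[Event], drop_after: usize) -> Vec<i64> {
    let mut cursor = PartitionCursor::new(StartPosition::Earliest);
    let mut received = Vec::new();

    for event in open_link(partition, cursor.recovery_position()).take(drop_after) {
        cursor.record(event);
        received.push(event.sequence_number);
    }
    // Recovery: no link resumption, just a new link from the tracked position.
    for event in open_link(partition, cursor.recovery_position()) {
        cursor.record(event);
        received.push(event.sequence_number);
    }
    received
}

fn main() {
    let partition: Vec<Event> = (100..110).map(|n| Event { sequence_number: n }).collect();
    println!("{:?}", simulate(&partition, 4));
}
```

Because the resume point is tracked on the client, nothing depends on the service accepting AMQP link resumption with unsettled-delivery state.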

minghuaw commented 8 months ago

> Fortunately, we already have the code for recovering a consumer by creating a new one from our previous attempt to solve this issue, so this shouldn't take long to implement.

This is implemented in the new releases "0.18.5" and "0.19.2".

ondrowan commented 8 months ago

Good find! I've updated to 0.19.2 and enabled the trace logs in one of the instances. Will let you know once there's something interesting in the logs.

minghuaw commented 7 months ago

I might have found another potential cause of Event Hubs refusing to resume links using the mechanism specified in the AMQP spec.

An AMQP message transfer has a field for the message format (a number), and there is really just one official message format. However, Event Hubs uses a custom message format number. When regular messages are sent, this field is explicitly set to the Event Hubs value, but the auto-recovery mechanism doesn't remember the message format of unsettled messages and simply assumes the official one. This would probably lead Event Hubs to refuse the resumption.
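To illustrate the bookkeeping this would require, the sketch below stores the message format alongside each unsettled outgoing delivery so that a resumed transfer reuses it instead of falling back to the standard format (0). The types are simplified hypothetical stand-ins rather than fe2o3-amqp types, and `VENDOR_MESSAGE_FORMAT` is a placeholder value, not the number Event Hubs actually uses.

```rust
use std::collections::HashMap;

/// AMQP 1.0 standard message format: format code 0, version 0.
const STANDARD_MESSAGE_FORMAT: u32 = 0;
/// Hypothetical placeholder for a vendor-specific format; NOT the real
/// value Event Hubs uses.
const VENDOR_MESSAGE_FORMAT: u32 = 0xABCD_EF00;

type DeliveryTag = Vec<u8>;

/// What we keep for a delivery the peer has not settled yet.
#[derive(Debug, Clone)]
struct UnsettledDelivery {
    payload: Vec<u8>,
    message_format: u32, // remembered so resumption can reuse it
}

/// Simplified stand-in for a resumed transfer (not an fe2o3-amqp type).
#[derive(Debug, PartialEq, Eq)]
struct ResumedTransfer {
    delivery_tag: DeliveryTag,
    message_format: u32,
    resume: bool,
    payload: Vec<u8>,
}

#[derive(Default)]
struct UnsettledMap {
    entries: HashMap<DeliveryTag, UnsettledDelivery>,
}

impl UnsettledMap {
    fn on_send(&mut self, tag: DeliveryTag, payload: Vec<u8>, message_format: u32) {
        self.entries.insert(tag, UnsettledDelivery { payload, message_format });
    }

    fn on_settled(&mut self, tag: &[u8]) {
        self.entries.remove(tag);
    }

    /// Rebuild transfers for everything still unsettled, sorted by tag,
    /// carrying over the original message format instead of assuming 0.
    fn resume_transfers(&self) -> Vec<ResumedTransfer> {
        let mut out: Vec<ResumedTransfer> = self
            .entries
            .iter()
            .map(|(tag, d)| ResumedTransfer {
                delivery_tag: tag.clone(),
                message_format: d.message_format,
                resume: true,
                payload: d.payload.clone(),
            })
            .collect();
        out.sort_by(|a, b| a.delivery_tag.cmp(&b.delivery_tag));
        out
    }
}

fn main() {
    let mut unsettled = UnsettledMap::default();
    unsettled.on_send(vec![1], b"event-a".to_vec(), VENDOR_MESSAGE_FORMAT);
    unsettled.on_send(vec![2], b"event-b".to_vec(), STANDARD_MESSAGE_FORMAT);
    unsettled.on_settled(&[2]);
    for t in unsettled.resume_transfers() {
        println!("tag {:?} resumed with message-format {:#x}", t.delivery_tag, t.message_format);
    }
}
```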

minghuaw commented 6 months ago

@ondrowan Has there been anything interesting so far?

ondrowan commented 6 months ago

Sorry for the late reply; I've been completely swamped at work. That's also mostly because everything has been working as expected! There was one small hiccup about a month ago when one of the partitions stopped receiving messages for a while, but since then there haven't been any problems at all.