rabbitmq / discussions

Please use RabbitMQ mailing list for questions. Issues that are questions, discussions or lack details necessary to investigate them are moved to this repository.
3 stars 4 forks source link

MQTT: unsuscribe not respond if previous delivery was a duplicate #94

Closed DariusBabrauskas closed 4 years ago

DariusBabrauskas commented 4 years ago

Issue occurs, when "poor wifi or gsm connection" and mqtt client or server begun repeat messages with DUP flag. If then client send unsubscribe message, server not respond to this and respond only old respond to DUB message. Client waiting about 10 seconds, then timeout occurs. Client log below.

Server: RabbitMQ logo 3.7.24 Erlang 22.3 Client: Aws FreeRtos MQTT Qs1

rabbitmq-plugins list Listing plugins with pattern "." ... Configured: E = explicitly enabled; e = implicitly enabled | Status: = running on ** |/ [ ] rabbitmq_amqp1_0 3.7.24 [ ] rabbitmq_auth_backend_cache 3.7.24 [ ] rabbitmq_auth_backend_http 3.7.24 [ ] rabbitmq_auth_backend_ldap 3.7.24 [ ] rabbitmq_auth_mechanism_ssl 3.7.24 [ ] rabbitmq_consistent_hash_exchange 3.7.24 [ ] rabbitmq_event_exchange 3.7.24 [ ] rabbitmq_federation 3.7.24 [ ] rabbitmq_federation_management 3.7.24 [ ] rabbitmq_jms_topic_exchange 3.7.24 [E] rabbitmq_management 3.7.24 [e] rabbitmq_management_agent 3.7.24 [E] rabbitmq_mqtt 3.7.24 [ ] rabbitmq_peer_discovery_aws 3.7.24 [ ] rabbitmq_peer_discovery_common 3.7.24 [ ] rabbitmq_peer_discovery_consul 3.7.24 [ ] rabbitmq_peer_discovery_etcd 3.7.24 [ ] rabbitmq_peer_discovery_k8s 3.7.24 [ ] rabbitmq_random_exchange 3.7.24 [ ] rabbitmq_recent_history_exchange 3.7.24 [ ] rabbitmq_sharding 3.7.24 [ ] rabbitmq_shovel 3.7.24 [ ] rabbitmq_shovel_management 3.7.24 [ ] rabbitmq_stomp 3.7.24 [ ] rabbitmq_top 3.7.24 [ ] rabbitmq_tracing 3.7.24 [ ] rabbitmq_trust_store 3.7.24 [e] rabbitmq_web_dispatch 3.7.24 [ ] rabbitmq_web_mqtt 3.7.24 [ ] rabbitmq_web_mqtt_examples 3.7.24 [ ] rabbitmq_web_stomp 3.7.24 [ ] rabbitmq_web_stomp_examples 3.7.24

Client log:

UnsubscribeSync->mqttexampleclient/example/topic t:2216521 [DEBUG][THREAD][lu] Locking mutex 0xc0c32b38. [DEBUG][THREAD][lu] Unlocking mutex 0xc0c32b38. [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0) Creating new operation record. [DEBUG][THREAD][lu] Locking mutex 0xc0c32ac8. [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0) Reference count changed from 2 to 3. [DEBUG][THREAD][lu] Unlocking mutex 0xc0c32ac8. [DEBUG][THREAD][lu] Creating new semaphore 0xc0c32c88. [DEBUG][THREAD][lu] Locking mutex 0xc0c32ac8. [DEBUG][THREAD][lu] Unlocking mutex 0xc0c32ac8. [DEBUG][MQTT][lu] MQTT UNSUBSCRIBE packet: a2 23 08 95 00 1f 6d 71 74 74 65 78 61 6d 70 6c 65 63 6c 69 65 6e 74 2f 65 78 61 6d 70 6c 65 2f 74 6f 70 69 63 [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0, UNSUBSCRIBE operation 0xc0c32c40) Sending MQTT packet. c>m:AT+QSSLSEND=3,37 c c<md: c<md:SEND OK [DEBUG][THREAD][lu] Locking mutex 0xc0c32ac8. [DEBUG][THREAD][lu] Locking mutex 0xc0c32ac8. [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0, UNSUBSCRIBE operation 0xc0c32c40) Job reference changed from 2 to 1. [DEBUG][THREAD][lu] Unlocking mutex 0xc0c32ac8. [DEBUG][THREAD][lu] Unlocking mutex 0xc0c32ac8. [INFO ][MQTT][lu] (MQTT connection 0xc0c32ab0) UNSUBSCRIBE operation scheduled. [DEBUG][THREAD][lu] Locking mutex 0xc0c32ac8. [INFO ][MQTT][lu] (MQTT connection 0xc0c32ab0, UNSUBSCRIBE operation 0xc0c32c40) Waiting for operation completion. [DEBUG][THREAD][lu] Unlocking mutex 0xc0c32ac8. +QSSLURC: "recv",3,4 sock:0 receiveCallback rxw:1 n:pool1 40 rxw:1 n:pool1 02 rxw:2 n:pool1 08 93 [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0) PUBACK in data stream. Packet identifier 2195. [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0) Searching for operation PUBLISH pending response with packet identifier 2195. [DEBUG][THREAD][lu] Locking mutex 0xc0c32ac8. [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0) Operation PUBLISH not found. [DEBUG][THREAD][lu] Unlocking mutex 0xc0c32ac8.

...................... ..................... .......................

[WARN ][THREAD][lu] Timeout waiting on semaphore 0xc0c32c88. [INFO ][MQTT][lu] (MQTT connection 0xc0c32ab0, UNSUBSCRIBE operation 0xc0c32c40) Wait complete with result TIMEOUT. [DEBUG][THREAD][lu] Locking mutex 0xc0c32ac8. [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0, UNSUBSCRIBE operation 0xc0c32c40) Job reference changed from 1 to 0. [DEBUG][THREAD][lu] Unlocking mutex 0xc0c32ac8. [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0, UNSUBSCRIBE operation 0xc0c32c40) Destroying operation. [DEBUG][THREAD][lu] Locking mutex 0xc0c32ac8. [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0, UNSUBSCRIBE operation 0xc0c32c40) Removed operation from connection lists. [DEBUG][THREAD][lu] Unlocking mutex 0xc0c32ac8. [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0, UNSUBSCRIBE operation 0xc0c32c40) MQTT packet freed. [DEBUG][THREAD][lu] Destroying semaphore 0xc0c32c88. [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0, UNSUBSCRIBE operation 0xc0c32c40) Wait semaphore destroyed. [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0, UNSUBSCRIBE operation 0xc0c32c40) Operation record destroyed. [DEBUG][THREAD][lu] Locking mutex 0xc0c32ac8. [DEBUG][MQTT][lu] (MQTT connection 0xc0c32ab0) Reference count changed from 3 to 2. [DEBUG][THREAD][lu] Unlocking mutex 0xc0c32ac8. UnsubscribeSync res:TIMEOUT t:2226635

michaelklishin commented 4 years ago

Thank you for your time.

Team RabbitMQ uses GitHub issues for specific actionable items engineers can work on. GitHub issues are not used for questions, investigations, root cause analysis, discussions of potential issues, etc (as defined by this team).

We get at least a dozen of questions through various venues every single day, often light on details. At that rate GitHub issues can very quickly turn into a something impossible to navigate and make sense of even for our team. Because GitHub is a tool our team uses heavily nearly every day, the signal/noise ratio of issues is something we care about a lot.

Please post this to rabbitmq-users.

Thank you.

michaelklishin commented 4 years ago

I'm afraid we cannot conclude what may be happening in your system. We need server logs from all nodes, an executable way to reproduce and ideally a traffic capture that would reveal what exactly the client does.

Duplicate deliveries suggest that this client has lost its connection. How it reacts to such events, can vary greatly and application code can also play a role.

DariusBabrauskas commented 4 years ago

Hi michaelklishin, I can do tcpdump at server side but currently connections is secure. So, you can see only encrypted packet lengths (not real size). At server logs I see only connect/disconnect info and at other file old crash info. My notice, that Unsubscribe respond packet lost on net channel have small chance. BR.

DariusBabrauskas commented 4 years ago

Hi michaelklishin, Why you closed this issue? It looks like serious rabbit mqtt server bug. BR.

DariusBabrauskas commented 4 years ago

I'm afraid we cannot conclude what may be happening in your system. We need server logs from all nodes, an executable way to reproduce and ideally a traffic capture that would reveal what exactly the client does.

Duplicate deliveries suggest that this client has lost its connection. How it reacts to such events, can vary greatly and application code can also play a role.

I not agree. Duplicate deliveries suggest, that respond not delivered in 1s timeout. Second in 2s... (poor connection)

michaelklishin commented 4 years ago

@DariusBabrauskas there is no evidence that this is a RabbitMQ bug, or that it is a bug at all. https://github.com/rabbitmq/discussions/issues/94#issuecomment-613965784 explains the way our team uses GitHub. This is mailing list material. A traffic capture and server logs will help understand what is really going on. Unless we have evidence of an incorrect or otherwise undesired/confusing behavior by our MQTT implementation, there is no specific, actionable item our team can work on.

I have no idea how you have arrived at a conclusion that "a response has not been delivered in a 1s timeout". RabbitMQ MQTT plugin and MQTT 3.1.1 the protocol do not place any limits on how soon a delivery or acknowledgments can arrive, in fact, such guarantees in a realistic distributed system would be violated from time to time anyway. Both protocol and this plugin will mark redeliveries as such. In case of MQTT (with other messaging protocols there are other possible scenarions), a redelivery almost always means that a connection has been lost.

Let's stop guessing what is going on. It is too time consuming. Please provide server logs from all nodes, a traffic capture that demonstrates the behavior and, if possible, a snippet of code and a link to the client library used. Then we'd have something to work with. Until then, this will be treated as a question/RCA analysis request with insufficient information provided, not a "serious RabbitMQ bug".

michaelklishin commented 4 years ago

Two piece of relevant information that I could find in https://docs.aws.amazon.com/freertos/latest/lib-ref/c-sdk/mqtt/mqtt_demo.html:

michaelklishin commented 4 years ago

If connection traffic is encrypted, Wireshark can be provided with with a private key and actual decrypted TCP traffic then can be exported. Packet lengths is not the only thing we care about. We need to know what exactly does the client do and how the server responds. If there is no way to take a traffic capture, RabbitMQ log level can be set to debug which won't log even frame received but should at least help keep track of certain events. All errors will be logged regardless.

rabbitmq-diagnostics consume_event_stream running against the target node would also log some relevant events, such as new connections (using any protocol), queue declarations, consumer registration (subscriptions), consumer cancelation (unsubscription events), connection closure or loss events and so on. This is not as informative as a traffic capture but is infinitely more useful than guessing about what the client and server really do.

DariusBabrauskas commented 4 years ago

Code you can see her: https://github.com/FreeRTOS/FreeRTOS/blob/lts-development/FreeRTOS-Plus/Demo/FreeRTOS_IoT_Libraries/mqtt/common/DemoTasks/SimpleMQTTExamples.c This project have visual studio port. I tested on windows10 it worked without problems. (No latency, no dublicated msg.) Log level I set to debug. Now I see more messages in log file.

Wireshark not allways can decode. It depends of selected chippers. But I will try. Currently I trying test on other mqtt server. Thank you. BR.

DariusBabrauskas commented 4 years ago

Testing on other mqqt server "not respond" error repeated. So I makes more debug and found respond to unsubscribe (4 bytes) not handled in circular buffer. Now looks, that it serious client bug. Thank you michael klishin. BR.

michaelklishin commented 4 years ago

@DariusBabrauskas thank you for reporting your findings back to us. It indeed is very useful to compare several implementations to narrow things down.