rina23q closed this issue 1 year ago.
This is caused by a Mosquitto bug: after a broker restart, any new messages addressed to a persistent client are dropped until that client reconnects. Here is a demonstration using the mosquitto command-line clients:
alsu@alsu-VirtualBox:~$ mosquitto_sub -v -i tracing-client -c -q 1 -t 'test/topic'
alsu@alsu-VirtualBox:~$ mosquitto_pub -q 1 -t test/topic -m hello1
alsu@alsu-VirtualBox:~$ sudo systemctl restart mosquitto.service
alsu@alsu-VirtualBox:~$ mosquitto_pub -q 1 -t test/topic -m hello2
alsu@alsu-VirtualBox:~$ mosquitto_sub -v -i tracing-client -c -q 1 -t 'test/topic'
test/topic hello1
alsu@alsu-VirtualBox:~$ mosquitto_pub -q 1 -t test/topic -m hello3
alsu@alsu-VirtualBox:~$ mosquitto_sub -v -i tracing-client -c -q 1 -t 'test/topic'
test/topic hello3
As shown above, new messages for a persistent client that the broker receives after a restart are dropped until that client reconnects: hello2 is never delivered. Oddly, messages received before the restart (hello1) are still delivered as soon as the client reconnects.
In our case, tedge-agent does publish the status update for the restart operation after the device reboots. However, the c8y-mapper typically starts later than the agent, so it is not yet connected when tedge-agent publishes this message. Because the mapper has a persistent session with the broker, it should still receive the message when it connects, but it doesn't because of the message-drop bug shown above.
This turns out to be a known Mosquitto issue, https://github.com/eclipse/mosquitto/issues/769, which was fixed in v1.6.11. Unfortunately, that version of the 1.6 release line is not available in the official Ubuntu apt repository, so we would have to install it manually or switch to the latest 2.x release line.
Since Mosquitto 1.6.11 or later is not available in the public Debian repositories either, and since the 2.x line also has several bugs around persistent sessions, we'll have to work around this Mosquitto persistent-session bug in thin-edge itself.
This can be done by introducing a new request-response contract between the mapper and the agent. In addition to the agent publishing the status of the last executed operation on startup, we'll introduce a topic such as tedge/init, which the mapper can use to trigger tedge-agent to republish the status of the last executed operation.
In the future, this trigger can be extended to make tedge-agent send all kinds of init messages, such as the supported operations and the current software list. This would let any external plugin or mapper obtain a summary of the agent's current status with a simple trigger.
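The contract proposed above can be sketched as follows. This is a minimal, self-contained Python simulation under stated assumptions, not thin-edge code: the `InMemoryBroker`, `Agent` and `Mapper` classes and the status topic name are hypothetical. `tedge/init` is used only because the proposal mentions "a topic like tedge/init". The toy broker delivers messages only to clients that are currently subscribed. This mimics the effect of the bug: the agent's startup message is lost, but the mapper's trigger makes the agent republish it.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical topic names for illustration only: the proposal just says
# "a topic like tedge/init" and does not define the status topic.
INIT_TOPIC = "tedge/init"
STATUS_TOPIC = "example/agent/last-operation-status"

Handler = Callable[[str, str], None]


class InMemoryBroker:
    """Toy stand-in for an MQTT broker that delivers only to current subscribers.

    Nothing is queued for absent clients, which mimics the effect of the
    Mosquitto bug: a late subscriber misses messages published earlier.
    """

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> None:
        for handler in list(self._subscribers[topic]):
            handler(topic, payload)


class Agent:
    """Hypothetical agent that republishes its init messages on request."""

    def __init__(self, broker: InMemoryBroker, last_status: str) -> None:
        self.broker = broker
        self.last_status = last_status

    def start(self) -> None:
        # Listen for init triggers from the mapper (or any other client).
        self.broker.subscribe(INIT_TOPIC, self._on_init)
        # Existing behaviour: publish the last operation status on startup.
        self.publish_init_messages()

    def publish_init_messages(self) -> None:
        # Extension point: supported operations, the software list, etc.
        # could be published here as well.
        self.broker.publish(STATUS_TOPIC, self.last_status)

    def _on_init(self, _topic: str, _payload: str) -> None:
        self.publish_init_messages()


class Mapper:
    """Hypothetical mapper that starts late and asks the agent to resend."""

    def __init__(self, broker: InMemoryBroker) -> None:
        self.broker = broker
        self.received: List[Tuple[str, str]] = []

    def start(self) -> None:
        # Subscribe first, then trigger, so the response cannot be missed.
        self.broker.subscribe(STATUS_TOPIC, self._on_status)
        self.broker.publish(INIT_TOPIC, "")

    def _on_status(self, topic: str, payload: str) -> None:
        self.received.append((topic, payload))


broker = InMemoryBroker()
agent = Agent(broker, last_status='{"status": "successful"}')
agent.start()            # published before the mapper is up: nobody receives it
mapper = Mapper(broker)
mapper.start()           # the trigger makes the agent republish the status
print(mapper.received)   # [('example/agent/last-operation-status', '{"status": "successful"}')]
```

The mapper subscribes before it publishes the trigger. Doing it the other way round would reintroduce the race, because the agent's response could arrive before the subscription exists.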
Closing this ticket, as Debian buster and earlier will not get the updated Mosquitto versions with this bug fix: they have moved to oldstable (or older) status. Debian bullseye is the current stable release, and the Mosquitto version publicly available there is 2.0.11.
However, if you would like to install a newer version of Mosquitto that includes the bug fix, you can use the Debian repositories provided directly by the Mosquitto project.
Describe the bug
After triggering a restart operation, the device restarted successfully. However, the operation status in Cumulocity remains EXECUTING.
To Reproduce
Trigger a restart operation from Cumulocity.
Expected behavior
The tedge-agent reports "SUCCESSFUL" to Cumulocity.