Closed albinsuresh closed 1 year ago
I want to highlight that the issue is beyond the stability issues we observed on mosquitto 2.0.
The `tedge_agent --init` option creates this persisted session on install ... with a major drawback: messages are consumed and discarded if the option is used after install.

The bridge is created on `tedge connect c8y` and removed on `tedge disconnect c8y`. After the first install and after a disconnect, there is no more bridge, i.e. the `c8y/#` topics have no subscribers. Any message published on a `c8y/#` topic before a `tedge connect c8y` will be lost. Here the `--init` trick to create a session cannot work, because the subscriber is mosquitto itself.

I see the fix proposed here as the right approach.

The `--init` option of the mapper and the agent must no longer create a session, because of the risk of inadvertently discarding messages by using `--init`.

Addressed this through the PRs below:
https://github.com/thin-edge/thin-edge.io/pull/2065
https://github.com/thin-edge/thin-edge.io/pull/2050
Created a follow-up ticket: https://github.com/thin-edge/thin-edge.io/issues/2070
Is your feature request related to a problem? Please describe.
There are many interactions between different tedge daemons where one expects the other daemon to be up and running to respond to any requests that it sends. Here are some examples:
Because of these dependencies, some of these messages could be lost if the requestee service is not up and running when the requester service sends a request.
Currently, we rely on the MQTT broker's persistent-session feature to keep these requests even if the requestee service is not up and running to receive them: the broker persists them and delivers them to that service when it starts later. However, this persistent-session feature is not very stable on the mosquitto broker that we use, so we need a solution that doesn't fully rely on it.
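To make the failure mode concrete, here is a minimal in-memory sketch of the persistent-session behaviour described above. It is not mosquitto and not thin-edge code: `ToyBroker`, the client id `agent` and the topic `demo/request` are hypothetical names, and topics are matched exactly (no wildcards). It only shows that a request published before a session exists is dropped, while one published after the session exists is queued until the client connects.

```python
from collections import deque


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker with persistent sessions."""

    def __init__(self):
        # client_id -> {"topic": subscribed topic, "queue": pending payloads}
        self.sessions = {}

    def create_session(self, client_id, topic):
        # A persistent session keeps the subscription while the client is offline.
        self.sessions[client_id] = {"topic": topic, "queue": deque()}

    def publish(self, topic, payload):
        # Queue the payload for every session subscribed to this exact topic.
        matched = 0
        for session in self.sessions.values():
            if session["topic"] == topic:
                session["queue"].append(payload)
                matched += 1
        # A return value of 0 means nobody was subscribed: the message is lost.
        return matched

    def connect(self, client_id):
        # On connect, the client receives everything queued while it was away.
        queue = self.sessions[client_id]["queue"]
        pending = list(queue)
        queue.clear()
        return pending


broker = ToyBroker()
lost = broker.publish("demo/request", "r1")    # no session yet: dropped
broker.create_session("agent", "demo/request")
kept = broker.publish("demo/request", "r2")    # session exists: queued
received = broker.connect("agent")             # only "r2" is delivered
```

The sketch assumes a perfectly reliable session store; the instability reported for mosquitto is exactly the case where that assumption breaks.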
Describe the solution you'd like
Make daemons request data from other daemons only once their liveness is validated. For example, c8y-mapper should request the software list from tedge-agent only once it can confirm that tedge-agent is up and running. The `tedge/health` endpoints of these daemons could be used to check this liveness.
Describe alternatives you've considered
Defining systemd service dependencies could be an alternative, but there are many cases where some service pairs have dependencies on each other, leading to cyclic dependencies. Even without such cycles, it would be a systemd-specific solution.
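For comparison with the systemd alternative, the liveness-gated request proposed in the solution section could look roughly like the sketch below. It is an illustration under assumptions, not the thin-edge implementation: `LivenessGate` is a hypothetical helper, the per-service topic layout `tedge/health/<service>` and the JSON `status` field with value `up` are assumed here, and `send` stands for whatever publish function the daemon already has.

```python
import json


class LivenessGate:
    """Hypothetical helper: hold requests for a peer until it reports as up."""

    def __init__(self, peer, send):
        self.peer = peer        # service name, e.g. "tedge-agent"
        self.send = send        # callable(topic, payload), e.g. an MQTT publish
        self.peer_up = False
        self.pending = []       # requests buffered while the peer is not up

    def request(self, topic, payload):
        # Send immediately if the peer is known to be up, otherwise buffer.
        if self.peer_up:
            self.send(topic, payload)
        else:
            self.pending.append((topic, payload))

    def on_health_message(self, topic, payload):
        # Assumed layout: tedge/health/<service> with a JSON body {"status": ...}.
        if topic != f"tedge/health/{self.peer}":
            return
        try:
            status = json.loads(payload).get("status")
        except (ValueError, AttributeError):
            status = None       # malformed payloads are treated as "not up"
        self.peer_up = status == "up"
        if self.peer_up:
            # Flush everything that was held back, in the original order.
            pending, self.pending = self.pending, []
            for held_topic, held_payload in pending:
                self.send(held_topic, held_payload)


sent = []
gate = LivenessGate("tedge-agent", lambda t, p: sent.append((t, p)))
gate.request("demo/software/list", "{}")     # buffered: peer not yet seen
gate.on_health_message("tedge/health/tedge-agent", '{"status": "up"}')
```

Because the gate lives inside the requesting daemon, it works the same way with or without systemd and does not require ordering the services at start-up.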