Closed reubenmiller closed 8 months ago
Disabling specific services in the rugpi recipe on the first boot of the device seems to have resolved the main issue, see https://github.com/thin-edge/tedge-rugpi-core/pull/15 for details.
We should still address this to make the product more robust, but it is not a blocker for 1.0.0.
This issue is caused by the mapper mapping messages too early, before the bridge is even established. Here is the sequence of events:
1. Because c8y.url and device.id are not set, the mapper continues in a restart loop.
2. tedge-agent and c8y-firmware-plugin are started and they publish their capability messages.
3. c8y.url and device.id are set in the tedge.toml, but before tedge connect is triggered (and the c8y-bridge is established), the mapper starts mapping the capability messages to SR 114 messages and publishes them to the bridge topic. The broker simply drops them, as the bridge client is not connected yet.
4. When tedge connect eventually succeeds, it only tries to start, not restart, the mapper, which is a no-op as the service is already started. If the mapper had restarted, it would have re-processed the retained capability messages.

There are two fixes possible:
Quick and dirty fix
Restart the mapper as part of tedge connect instead of just starting it, which does nothing if the service is already started. This makes sure that the mapper re-processes all the retained messages.
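To illustrate why start is not enough here, the following is a minimal sketch of the start versus restart behaviour. ToyMapperService, its fields, and the capability strings are hypothetical stand-ins for illustration only; this is not the tedge or service manager implementation.

```python
class ToyMapperService:
    """Toy model of the mapper service (hypothetical, for illustration only)."""

    def __init__(self, retained):
        self.retained = retained    # retained capability messages on the broker
        self.running = False
        self.bridge_up = False      # whether the bridge client is connected
        self.sent_to_cloud = []     # mapped messages that actually reached the bridge

    def _boot(self):
        # On boot the mapper receives every retained message and maps it.
        self.running = True
        for message in self.retained:
            if self.bridge_up:
                self.sent_to_cloud.append(message)
            # Otherwise the mapped output is dropped: nobody is subscribed yet.

    def start(self):
        # "start" does nothing when the service is already running.
        if not self.running:
            self._boot()

    def restart(self):
        # "restart" always boots again, so retained messages are re-processed.
        self.running = False
        self._boot()


retained = ["restart capability", "firmware_update capability"]
mapper = ToyMapperService(retained)

mapper.start()                      # mapper starts before the bridge exists
mapper.bridge_up = True             # tedge connect establishes the bridge
mapper.start()                      # no-op: the service is already running
after_start = list(mapper.sent_to_cloud)

mapper.restart()                    # retained messages are replayed
after_restart = list(mapper.sent_to_cloud)

print(after_start)
print(after_restart)
```

In this model the second start leaves nothing delivered, while the restart delivers both capability messages.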
Proper fix
Make sure that the mapper doesn't start mapping messages until the bridge connection is established. It can wait for any health status message of the bridge client on te/device/main/service/mosquitto-c8y-bridge/status/health, irrespective of the value (1 or 0), as we only care about the bridge client's local MQTT connection being established and not the remote connection status. This prevents messages published to c8y/# topics from being dropped before the bridge client has even subscribed to them.
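The dropped-message behaviour described here can be sketched with a tiny in-memory broker. ToyBroker is a hypothetical model, not mosquitto; it only supports exact topic matches and non-retained messages. The topic c8y/s/us and the SR 114 payload are assumptions chosen for illustration.

```python
from collections import defaultdict


class ToyBroker:
    """Toy broker: exact-match topics, no retained messages (illustration only)."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Deliver to current subscribers and return how many received it."""
        callbacks = self.subscribers.get(topic, [])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


broker = ToyBroker()
received_by_bridge = []

# Assumed topic and payload, for illustration only.
topic = "c8y/s/us"
payload = "114,c8y_Restart,c8y_Firmware"

# The mapper publishes before the bridge client has subscribed: dropped.
early_deliveries = broker.publish(topic, payload)

# The bridge client connects and subscribes afterwards.
broker.subscribe(topic, received_by_bridge.append)

# Only messages published from now on reach the bridge.
late_deliveries = broker.publish(topic, payload)

print(early_deliveries, late_deliveries, received_by_bridge)
```

The early publish reaches no subscriber and is lost, which matches the missing supported operations seen after bootstrapping.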
The fix makes sure that the mapper doesn't start mapping any messages, or even send the init messages, until the bridge is established. The mapper determines whether the bridge is established by listening on the te/device/main/service/mosquitto-c8y-bridge/status/health topic and waiting until the very first message arrives on that topic.
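A rough sketch of this gating idea follows. GatedMapper, its buffering of early messages, the input topic, and the output topic c8y/s/us are all assumptions made for illustration; the actual mapper may defer its subscriptions rather than buffer, and its real code is not shown in this issue.

```python
BRIDGE_HEALTH_TOPIC = "te/device/main/service/mosquitto-c8y-bridge/status/health"


class GatedMapper:
    """Hypothetical mapper that holds back all output until the bridge is seen."""

    def __init__(self, publish):
        self.publish = publish      # callable(topic, payload) towards the broker
        self.bridge_seen = False
        self.pending = []           # messages received before the bridge was seen

    def on_message(self, topic, payload):
        if topic == BRIDGE_HEALTH_TOPIC:
            # Any first health message opens the gate; its value (1 or 0) is ignored.
            if not self.bridge_seen:
                self.bridge_seen = True
                for pending_topic, pending_payload in self.pending:
                    self._map(pending_topic, pending_payload)
                self.pending.clear()
            return
        if self.bridge_seen:
            self._map(topic, payload)
        else:
            self.pending.append((topic, payload))

    def _map(self, topic, payload):
        # Placeholder mapping; the output topic is an assumption.
        self.publish("c8y/s/us", f"mapped:{payload}")


published = []
mapper = GatedMapper(lambda topic, payload: published.append((topic, payload)))

mapper.on_message("example/capability/restart", "{}")   # hypothetical input topic
before_bridge = list(published)

mapper.on_message(BRIDGE_HEALTH_TOPIC, "0")             # first health message
after_bridge = list(published)

print(before_bridge)
print(after_bridge)
```

Nothing is published while the gate is closed, and the held-back capability message is mapped as soon as the first health message arrives, regardless of its value.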
I've rechecked the rugpi image and now the supported operations are registered correctly.
@gligorisaev FYI: there is an RF test to cover this. The test name is "Mapper started early does not miss supported operations".
QA has thoroughly checked the bug and here are the results:
Describe the bug
To Reproduce
Build image using tedge-rugpi-image
Onboard devices via
Open the device in the Cumulocity IoT Device Management UI and check whether the firmware and restart device operations have been registered (e.g. is the firmware tab visible, is the restart device button on the info page)
Workaround
Restarting the tedge-mapper-c8y service results in the supported operations being sent to Cumulocity.
Expected behavior
The supported operations should be sent to Cumulocity IoT after bootstrapping without having to reconnect to c8y.
Screenshots
Environment (please complete the following information):
Debian GNU/Linux 12 (bookworm)
Raspberry Pi 5 Model B Rev 1.0
Linux rpi5-d83addab8e9f 6.1.0-rpi7-rpi-2712 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux
tedge 1.0.0-rc.2~344+g9cf6904
Additional context
logs: tedge-mapper-c8y
The errors at the beginning of the logs are prior to the bootstrapping, then it starts up after being bootstrapped (e.g. set c8y.url and create certificate)
cat /etc/tedge/.tedge-mapper-c8y/entity_store.jsonl
Retained MQTT Messages
Device managed object
The Cumulocity IoT device managed object shows that none of the supported operations have been registered.
Note: The managed object has been trimmed to remove noise (e.g. relationship links, large software list etc.)