thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0
222 stars 55 forks source link

supported operations are not being registered after initial bootstrapping of the tedge-rugpi-image #2689

Closed reubenmiller closed 8 months ago

reubenmiller commented 9 months ago

Describe the bug

To Reproduce

  1. Build image using tedge-rugpi-image

  2. Onboard devices via

    c8y tedge bootstrap root@mydevice
  3. Open device in Cumulocity IoT Device Management UI and check if the firmware, restart device operations have been registered (e.g. is firmware tab visible, is the restart device button on the info page)

Workaround

Restarting the tedge-mapper-c8y service results supported operations being sent to Cumulocity.

systemctl restart tedge-mapper-c8y

Expected behavior

The supported operations should be sent to Cumulocity IoT after bootstrapping without having to reconnect to c8y.

Screenshots

Environment (please complete the following information):

Property Value
OS [incl. version] Debian GNU/Linux 12 (bookworm)
Hardware [incl. revision] Raspberry Pi 5 Model B Rev 1.0
System-Architecture Linux rpi5-d83addab8e9f 6.1.0-rpi7-rpi-2712 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux
thin-edge.io version tedge 1.0.0-rc.2~344+g9cf6904

Additional context

logs: tedge-mapper-c8y

The errors at the beginning of the logs are prior to the bootstrapping, then it starts up after being bootstrapped (e.g. set c8y.url and create certificate)

Feb 06 21:15:18 rpi5-d83addab8e9f tedge-mapper[2046]: Error: A value for 'c8y.url' is missing.
Feb 06 21:15:18 rpi5-d83addab8e9f tedge-mapper[2046]:     A value can be set with `tedge config set c8y.url <value>`
Feb 06 21:15:18 rpi5-d83addab8e9f systemd[1]: tedge-mapper-c8y.service: Main process exited, code=exited, status=1/FAILURE
Feb 06 21:15:18 rpi5-d83addab8e9f systemd[1]: tedge-mapper-c8y.service: Failed with result 'exit-code'.
Feb 06 21:15:23 rpi5-d83addab8e9f systemd[1]: tedge-mapper-c8y.service: Scheduled restart job, restart counter is at 32.
Feb 06 21:15:23 rpi5-d83addab8e9f systemd[1]: Stopped tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format..
Feb 06 21:15:23 rpi5-d83addab8e9f systemd[1]: Starting tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format....
Feb 06 21:15:23 rpi5-d83addab8e9f systemd[1]: Started tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format..
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.880280385Z  INFO Runtime: Started
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.880326387Z  INFO Runtime: Running Signal-Handler-0
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.880353833Z  INFO Runtime: Running HealthMonitorActor-1
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.88300962Z  INFO Runtime: Running MQTT-2
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.883038214Z  INFO Runtime: Running C8YJwtRetriever-3
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.883052733Z  INFO Runtime: Running HTTP-4
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.8830644Z  INFO Runtime: Running C8Y-REST-5
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.883076494Z  INFO Runtime: Running C8yAuthProxy-6
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.883155794Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=localhost:1883, session_name=Some("tedge-mapper-c8y")
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.883311949Z  INFO Runtime: Running FsWatcher-7
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.883316671Z  INFO C8Y-REST: start initialisation
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.883339154Z  INFO Runtime: Running Timer-8
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.883483216Z  INFO HTTP Server: Cumulocity proxy has HTTPS disabled (configured in `c8y.proxy.cert_path`/`c8y.proxy.key_path`) and certificate authentication disabled (configured in `c8y.proxy.ca_path`)
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.883499531Z  INFO c8y_auth_proxy::server: Launching on port 8001 with HTTP
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.883595925Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=localhost:1883, session_name=None
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.883997758Z  INFO mqtt_channel::connection: MQTT connection established
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.884201397Z  INFO mqtt_channel::connection: MQTT connection established
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.88609802Z  INFO tedge_api::entity_store: Loading the entity store from the log
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.88613067Z  INFO tedge_api::entity_store: Finished loading the entity store from the log
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.886172264Z  INFO Runtime: Running CumulocityMapper-9
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.886197987Z  INFO Runtime: Running MQTT-10
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.886208266Z  INFO Runtime: Running Uploader-11
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.886217081Z  INFO Runtime: Running Downloader-12
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.886225378Z  INFO Runtime: Running OldAgentAdapter-13
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.886280325Z  INFO c8y_mapper_ext::inventory: Could not read inventory fragments from file "/etc/tedge/device/inventory.json"
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.8863204Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=localhost:1883, session_name=Some("last_will_c8y_mapper")
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.88670677Z  INFO mqtt_channel::connection: MQTT connection established
Feb 06 21:15:23 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:23.905192203Z  INFO c8y_api::http_proxy: JWT token requested
Feb 06 21:15:30 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:30.754983341Z ERROR mqtt_channel::connection: MQTT connection error: Mqtt state: Io error: Custom { kind: ConnectionAborted, error: "connection closed by peer" }
Feb 06 21:15:30 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:30.755219351Z ERROR mqtt_channel::connection: MQTT connection error: Mqtt state: Io error: Custom { kind: ConnectionAborted, error: "connection closed by peer" }
Feb 06 21:15:30 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:30.755266612Z ERROR c8y_api::http_proxy: Fail to retrieve JWT token: MQTT connection error: Mqtt state: Io error: Custom { kind: ConnectionAborted, error: "connection closed by peer" }
Feb 06 21:15:30 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:30.755329281Z ERROR c8y_http_proxy::actor: An error occurred while retrieving internal Id, operation will retry in 20 seconds
Feb 06 21:15:30 rpi5-d83addab8e9f tedge-mapper[2114]:  Error: CustomError("JWT token not available")
Feb 06 21:15:30 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:30.755514881Z ERROR mqtt_channel::connection: MQTT connection error: Mqtt state: Io error: Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }
Feb 06 21:15:31 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:31.756719674Z  INFO mqtt_channel::connection: MQTT connection re-established
Feb 06 21:15:31 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:31.756945146Z  INFO mqtt_channel::connection: MQTT connection re-established
Feb 06 21:15:31 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:31.756987703Z  INFO mqtt_channel::connection: MQTT connection closed
Feb 06 21:15:31 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:31.757119079Z  INFO mqtt_channel::connection: MQTT connection re-established
Feb 06 21:15:50 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:50.755777205Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=localhost:1883, session_name=None
Feb 06 21:15:50 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:50.756551301Z  INFO mqtt_channel::connection: MQTT connection established
Feb 06 21:15:50 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:50.777861262Z  INFO c8y_api::http_proxy: JWT token requested
Feb 06 21:15:50 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:50.804569965Z  INFO c8y_api::http_proxy: JWT token received
Feb 06 21:15:50 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:50.804787323Z  INFO mqtt_channel::connection: MQTT connection closed
Feb 06 21:15:50 rpi5-d83addab8e9f tedge-mapper[2114]: 2024-02-06T21:15:50.996422116Z  INFO C8Y-REST: initialisation done.

cat /etc/tedge/.tedge-mapper-c8y/entity_store.jsonl

{"version":"1.0"}
{"topic":{"name":"te/device/main///twin/c8y_Agent"},"payload":"{\"name\":\"thin-edge.io\",\"url\":\"https://thin-edge.io\",\"version\":\"1.0.0-rc.2~344+g9cf6904\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/main/service/tedge-mapper-collectd"},"payload":"{\"@parent\":\"device/main//\",\"@type\":\"service\",\"type\":\"service\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/main/service/tedge-agent"},"payload":"{\"@parent\":\"device/main//\",\"@type\":\"service\",\"type\":\"service\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/main/service/tedge-container-monitor"},"payload":"{\"@type\":\"service\",\"name\":\"tedge-container-monitor\",\"type\":\"service\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/main///twin/c8y_Hardware"},"payload":"{\"model\":\"Raspberry Pi 5 Model B Rev 1.0\",\"revision\":\"-c04170\",\"serialNumber\":\"82f6f3b5d6119a23\",\"sysClockSync\":\"no\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/main///twin/c8y_Firmware"},"payload":"{\"name\":\"tedge_rugpi_tryboot-containers\",\"updated\":\"Mon 11 Dec 04:55:26 GMT 2023\",\"version\":\"20240206.2203\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/main///twin/device_OS"},"payload":"{\"arch\":\"aarch64\",\"displayName\":\"Debian GNU/Linux 12 (bookworm)\",\"family\":\"GNU/Linux\",\"hostname\":\"rpi5-d83addab8e9f\",\"kernel\":\"#1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24)\",\"sysClockSync\":\"no\",\"version\":\"12\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/main/service/tedge-mapper-c8y"},"payload":"{\"@parent\":\"device/main//\",\"@type\":\"service\",\"type\":\"service\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/main/service/mosquitto-c8y-bridge"},"payload":"{\"@id\":\"rpi5-d83addab8e9f:device:main:service:mosquitto-c8y-bridge\",\"@parent\":\"device/main//\",\"@type\":\"service\",\"name\":\"mosquitto-c8y-bridge\",\"type\":\"service\"}","qos":1,"retain":true}

Retained MQTT Messages

# tedge mqtt sub '#'
INFO: Connected
[te/device/main/service/tedge-mapper-collectd] {"@parent":"device/main//","@type":"service","type":"service"}
[te/device/main/service/tedge-mapper-collectd/status/health] {"pid":895,"status":"up","time":1707254131.757406}
[te/device/main/service/tedge-agent] {"@parent":"device/main//","@type":"service","type":"service"}
[te/device/main/service/tedge-agent/status/health] {"pid":913,"status":"up","time":1707254131.7568846}
[te/device/main/service/tedge-container-monitor] {"@type":"service","name":"tedge-container-monitor","type":"service"}
[te/device/main/service/tedge-container-monitor/status/health] {"status":"down"}
[te/device/main/service/tedge-mapper-c8y] {"@parent":"device/main//","@type":"service","type":"service"}
[te/device/main/service/tedge-mapper-c8y/status/health] {"pid":2114,"status":"up","time":1707254131.7567549}
[te/device/main/service/mosquitto-c8y-bridge] {"@id":"rpi5-d83addab8e9f:device:main:service:mosquitto-c8y-bridge","@parent":"device/main//","@type":"service","name":"mosquitto-c8y-bridge","type":"service"}
[te/device/main/service/mosquitto-c8y-bridge/status/health] 1
[te/device/main///cmd/config_snapshot] {"types":["tedge-configuration-plugin","tedge-log-plugin","tedge-mosquitto.conf","tedge.toml"]}
[te/device/main///cmd/config_update] {"types":["tedge-configuration-plugin","tedge-log-plugin","tedge-mosquitto.conf","tedge.toml"]}
[te/device/main///cmd/log_upload] {"types":["c8y_Command","firmware_update","software-management"]}
[te/device/main///cmd/firmware_update] {}
[te/device/main///cmd/restart] {}
[te/device/main///cmd/software_list] {}
[te/device/main///cmd/software_update] {}
[te/device/main///twin/c8y_Hardware] {"sysClockSync":"no","model":"Raspberry Pi 5 Model B Rev 1.0","revision":"-c04170","serialNumber":"82f6f3b5d6119a23"}
[te/device/main///twin/firmware] {"name":"tedge_rugpi_tryboot-containers","version":"20240206.2203","updated":"Mon 11 Dec 04:55:26 GMT 2023"}
[te/device/main///twin/device_OS] {"displayName":"Debian GNU/Linux 12 (bookworm)","version":"12","hostname":"rpi5-d83addab8e9f","arch":"aarch64","family":"GNU/Linux","kernel":"#1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24)","sysClockSync":"no"}
[te/device/main///twin/c8y_Agent] {"name":"thin-edge.io","url":"https://thin-edge.io","version":"1.0.0-rc.2~344+g9cf6904"}

Device managed object

The Cumulocity IoT device management object shows that non of the supported operations have been registered.

Note The managed object has been trimmed to remove noise (e.g. relationship links, large software list etc.)

{
  "c8y_IsDevice": {},
  "c8y_SoftwareList": [
    {
      "name": "apt",
      "softwareType": "apt",
      "url": "",
      "version": "2.6.1"
    }
  ],
  "com_cumulocity_model_Agent": {
    "address": null,
    "port": null,
    "primary": null
  },
  "creationTime": "2024-02-06T21:15:30.523Z",
  "deviceParents": {
    "references": [],
    "self": "https://t9679.latest.stage.c8y.io/inventory/managedObjects/556370/deviceParents"
  },
  "id": "556370",
  "lastUpdated": "2024-02-06T21:15:51.358Z",
  "name": "rpi5-d83addab8e9f",
  "owner": "device_rpi5-d83addab8e9f",
  "self": "https://t9679.latest.stage.c8y.io/inventory/managedObjects/556370",
  "type": "tedge_rugpi_tryboot-containers"
}
reubenmiller commented 9 months ago

Disabling specific services in the rugpi recipe on the first boot of the device seems to have resolved the main issue, see https://github.com/thin-edge/tedge-rugpi-core/pull/15 for details.

Though we should still address this as it makes the product more robust, but it is not a blocker for 1.0.0

albinsuresh commented 9 months ago

This issue is caused by mapper mapping messages too early, before the bridge is even established. Here is the sequence of events:

  1. Mapper is started, but since the c8y.url and device.id are not set, it continues in a restart loop.
  2. tedge-agent and c8y-firmware-plugin are started and they publish their capability messages.
  3. As part of the bootstrapping logic, as soon as the c8y.url and device.id are set in the tedge.toml, but before the tedge connect is triggered (c8y-bridge is established), the mapper starts mapping the capability messages to SR 114 messages and publishes those to the bridge topic, which are simply dropped by the broker, as the bridge client is not connected yet.
  4. When tedge connect eventually succeeds, it just tries to start and not restart the mapper, but that's a no-op as the service is already started. If the mapper had restarted, it would have re-processed the retained capability messages.
  5. Restarting the agent or the firmware plugin (making them publish their capabilities again) wouldn't have helped either as the "supported operations really updated detection mechanism" of the mapper introduced in #2588 would have ignored them anyway as duplicates.

There are two fixes possible:

Quick and dirty fix

Restart the mapper as part of tedge connect instead of just start, which does nothing if the service is already started. This will make sure that the mapper re-processes all the retained messages again.

Proper fix

Make sure that the mapper doesn't start mapping messages until the bridge connection is established (waiting for any health status message of the bridge client on te/device/main/service/mosquitto-c8y-bridge/status/health, irrespective of the value 1 or 0 as we just care about the bridge client's local MQTT connection being established and not the remote connection status). This will avoid messages being published to c8y/# topics getting dropped, before the bridge client has even subscribed to those.

albinsuresh commented 8 months ago

Test Plan

The fix makes sure that the mapper doesn't start mapping any messages or even send the init messages until the bridge is established. The mapper determines if the bridge is established by listening on te/device/main/service/mosquitto-c8y-bridge/status/health topic and waits till the very first message arrives on that topic.

reubenmiller commented 8 months ago

I've rechecked the rugpi image and now the supported operations are registered correctly.

reubenmiller commented 8 months ago

@gligorisaev FYI: there is a RF test to cover this. Test name is Mapper started early does not miss supported operations.

gligorisaev commented 8 months ago

QA has thoroughly checked the bug and here are the results: