thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0
223 stars 55 forks source link

built-in bridge connection not sending or receiving messages from the cloud #2960

Closed reubenmiller closed 3 weeks ago

reubenmiller commented 5 months ago

Describe the bug

The tedge-mapper-c8y process stops sending data to Cumulocity IoT when using the built-in bridge.

The device where the problem was seen includes a special MQTT subscription which looks at the tedge-mapper-bridge-c8y service status, and the log shows that the status was toggling between up and down, though it is currently showing as up even though the device is not responding to operations or sending measurements.

{"tst":"2024-06-20T17:33:43.322236+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":1,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-06-20T17:34:26.586193+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":1,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-06-20T21:03:53.619944+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-06-20T21:04:22.153423+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-06-20T23:47:14.195201+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-06-20T23:47:42.685712+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-06-20T23:57:14.234556+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-06-20T23:57:45.771221+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-06-21T00:00:02.014411+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-06-21T00:00:53.770193+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-06-24T15:06:27.301306+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-06-24T15:06:27.304560+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}

The tedge connect c8y --test mostly fails, but sporadically passes (e.g. it will receive a jwt every now and then), below shows both the failed connection test.

# tedge connect c8y --test
The system config file '/etc/tedge/system.toml' doesn't exist. Using '/bin/systemctl' as the default service manager

Detected mosquitto version >= 2.0.0
Sending packets to check connection. This may take up to 2 seconds.

ERROR: Local MQTT publish has timed out.
Error: failed to test connection to Cumulocity cloud.

Caused by:
    Operation timed out. Is mosquitto running?

Symptoms

To Reproduce

It is unclear how to reproduce the bug. This section will be expanded once more information is known.

Expected behavior

The built-in bridge should reliably reconnect to the IoT Cloud, and the status of the bridge should be faithfully represented on the associated service topic.

Screenshots

Environment (please complete the following information):

Property Value
OS [incl. version] Debian GNU/Linux 12 (bookworm)
Hardware [incl. revision] Raspberry Pi 4 Model B Rev 1.5
System-Architecture Linux rackfslot2 6.6.20+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07) aarch64 GNU/Linux
thin-edge.io version tedge 1.1.2~129+gdc64912

Additional context

journalctl -fu tedge-mapper-c8y

Jun 20 17:03:34 rackfslot2 systemd[1]: Starting tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format....
Jun 20 17:03:34 rackfslot2 systemd[1]: Started tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format..
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: The system config file '/etc/tedge/system.toml' doesn't exist. Using '/bin/systemctl' as the default service manager
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.497365785Z  INFO Runtime: Started
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.497500376Z  INFO Runtime: Running Signal-Handler-0
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.497563912Z  INFO Runtime: Running HealthMonitorActor-1
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.503755952Z  INFO Runtime: Running MQTT-2
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.503843821Z  INFO Runtime: Running C8YJwtRetriever-3
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.503879598Z  INFO Runtime: Running HTTP-4
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.503900487Z  INFO Runtime: Running C8Y-REST-5
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.503920209Z  INFO Runtime: Running C8yAuthProxy-6
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.503944079Z  INFO Runtime: Running FsWatcher-7
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.503969153Z  INFO Runtime: Running Timer-8
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.504152299Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=Some("tedge-mapper-c8y")
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.504521591Z  INFO C8Y-REST: start initialisation
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.504651811Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=None
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.505064788Z  INFO HTTP Server: Cumulocity proxy has HTTPS disabled (configured in `c8y.proxy.cert_path`/`c8y.proxy.key_path`) and certificate authentication disabled (configured in `c8y.proxy.ca_path`)
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.505098362Z  INFO c8y_auth_proxy::server: Launching on port 8001 with HTTP
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.506125609Z  INFO mqtt_channel::connection: MQTT connection established
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.506328607Z  INFO mqtt_channel::connection: MQTT connection established
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.508222437Z  INFO tedge_api::entity_store: Loading the entity store from the log
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.508325584Z  INFO tedge_api::entity_store: Finished loading the entity store from the log
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.508681783Z  INFO Runtime: Running CumulocityMapper-9
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.508748764Z  INFO Runtime: Running MQTT-10
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.50877406Z  INFO Runtime: Running Uploader-11
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.508795837Z  INFO Runtime: Running Downloader-12
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.508819319Z  INFO Runtime: Running OldAgentAdapter-13
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.508948299Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=Some("last_will_c8y_mapper")
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.509721512Z  INFO mqtt_channel::connection: MQTT connection established
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.527473602Z  INFO c8y_api::http_proxy: JWT token requested
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.537850778Z  INFO c8y_api::http_proxy: JWT token received
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.53836105Z  INFO mqtt_channel::connection: MQTT connection closed
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.550569649Z  INFO c8y_mapper_ext::inventory: Could not read inventory fragments from file "/etc/tedge/device/inventory.json"
Jun 20 17:03:34 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:34.631698097Z  INFO C8Y-REST: initialisation done.
Jun 20 17:03:47 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:47.222033473Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=None
Jun 20 17:03:47 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:47.223673769Z  INFO mqtt_channel::connection: MQTT connection established
Jun 20 17:03:47 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:47.245669143Z  INFO c8y_api::http_proxy: JWT token requested
Jun 20 17:03:47 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:47.288658237Z  INFO c8y_api::http_proxy: JWT token received
Jun 20 17:03:47 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:03:47.289498191Z  INFO mqtt_channel::connection: MQTT connection closed
Jun 20 17:04:05 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:04:05.996223181Z  INFO c8y_auth_proxy::server: Closed websocket proxy from Cumulocity to client
Jun 20 17:04:05 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:04:05.996428031Z  INFO c8y_auth_proxy::server: Closed websocket proxy from client to Cumulocity
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.381344014Z ERROR mqtt_channel::connection: MQTT connection error: Mqtt state: Io error: Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.381958044Z ERROR mqtt_channel::connection: MQTT connection error: Mqtt state: Io error: Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }
Jun 20 17:05:32 rackfslot2 systemd[1]: Stopping tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format....
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.48686931Z  INFO Runtime: Shutting down
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.487805207Z  INFO c8y_auth_proxy::actor: Shutdown
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.487926039Z  INFO Runtime: Actor has finished: HTTP-4
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.487954001Z  INFO Runtime: Actor has finished: Signal-Handler-0
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.487972038Z  INFO Runtime: Actor has finished: MQTT-2
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.488007945Z  INFO Runtime: Actor has finished: C8Y-REST-5
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.488024556Z  INFO Runtime: Actor has finished: FsWatcher-7
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.4880425Z  INFO Runtime: Actor has finished: CumulocityMapper-9
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.488059389Z  INFO Runtime: Actor has finished: MQTT-10
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.488076963Z  INFO Runtime: Actor has finished: Uploader-11
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.488094852Z  INFO Runtime: Actor has finished: Downloader-12
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.488113851Z  INFO Runtime: Actor has finished: OldAgentAdapter-13
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.488131277Z  INFO Runtime: Actor has finished: HealthMonitorActor-1
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.488148981Z  INFO Runtime: Actor has finished: Timer-8
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.48816624Z  INFO Runtime: Actor has finished: C8yAuthProxy-6
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.488182869Z  INFO Runtime: Actor has finished: C8YJwtRetriever-3
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.488198721Z  INFO Runtime: All actors have finished
Jun 20 17:05:32 rackfslot2 tedge-mapper[156097]: 2024-06-20T15:05:32.488515884Z  INFO c8y_auth_proxy::server: Closed websocket proxy from client to Cumulocity
Jun 20 17:05:32 rackfslot2 systemd[1]: tedge-mapper-c8y.service: Deactivated successfully.
Jun 20 17:05:32 rackfslot2 systemd[1]: Stopped tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format..
Jun 20 17:22:53 rackfslot2 systemd[1]: Starting tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format....
Jun 20 17:22:53 rackfslot2 systemd[1]: Started tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format..
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: The system config file '/etc/tedge/system.toml' doesn't exist. Using '/bin/systemctl' as the default service manager
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.227942193Z  INFO Runtime: Started
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.228030655Z  INFO Runtime: Running Signal-Handler-0
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.228080747Z  INFO Runtime: Running HealthMonitorActor-1
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.252049339Z  INFO Runtime: Running MQTT-Bridge-2
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.252193596Z  INFO Runtime: Actor has finished: MQTT-Bridge-2
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.254629661Z  INFO tedge_mqtt_bridge::health: MQTT bridge connected to local broker
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.254717808Z  INFO tedge_mqtt_bridge: Bridge cloud connection "local" subscribing to [Filter = c8y/s/ut/#, Qos = AtLeastOnce, Filter = c8y/s/us, Qos = AtLeastOnce, Filter = c8y/s/us/#, Qos = AtLeastOnce, Filter = c8y/t/us/#, Qos = AtLeastOnce, Filter = c8y/q/us/#, Qos = AtLeastOnce, Filter = c8y/c/us/#, Qos = AtLeastOnce, Filter = c8y/s/uc/#, Qos = AtLeastOnce, Filter = c8y/t/uc/#, Qos = AtLeastOnce, Filter = c8y/q/uc/#, Qos = AtLeastOnce, Filter = c8y/c/uc/#, Qos = AtLeastOnce, Filter = c8y/inventory/managedObjects/update/#, Qos = AtLeastOnce, Filter = c8y/measurement/measurements/create/#, Qos = AtLeastOnce, Filter = c8y/event/events/create/#, Qos = AtLeastOnce, Filter = c8y/alarm/alarms/create/#, Qos = AtLeastOnce, Filter = c8y/s/uat, Qos = AtLeastOnce]
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.25770768Z  INFO Runtime: Running MQTT-3
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.257806031Z  INFO Runtime: Running C8YJwtRetriever-4
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.257843938Z  INFO Runtime: Running HTTP-5
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.257864586Z  INFO Runtime: Running C8Y-REST-6
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.257885401Z  INFO Runtime: Running C8yAuthProxy-7
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.257907604Z  INFO Runtime: Running FsWatcher-8
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.257934826Z  INFO Runtime: Running Timer-9
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.2582241Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=Some("tedge-mapper-c8y")
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.258314025Z  INFO C8Y-REST: start initialisation
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.258454857Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=None
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.259033258Z  INFO HTTP Server: Cumulocity proxy has HTTPS disabled (configured in `c8y.proxy.cert_path`/`c8y.proxy.key_path`) and certificate authentication disabled (configured in `c8y.proxy.ca_path`)
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.259071535Z  INFO c8y_auth_proxy::server: Launching on port 8001 with HTTP
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.261873299Z  INFO mqtt_channel::connection: MQTT connection established
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.262224461Z  INFO mqtt_channel::connection: MQTT connection established
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.263520113Z  INFO tedge_api::entity_store: Loading the entity store from the log
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.263669074Z  INFO tedge_api::entity_store: Finished loading the entity store from the log
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.263819443Z  INFO Runtime: Running CumulocityMapper-10
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.263875016Z  INFO Runtime: Running MQTT-11
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.263900257Z  INFO Runtime: Running Uploader-12
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.26392583Z  INFO Runtime: Running Downloader-13
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.263951534Z  INFO Runtime: Running OldAgentAdapter-14
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.26402144Z  INFO c8y_mapper_ext::inventory: Could not read inventory fragments from file "/etc/tedge/device/inventory.json"
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.264143976Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=Some("last_will_c8y_mapper")
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.265645811Z  INFO mqtt_channel::connection: MQTT connection established
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.284181632Z  INFO c8y_api::http_proxy: JWT token requested
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.359685403Z  INFO tedge_mqtt_bridge::health: MQTT bridge connected to cloud broker
Jun 20 17:22:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:53.359765606Z  INFO tedge_mqtt_bridge: Bridge cloud connection "cloud" subscribing to [Filter = s/dt, Qos = AtLeastOnce, Filter = s/dat, Qos = AtLeastOnce, Filter = s/ds, Qos = AtLeastOnce, Filter = s/e, Qos = AtLeastOnce, Filter = devicecontrol/notifications, Qos = AtLeastOnce, Filter = error, Qos = AtLeastOnce]
Jun 20 17:22:54 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:54.839198292Z  INFO c8y_api::http_proxy: JWT token received
Jun 20 17:22:54 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:54.839690323Z  INFO mqtt_channel::connection: MQTT connection closed
Jun 20 17:22:54 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:54.928795899Z  INFO C8Y-REST: initialisation done.
Jun 20 17:22:55 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:55.09824282Z  INFO tedge_downloader_ext::actor: Downloading from url http://127.0.0.1:8000/tedge/file-transfer/osadl-rackfslot2/log_upload/tedge-benchmark-c8y-mapper-10719676 to location /tmp/.tmpi4dZ8t/tedge-benchmark-c8y-mapper-10719676
Jun 20 17:22:55 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:55.104853817Z  INFO download::download: Downloading file from url="http://127.0.0.1:8000/tedge/file-transfer/osadl-rackfslot2/log_upload/tedge-benchmark-c8y-mapper-10719676", len=0
Jun 20 17:22:55 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:55.227900018Z  INFO tedge_uploader_ext::actor: Uploading from /tmp/.tmpi4dZ8t/tedge-benchmark-c8y-mapper-10719676 to url: http://127.0.0.1:8001/c8y/event/events/10721033/binaries
Jun 20 17:22:55 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:55.23419463Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=None
Jun 20 17:22:55 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:55.235024472Z  INFO mqtt_channel::connection: MQTT connection established
Jun 20 17:22:55 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:55.256439668Z  INFO c8y_api::http_proxy: JWT token requested
Jun 20 17:22:55 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:55.271527492Z  INFO c8y_api::http_proxy: JWT token received
Jun 20 17:22:55 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:22:55.271820971Z  INFO mqtt_channel::connection: MQTT connection closed
Jun 20 17:23:38 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:23:38.32350921Z  INFO c8y_auth_proxy::server: Closed websocket proxy from Cumulocity to client
Jun 20 17:23:38 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:23:38.323627745Z  INFO c8y_auth_proxy::server: Closed websocket proxy from client to Cumulocity
Jun 20 17:34:59 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:34:59.527175156Z  INFO tedge_downloader_ext::actor: Downloading from url http://127.0.0.1:8000/tedge/file-transfer/osadl-rackfslot2/log_upload/tedge-bridge-monitor-c8y-mapper-10719678 to location /tmp/.tmpROm28s/tedge-bridge-monitor-c8y-mapper-10719678
Jun 20 17:34:59 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:34:59.539579641Z  INFO download::download: Downloading file from url="http://127.0.0.1:8000/tedge/file-transfer/osadl-rackfslot2/log_upload/tedge-bridge-monitor-c8y-mapper-10719678", len=0
Jun 20 17:34:59 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:34:59.805034646Z  INFO tedge_uploader_ext::actor: Uploading from /tmp/.tmpROm28s/tedge-bridge-monitor-c8y-mapper-10719678 to url: http://127.0.0.1:8001/c8y/event/events/10722429/binaries
Jun 20 17:35:14 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:35:14.168481808Z  INFO c8y_auth_proxy::server: Closed websocket proxy from Cumulocity to client
Jun 20 17:35:14 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:35:14.168642325Z  INFO c8y_auth_proxy::server: Closed websocket proxy from client to Cumulocity
Jun 20 17:36:20 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:36:20.442482192Z  INFO c8y_auth_proxy::server: Closed websocket proxy from Cumulocity to client
Jun 20 17:36:20 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:36:20.442756818Z  INFO c8y_auth_proxy::server: Closed websocket proxy from client to Cumulocity
Jun 20 17:43:48 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:43:48.531968053Z  INFO tedge_downloader_ext::actor: Downloading from url http://127.0.0.1:8000/tedge/file-transfer/osadl-rackfslot2/log_upload/tedge-bridge-monitor-c8y-mapper-10719685 to location /tmp/.tmpU0U5jS/tedge-bridge-monitor-c8y-mapper-10719685
Jun 20 17:43:48 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:43:48.537914947Z  INFO download::download: Downloading file from url="http://127.0.0.1:8000/tedge/file-transfer/osadl-rackfslot2/log_upload/tedge-bridge-monitor-c8y-mapper-10719685", len=0
Jun 20 17:43:48 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:43:48.864344056Z  INFO tedge_uploader_ext::actor: Uploading from /tmp/.tmpU0U5jS/tedge-bridge-monitor-c8y-mapper-10719685 to url: http://127.0.0.1:8001/c8y/event/events/10721057/binaries
Jun 20 17:49:09 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:49:09.700932928Z  INFO c8y_auth_proxy::server: Closed websocket proxy from Cumulocity to client
Jun 20 17:49:09 rackfslot2 tedge-mapper[161550]: 2024-06-20T15:49:09.701162611Z  INFO c8y_auth_proxy::server: Closed websocket proxy from client to Cumulocity
Jun 20 19:20:11 rackfslot2 tedge-mapper[161550]: 2024-06-20T17:20:11.946831011Z  INFO tedge_downloader_ext::actor: Downloading from url http://127.0.0.1:8000/tedge/file-transfer/osadl-rackfslot2/log_upload/tedge-bridge-monitor-c8y-mapper-10720145 to location /tmp/.tmphA209Z/tedge-bridge-monitor-c8y-mapper-10720145
Jun 20 19:20:11 rackfslot2 tedge-mapper[161550]: 2024-06-20T17:20:11.953721892Z  INFO download::download: Downloading file from url="http://127.0.0.1:8000/tedge/file-transfer/osadl-rackfslot2/log_upload/tedge-bridge-monitor-c8y-mapper-10720145", len=0
Jun 20 19:20:12 rackfslot2 tedge-mapper[161550]: 2024-06-20T17:20:12.365733179Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=None
Jun 20 19:20:12 rackfslot2 tedge-mapper[161550]: 2024-06-20T17:20:12.366532781Z  INFO mqtt_channel::connection: MQTT connection established
Jun 20 19:20:12 rackfslot2 tedge-mapper[161550]: 2024-06-20T17:20:12.388017398Z  INFO c8y_api::http_proxy: JWT token requested
Jun 20 19:20:12 rackfslot2 tedge-mapper[161550]: 2024-06-20T17:20:12.419273048Z  INFO c8y_api::http_proxy: JWT token received
Jun 20 19:20:12 rackfslot2 tedge-mapper[161550]: 2024-06-20T17:20:12.41962021Z  INFO mqtt_channel::connection: MQTT connection closed
Jun 20 19:20:12 rackfslot2 tedge-mapper[161550]: 2024-06-20T17:20:12.494457975Z  INFO tedge_uploader_ext::actor: Uploading from /tmp/.tmphA209Z/tedge-bridge-monitor-c8y-mapper-10720145 to url: http://127.0.0.1:8001/c8y/event/events/10726885/binaries
Jun 20 19:20:12 rackfslot2 tedge-mapper[161550]: 2024-06-20T17:20:12.549364809Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=None
Jun 20 19:20:12 rackfslot2 tedge-mapper[161550]: 2024-06-20T17:20:12.550023468Z  INFO mqtt_channel::connection: MQTT connection established
Jun 20 19:20:12 rackfslot2 tedge-mapper[161550]: 2024-06-20T17:20:12.573671707Z  INFO c8y_api::http_proxy: JWT token requested
Jun 20 19:20:12 rackfslot2 tedge-mapper[161550]: 2024-06-20T17:20:12.587474044Z  INFO c8y_api::http_proxy: JWT token received
Jun 20 19:20:12 rackfslot2 tedge-mapper[161550]: 2024-06-20T17:20:12.588204202Z  INFO mqtt_channel::connection: MQTT connection closed
Jun 20 21:03:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T19:03:53.618862846Z ERROR tedge_mqtt_bridge::health: MQTT bridge failed to connect to cloud broker: Mqtt state: Io error: Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }
Jun 20 21:03:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T19:03:53.618988363Z  INFO tedge_mqtt_bridge: Waiting 27.642985678s until attempting reconnection to cloud broker
Jun 20 21:04:22 rackfslot2 tedge-mapper[161550]: 2024-06-20T19:04:22.152490306Z  INFO tedge_mqtt_bridge::health: MQTT bridge connected to cloud broker
Jun 20 21:04:22 rackfslot2 tedge-mapper[161550]: 2024-06-20T19:04:22.152604824Z  INFO tedge_mqtt_bridge: Bridge cloud connection "cloud" subscribing to [Filter = s/dt, Qos = AtLeastOnce, Filter = s/dat, Qos = AtLeastOnce, Filter = s/ds, Qos = AtLeastOnce, Filter = s/e, Qos = AtLeastOnce, Filter = devicecontrol/notifications, Qos = AtLeastOnce, Filter = error, Qos = AtLeastOnce]
Jun 20 23:47:14 rackfslot2 tedge-mapper[161550]: 2024-06-20T21:47:14.194441041Z ERROR tedge_mqtt_bridge::health: MQTT bridge failed to connect to cloud broker: Mqtt state: Io error: Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }
Jun 20 23:47:14 rackfslot2 tedge-mapper[161550]: 2024-06-20T21:47:14.194537799Z  INFO tedge_mqtt_bridge: Waiting 27.988953577s until attempting reconnection to cloud broker
Jun 20 23:47:42 rackfslot2 tedge-mapper[161550]: 2024-06-20T21:47:42.684690841Z  INFO tedge_mqtt_bridge::health: MQTT bridge connected to cloud broker
Jun 20 23:47:42 rackfslot2 tedge-mapper[161550]: 2024-06-20T21:47:42.684805784Z  INFO tedge_mqtt_bridge: Bridge cloud connection "cloud" subscribing to [Filter = s/dt, Qos = AtLeastOnce, Filter = s/dat, Qos = AtLeastOnce, Filter = s/ds, Qos = AtLeastOnce, Filter = s/e, Qos = AtLeastOnce, Filter = devicecontrol/notifications, Qos = AtLeastOnce, Filter = error, Qos = AtLeastOnce]
Jun 20 23:57:14 rackfslot2 tedge-mapper[161550]: 2024-06-20T21:57:14.191316602Z ERROR tedge_mqtt_bridge::health: MQTT bridge failed to connect to cloud broker: Mqtt state: Io error: Custom { kind: ConnectionAborted, error: "connection closed by peer" }
Jun 20 23:57:14 rackfslot2 tedge-mapper[161550]: 2024-06-20T21:57:14.191429174Z  INFO tedge_mqtt_bridge: Waiting 31.295359722s until attempting reconnection to cloud broker
Jun 20 23:57:45 rackfslot2 tedge-mapper[161550]: 2024-06-20T21:57:45.770261234Z  INFO tedge_mqtt_bridge::health: MQTT bridge connected to cloud broker
Jun 20 23:57:45 rackfslot2 tedge-mapper[161550]: 2024-06-20T21:57:45.770369399Z  INFO tedge_mqtt_bridge: Bridge cloud connection "cloud" subscribing to [Filter = s/dt, Qos = AtLeastOnce, Filter = s/dat, Qos = AtLeastOnce, Filter = s/ds, Qos = AtLeastOnce, Filter = s/e, Qos = AtLeastOnce, Filter = devicecontrol/notifications, Qos = AtLeastOnce, Filter = error, Qos = AtLeastOnce]
Jun 21 00:00:01 rackfslot2 tedge-mapper[161550]: 2024-06-20T22:00:01.97419309Z ERROR tedge_mqtt_bridge::health: MQTT bridge failed to connect to cloud broker: Mqtt state: Io error: Custom { kind: ConnectionAborted, error: "connection closed by peer" }
Jun 21 00:00:01 rackfslot2 tedge-mapper[161550]: 2024-06-20T22:00:01.974285422Z  INFO tedge_mqtt_bridge: Waiting 51.48150879s until attempting reconnection to cloud broker
Jun 21 00:00:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T22:00:53.767722196Z  INFO tedge_mqtt_bridge::health: MQTT bridge connected to cloud broker
Jun 21 00:00:53 rackfslot2 tedge-mapper[161550]: 2024-06-20T22:00:53.767827714Z  INFO tedge_mqtt_bridge: Bridge cloud connection "cloud" subscribing to [Filter = s/dt, Qos = AtLeastOnce, Filter = s/dat, Qos = AtLeastOnce, Filter = s/ds, Qos = AtLeastOnce, Filter = s/e, Qos = AtLeastOnce, Filter = devicecontrol/notifications, Qos = AtLeastOnce, Filter = error, Qos = AtLeastOnce]
Jun 24 15:06:07 rackfslot2 tedge-mapper[161550]: 2024-06-24T13:06:07.165885116Z ERROR tedge_mqtt_bridge::health: MQTT bridge failed to connect to cloud broker: Mqtt state: Io error: Custom { kind: ConnectionAborted, error: "connection closed by peer" }
Jun 24 15:06:07 rackfslot2 tedge-mapper[161550]: 2024-06-24T13:06:07.169332687Z  INFO tedge_mqtt_bridge: Waiting 19.892776233s until attempting reconnection to cloud broker
Jun 24 15:06:27 rackfslot2 tedge-mapper[161550]: 2024-06-24T13:06:27.298274083Z  INFO tedge_mqtt_bridge::health: MQTT bridge connected to cloud broker
Jun 24 15:06:27 rackfslot2 tedge-mapper[161550]: 2024-06-24T13:06:27.298333397Z  INFO tedge_mqtt_bridge: Bridge cloud connection "cloud" subscribing to [Filter = s/dt, Qos = AtLeastOnce, Filter = s/dat, Qos = AtLeastOnce, Filter = s/ds, Qos = AtLeastOnce, Filter = s/e, Qos = AtLeastOnce, Filter = devicecontrol/notifications, Qos = AtLeastOnce, Filter = error, Qos = AtLeastOnce]
reubenmiller commented 4 months ago

The event seems to have occurred again on the device under test, below are the tedge-mapper-c8y logs:

Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.065543723Z  INFO mqtt_channel::connection: MQTT connection established
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.087623181Z  INFO c8y_api::http_proxy: JWT token requested
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.103873283Z  INFO c8y_api::http_proxy: JWT token received
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.104461628Z  INFO mqtt_channel::connection: MQTT connection closed
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.154844303Z  INFO tedge_uploader_ext::actor: Uploading from /tmp/.tmpRAJ3RK/:etc:tedge:tedge.toml-c8y-mapper-10882415 to url: http://127.0.0.1:8001/c8y/event/events/10883500/binaries
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.528857274Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=None
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.530204314Z  INFO mqtt_channel::connection: MQTT connection established
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.552273457Z  INFO c8y_api::http_proxy: JWT token requested
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.568016047Z  INFO c8y_api::http_proxy: JWT token received
Jul 01 13:02:10 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:10.56858441Z  INFO mqtt_channel::connection: MQTT connection closed
Jul 01 13:02:37 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:37.61500343Z  INFO tedge_downloader_ext::actor: Downloading from url http://127.0.0.1:8000/tedge/file-transfer/osadl-rackfslot2/log_upload/tedge-bridge-monitor-c8y-mapper-10882417 to location /tmp/.tmpvpwa7N/tedge-bridge-monitor-c8y-mapper-10882417
Jul 01 13:02:37 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:37.625846911Z  INFO download::download: Downloading file from url="http://127.0.0.1:8000/tedge/file-transfer/osadl-rackfslot2/log_upload/tedge-bridge-monitor-c8y-mapper-10882417", len=0
Jul 01 13:02:37 rackfslot2 tedge-mapper[2219788]: 2024-07-01T11:02:37.646764105Z  INFO tedge_uploader_ext::actor: Uploading from /tmp/.tmpvpwa7N/tedge-bridge-monitor-c8y-mapper-10882417 to url: http://127.0.0.1:8001/c8y/event/events/10880045/binaries
Jul 04 08:28:11 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:28:11.625840188Z ERROR tedge_mqtt_bridge::health: MQTT bridge failed to connect to cloud broker: Mqtt state: Last pingreq isn't acked
Jul 04 08:28:11 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:28:11.625985112Z  INFO tedge_mqtt_bridge: Waiting 25.97075156s until attempting reconnection to cloud broker
Jul 04 08:28:37 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:28:37.72420315Z  INFO tedge_mqtt_bridge::health: MQTT bridge connected to cloud broker
Jul 04 08:28:37 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:28:37.724326852Z  INFO tedge_mqtt_bridge: Bridge cloud connection "cloud" subscribing to [Filter = s/dt, Qos = AtLeastOnce, Filter = s/dat, Qos = AtLeastOnce, Filter = s/ds, Qos = AtLeastOnce, Filter = s/e, Qos = AtLeastOnce, Filter = devicecontrol/notifications, Qos = AtLeastOnce, Filter = error, Qos = AtLeastOnce]
Jul 04 08:37:14 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:37:14.163169576Z ERROR tedge_mqtt_bridge::health: MQTT bridge failed to connect to cloud broker: Mqtt state: Io error: Custom { kind: ConnectionAborted, error: "connection closed by peer" }
Jul 04 08:37:14 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:37:14.163277593Z  INFO tedge_mqtt_bridge: Waiting 15.633064796s until attempting reconnection to cloud broker
Jul 04 08:37:30 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:37:30.041670265Z  INFO tedge_mqtt_bridge::health: MQTT bridge connected to cloud broker
Jul 04 08:37:30 rackfslot2 tedge-mapper[2219788]: 2024-07-04T06:37:30.041784134Z  INFO tedge_mqtt_bridge: Bridge cloud connection "cloud" subscribing to [Filter = s/dt, Qos = AtLeastOnce, Filter = s/dat, Qos = AtLeastOnce, Filter = s/ds, Qos = AtLeastOnce, Filter = s/e, Qos = AtLeastOnce, Filter = devicecontrol/notifications, Qos = AtLeastOnce, Filter = error, Qos = AtLeastOnce]
Jul 04 09:00:01 rackfslot2 tedge-mapper[2219788]: 2024-07-04T07:00:01.684107842Z ERROR tedge_mqtt_bridge::health: MQTT bridge failed to connect to cloud broker: I/O: Broken pipe (os error 32)
Jul 04 09:00:01 rackfslot2 tedge-mapper[2219788]: 2024-07-04T07:00:01.68473139Z  INFO tedge_mqtt_bridge: Waiting 44.01264048s until attempting reconnection to cloud broker
Jul 04 09:00:45 rackfslot2 tedge-mapper[2219788]: 2024-07-04T07:00:45.947290995Z  INFO tedge_mqtt_bridge::health: MQTT bridge connected to cloud broker
Jul 04 09:00:45 rackfslot2 tedge-mapper[2219788]: 2024-07-04T07:00:45.94742066Z  INFO tedge_mqtt_bridge: Bridge cloud connection "cloud" subscribing to [Filter = s/dt, Qos = AtLeastOnce, Filter = s/dat, Qos = AtLeastOnce, Filter = s/ds, Qos = AtLeastOnce, Filter = s/e, Qos = AtLeastOnce, Filter = devicecontrol/notifications, Qos = AtLeastOnce, Filter = error, Qos = AtLeastOnce]

And checking with the tedge connect c8y --test also seems to confirm that the connection to the cloud is now longer there:

# tedge connect c8y --test
The system config file '/etc/tedge/system.toml' doesn't exist. Using '/bin/systemctl' as the default service manager

Detected mosquitto version >= 2.0.0
Sending packets to check connection. This may take up to 2 seconds.

ERROR: Local MQTT publish has timed out.
Error: failed to test connection to Cumulocity cloud.

Caused by:
    Operation timed out. Is mosquitto running?
jarhodes314 commented 4 months ago

The issue is now fixed in #2982, but I am detailing the issue here so it's easy to find the root cause if we come across a similar issue in future.

In order to link messages to packet IDs so we can acknowledge the incoming message when we receive an acknowledgement for the forwarded message, we have a channel between the two bridge halves. When publishing a message, we send the Publish struct to the other bridge half:

https://github.com/thin-edge/thin-edge.io/blob/4d780acc0862997fc73797982f7c7d4174afd725/crates/extensions/tedge_mqtt_bridge/src/lib.rs#L347-L359

And when the other bridge half receives the Publish event from rumqttc with the packet ID, we await the receipt of this Publish struct:

https://github.com/thin-edge/thin-edge.io/blob/4d780acc0862997fc73797982f7c7d4174afd725/crates/extensions/tedge_mqtt_bridge/src/lib.rs#L376-L381

After a disconnection, however, any in-flight (i.e. unacknowledged) messages are re-published by rumqttc, so we receive another Publish event and wait on companion_bridge_half.recv(), but there is no corresponding companion_bridge_half.send(), so this blocks indefinitely.

The solution to this issue is to detect these duplicate publishes and ignore them. Since the packet ID remains the same for the original and republished message, all that is needed here is to check whether the ID is in forwarded_pkid_to_received_msg and if so, skip trying to receive the message from the companion bridge half.

reubenmiller commented 4 months ago

There has been a second incident on the osadl-rackfslot2 device, where the mapper has stopped sending measurements and left an operation in the EXECUTING state. The incident seems to be triggered by requesting a config_snapshot from the device via Cumulocity IoT, where the tedge-mapper-c8y failed to get a JWT which occurred at ~9:38 local time.

Below are the logs collected from the device.

# journalctl -fu tedge-mapper-c8y
Jul 25 09:39:39 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:39:39.434975965Z  INFO mqtt_channel::connection: MQTT connection closed
Jul 25 09:39:39 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:39:39.637275725Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=None
Jul 25 09:39:39 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:39:39.638372879Z  INFO mqtt_channel::connection: MQTT connection established
Jul 25 09:39:39 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:39:39.659390596Z  INFO c8y_api::http_proxy: JWT token requested
Jul 25 09:39:59 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:39:59.660836141Z  INFO c8y_api::http_proxy: JWT token requested
Jul 25 09:40:00 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:40:00.520462242Z DEBUG tedge_mqtt_bridge: Received notification (cloud) Outgoing(PingReq)
Jul 25 09:40:00 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:40:00.520578629Z DEBUG tedge_mqtt_bridge: Processed notification (cloud) Outgoing(PingReq)
Jul 25 09:40:00 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:40:00.532787912Z DEBUG tedge_mqtt_bridge: Received notification (cloud) Incoming(PingResp)
Jul 25 09:40:00 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:40:00.53286517Z DEBUG tedge_mqtt_bridge: Processed notification (cloud) Incoming(PingResp)
Jul 25 09:40:19 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:40:19.661910165Z  INFO c8y_api::http_proxy: JWT token requested
Jul 25 09:40:39 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:40:39.663362825Z ERROR c8y_api::http_proxy: Fail to retrieve JWT token after 3 attempts
Jul 25 09:40:39 rackfslot2 tedge-mapper[1591753]: thread 'tokio-runtime-worker' panicked at crates/extensions/c8y_auth_proxy/src/tokens.rs:44:72:
Jul 25 09:40:39 rackfslot2 tedge-mapper[1591753]: called `Result::unwrap()` on an `Err` value: NoJwtReceived
Jul 25 09:40:39 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:40:39.664393276Z  INFO mqtt_channel::connection: MQTT connection closed
Jul 25 09:40:39 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:40:39.666860673Z  WARN upload::upload: Temporary failure: error sending request for url (http://127.0.0.1:8001/c8y/event/events/11358344/binaries): connection closed before message completed. Retrying in 14s
Jul 25 09:40:54 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:40:54.121150124Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=None
Jul 25 09:40:54 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:40:54.122187205Z  INFO mqtt_channel::connection: MQTT connection established
Jul 25 09:40:54 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:40:54.143851544Z  INFO c8y_api::http_proxy: JWT token requested
Jul 25 09:41:00 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:41:00.520909789Z DEBUG tedge_mqtt_bridge: Received notification (cloud) Outgoing(PingReq)
Jul 25 09:41:00 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:41:00.521029751Z DEBUG tedge_mqtt_bridge: Processed notification (cloud) Outgoing(PingReq)
Jul 25 09:41:00 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:41:00.532862816Z DEBUG tedge_mqtt_bridge: Received notification (cloud) Incoming(PingResp)
Jul 25 09:41:00 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:41:00.532966111Z DEBUG tedge_mqtt_bridge: Processed notification (cloud) Incoming(PingResp)
Jul 25 09:41:14 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:41:14.144475845Z  INFO c8y_api::http_proxy: JWT token requested
Jul 25 09:41:34 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:41:34.145688763Z  INFO c8y_api::http_proxy: JWT token requested
Jul 25 09:41:54 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:41:54.146814082Z ERROR c8y_api::http_proxy: Fail to retrieve JWT token after 3 attempts
Jul 25 09:41:54 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:41:54.147141152Z  INFO mqtt_channel::connection: MQTT connection closed
Jul 25 09:41:54 rackfslot2 tedge-mapper[1591753]: thread 'tokio-runtime-worker' panicked at crates/extensions/c8y_auth_proxy/src/tokens.rs:44:72:
Jul 25 09:41:54 rackfslot2 tedge-mapper[1591753]: called `Result::unwrap()` on an `Err` value: NoJwtReceived
Jul 25 09:41:54 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:41:54.34620197Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=127.0.0.1:1883, session_name=None
Jul 25 09:41:54 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:41:54.347374438Z  INFO mqtt_channel::connection: MQTT connection established
Jul 25 09:41:54 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:41:54.369155183Z  INFO c8y_api::http_proxy: JWT token requested
Jul 25 09:42:00 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:42:00.523963967Z DEBUG tedge_mqtt_bridge: Received notification (cloud) Outgoing(PingReq)
Jul 25 09:42:00 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:42:00.524082873Z DEBUG tedge_mqtt_bridge: Processed notification (cloud) Outgoing(PingReq)
Jul 25 09:42:00 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:42:00.704925608Z DEBUG tedge_mqtt_bridge: Received notification (cloud) Incoming(PingResp)
Jul 25 09:42:00 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:42:00.705034255Z DEBUG tedge_mqtt_bridge: Processed notification (cloud) Incoming(PingResp)
Jul 25 09:42:14 rackfslot2 tedge-mapper[1591753]: 2024-07-25T07:42:14.370257125Z  INFO c8y_api::http_proxy: JWT token requested
# tedge mqtt sub '#'
INFO: Connected
[te/device/main/service/c8y-firmware-plugin] {"@id":"osadl-rackfslot2:device:main:service:c8y-firmware-plugin","@parent":"device/main//","@type":"service","name":"c8y-firmware-plugin","type":"service"}
[te/device/main/service/c8y-firmware-plugin/status/health] {"pid":156187,"status":"up","time":1721233775.4812958}
[te/device/main/service/mosquitto-c8y-bridge] {"@id":"osadl-rackfslot2:device:main:service:mosquitto-c8y-bridge","@parent":"device/main//","@type":"service","name":"mosquitto-c8y-bridge","type":"service"}
[te/device/main/service/mosquitto-c8y-bridge/status/health] 0
[te/device/main/service/tedge-mapper-c8y] {"@id":"osadl-rackfslot2:device:main:service:tedge-mapper-c8y","@parent":"device/main//","@type":"service","name":"tedge-mapper-c8y","type":"service"}
[te/device/main/service/tedge-mapper-c8y/status/health] {"pid":1591753,"status":"up","time":1721233805.716494}
[te/device/main/service/tedge-agent] {"@parent":"device/main//","@type":"service","type":"service"}
[te/device/main/service/tedge-agent/status/health] {"pid":1591718,"status":"up","time":1721233783.8334885}
[te/device/main/service/tedge-mapper-collectd] {"@parent":"device/main//","@type":"service","type":"service"}
[te/device/main/service/tedge-mapper-collectd/status/health] {"pid":1591745,"status":"up","time":1721233801.9732199}
[te/device/main/service/tedge-mapper-bridge-c8y] {"@id":"osadl-rackfslot2:device:main:service:tedge-mapper-bridge-c8y","@parent":"device/main//","@type":"service","name":"tedge-mapper-bridge-c8y","type":"service"}
[te/device/main/service/tedge-mapper-bridge-c8y/status/health] {"status":"down"}
[te/device/main///cmd/config_snapshot] {"types":["/etc/tedge/tedge.toml","c8y-bridge.conf","tedge-configuration-plugin","tedge-log-plugin","tedge-mosquitto.conf"]}
[te/device/main///cmd/config_snapshot/c8y-mapper-11361151] {"logPath":"/var/log/tedge/agent/workflow-config_snapshot-c8y-mapper-11361151.log","path":"/etc/tedge/tedge.toml","status":"successful","tedgeUrl":"http://127.0.0.1:8000/tedge/file-transfer/osadl-rackfslot2/config_snapshot/:etc:tedge:tedge.toml-c8y-mapper-11361151","type":"/etc/tedge/tedge.toml"}
[te/device/main///cmd/config_update] {"types":["/etc/tedge/tedge.toml","c8y-bridge.conf","tedge-configuration-plugin","tedge-log-plugin","tedge-mosquitto.conf"]}
[te/device/main///cmd/log_upload] {"types":["auth","cron","kern","mosquitto","software-management","system","tedge-benchmark","tedge-bridge-monitor"]}
[te/device/main///cmd/restart] {}
[te/device/main///cmd/software_list] {"types":["apt"]}
[te/device/main///cmd/software_update] {"types":["apt"]}
[te/device/main///twin/c8y_Agent] {"name":"thin-edge.io","url":"https://thin-edge.io","version":"1.1.2~201+g5886e5d"}

bridge monitor logs

{"tst":"2024-07-16T02:10:30.420483+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-16T02:11:01.683226+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-07-17T16:23:54.884758+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T16:23:55.946158+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":1,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T16:24:19.457445+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T16:24:19.502298+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-07-17T16:24:43.985536+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T16:24:45.549973+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-07-17T18:29:34.465586+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T18:29:35.526201+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":1,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T18:29:52.113442+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T18:29:52.113589+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-07-17T18:30:05.629639+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-17T18:30:05.840028+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-07-18T10:43:06.028564+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}
{"tst":"2024-07-18T10:43:47.660542+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":15,"payload":"{\"status\":\"up\"}"}
{"tst":"2024-07-25T09:40:09.976420+0200","topic":"te/device/main/service/tedge-mapper-bridge-c8y/status/health","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"down\"}"}

tedge-mapper-c8y systemd status

$ systemctl status tedge-mapper-c8y
● tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format.
     Loaded: loaded (/lib/systemd/system/tedge-mapper-c8y.service; enabled; preset: enabled)
    Drop-In: /etc/systemd/system/tedge-mapper-c8y.service.d
             └─override.conf
     Active: active (running) since Wed 2024-07-17 18:30:05 CEST; 1 week 0 days ago
    Process: 1591752 ExecStartPre=/usr/bin/tedge init (code=exited, status=0/SUCCESS)
   Main PID: 1591753 (tedge-mapper)
      Tasks: 7 (limit: 1583)
     Memory: 4.3M
        CPU: 17min 901ms
     CGroup: /system.slice/tedge-mapper-c8y.service
             └─1591753 /usr/bin/tedge-mapper c8y

The Cumulocity IoT Core's logs shows the following error log entry (cumulocity-core/c8ycore-sts-4:cumulocity-core):

2024-07-25 09:38:39.191 SessionAuth: Tenant's 'qaenvironment' user 'device_osadl-rackfslot2', 'f5aa4e4b-4434-4821-9570-1d9fc853aee0' sessions deleted.
reubenmiller commented 4 months ago

I've found an explanation for the last reported event where the built-in bridge was marked as down...

There is a problem with the handling of QoS=0 messages where the package identifier (pkid) is 0. This results in the following problems:

The scenario can be reproduced, by executing the following command about 10 times, after 8-9 calls, the command starts to fail and it does not recover until you restart the tedge-mapper-c8y service (though I didn't wait that long before restarting the service):

tedge connect c8y --test

The following is the system test and proposed "fix" (pending Rust review):

reubenmiller commented 3 weeks ago

Closing the ticket as the built-in mapper seems to have stabilized and has been running for 1.5 months and is still functional (on the osadl2 device, with tedge 1.3.0).

Below is the service status showing the runtime:

root@rackfslot2:~# systemctl status tedge-mapper-c8y
● tedge-mapper-c8y.service - tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format.
     Loaded: loaded (/lib/systemd/system/tedge-mapper-c8y.service; enabled; preset: enabled)
    Drop-In: /etc/systemd/system/tedge-mapper-c8y.service.d
             └─override.conf
     Active: active (running) since Fri 2024-09-20 15:26:46 CEST; 1 month 14 days ago
   Main PID: 110720 (tedge-mapper)
      Tasks: 7 (limit: 1583)
     Memory: 7.9M
        CPU: 1h 38min 37.309s
     CGroup: /system.slice/tedge-mapper-c8y.service
             └─110720 /usr/bin/tedge-mapper c8y