Closed reubenmiller closed 8 months ago
After investigating, this is a bug in the tedge-agent (not the test). It is related to https://github.com/thin-edge/thin-edge.io/issues/2501
The following contains log entries which show the missing 503
message which results in the software update operation remaining in the EXECUTING state after the agent has performed the update.
Case: Failed
In the failed case, the 503,c8y_SoftwareUpdate
message is not sent before the tedge-agent process exits, and the process does not attempt to send a message when it starts up.
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.381178295Z INFO tedge_agent::software_manager::actor: Checking if tedge got self updated
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.381197381Z INFO tedge_agent::software_manager::actor: Current running version: 0.0.1
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.38183023Z INFO tedge_agent::software_manager::actor: Installed binary version: 1.0.0-rc.2~124+ge534ed3
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.381843875Z INFO tedge_agent::software_manager::actor: Self update detected. Requesting shutdown...
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.381848895Z ERROR tedge_agent::software_manager::actor: Tedge-agent is no more running the latest-version => a restart is required
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.381893579Z ERROR Runtime: Actor SoftwareManagerActor-11 has finished unsuccessfully: ActorError(NotRunningLatestVersion)
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.381903047Z INFO Runtime: Shutting down on error: Tedge-agent is not running the latest version
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.381959553Z INFO Runtime: Actor has finished: FsWatcher-5
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382056465Z INFO tedge_agent::file_transfer_server::actor: Shutdown
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382100728Z INFO mqtt_channel::connection: MQTT connection closed
Jan 25 08:33:27 03d75d9284e7 mosquitto[1185]: 1706171607: Received PUBLISH from tedge-agent#te/device/main// (d0, q1, r1, m17, 'te/device/main/service/tedge-agent/status/health', ... (27 bytes))
Jan 25 08:33:27 03d75d9284e7 mosquitto[1185]: 1706171607: Sending PUBLISH to tedge-mapper-c8y (d0, q1, r0, m64, 'te/device/main/service/tedge-agent/status/health', ... (27 bytes))
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382205815Z INFO Runtime: Actor has finished: MQTT-4
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382216245Z INFO Runtime: Actor has finished: Signal-Handler-3
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382222677Z INFO Runtime: Actor has finished: HealthMonitorActor-14
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382227977Z INFO Runtime: Actor has finished: TedgetoTeConverter-0
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382233046Z INFO Runtime: Actor has finished: LogManager-9
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382244037Z INFO Runtime: Actor has finished: Downloader-6
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382249287Z INFO Runtime: Actor has finished: RestartManagerActor-10
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382254286Z INFO Runtime: Actor has finished: ConfigManager-8
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382259045Z INFO Runtime: Actor has finished: Script-12
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382264005Z INFO Runtime: Actor has finished: Uploader-7
Jan 25 08:33:27 03d75d9284e7 mosquitto[1185]: 1706171607: Sending PUBLISH to auto-65FF3E37-8DCB-5F97-81E3-35C7AFC6ADB9 (d0, q0, r0, m0, 'te/device/main/service/tedge-agent/status/health', ... (27 bytes))
Jan 25 08:33:27 03d75d9284e7 mqtt-logger[428]: {"timestamp":1706171607.382505558,"message":{"tst":"2024-01-25T08:33:27.382522+0000","topic":"te/device/main/service/tedge-agent/status/health","qos":0,"retain":0,"payloadlen":27,"payload":"{\"pid\":518,\"status\":\"down\"}"},"payload_hex":"7b22706964223a3531382c22737461747573223a22646f776e227d"}
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382268823Z INFO Runtime: Actor has finished: HttpFileTransferServer-1
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.382273983Z INFO Runtime: Actor has finished: FileCacheActor-2
Jan 25 08:33:27 03d75d9284e7 mosquitto[1185]: 1706171607: Sending PUBACK to tedge-agent#te/device/main// (m17, rc0)
Jan 25 08:33:27 03d75d9284e7 mqtt-logger[428]: {"timestamp":1706171607.382617468,"message":{"tst":"2024-01-25T08:33:27.382626+0000","topic":"c8y/s/us/TST_concentrate_warm_wave:device:main:service:tedge-agent","qos":0,"retain":0,"payloadlen":8,"payload":"104,down"},"payload_hex":"3130342c646f776e"}
Jan 25 08:33:27 03d75d9284e7 mosquitto[1185]: 1706171607: Received DISCONNECT from tedge-agent#te/device/main//
Jan 25 08:33:27 03d75d9284e7 mosquitto[1185]: 1706171607: Client tedge-agent#te/device/main// disconnected.
Jan 25 08:33:27 03d75d9284e7 mosquitto[1185]: 1706171607: Received PUBACK from tedge-mapper-c8y (Mid: 64, RC:0)
Jan 25 08:33:27 03d75d9284e7 mosquitto[1185]: 1706171607: Received PUBLISH from tedge-mapper-c8y (d0, q1, r0, m24, 'c8y/s/us/TST_concentrate_warm_wave:device:main:service:tedge-agent', ... (8 bytes))
Jan 25 08:33:27 03d75d9284e7 mosquitto[1185]: 1706171607: Sending PUBLISH to Cumulocity (d0, q1, r0, m71, 's/us/TST_concentrate_warm_wave:device:main:service:tedge-agent', ... (8 bytes))
Jan 25 08:33:27 03d75d9284e7 mosquitto[1185]: 1706171607: Sending PUBLISH to auto-65FF3E37-8DCB-5F97-81E3-35C7AFC6ADB9 (d0, q0, r0, m0, 'c8y/s/us/TST_concentrate_warm_wave:device:main:service:tedge-agent', ... (8 bytes))
Jan 25 08:33:27 03d75d9284e7 mosquitto[1185]: 1706171607: Sending PUBACK to tedge-mapper-c8y (m24, rc0)
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.409678889Z ERROR Runtime: Actor TedgeOperationConverter-13 has finished unsuccessfully: ChannelError(SendError(SendError { kind: Disconnected }))
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.409694438Z INFO Runtime: All actors have finished
Jan 25 08:33:27 03d75d9284e7 tedge-agent[518]: 2024-01-25T08:33:27.409915012Z ERROR sm-agent: tedge_actors::runtime: Aborted due to Tedge-agent is not running the latest version
Jan 25 08:33:27 03d75d9284e7 systemd[1]: tedge-agent.service: Main process exited, code=exited, status=1/FAILURE
Jan 25 08:33:27 03d75d9284e7 systemd[1]: tedge-agent.service: Failed with result 'exit-code'.
Jan 25 08:33:27 03d75d9284e7 mosquitto[1185]: 1706171607: Received PUBACK from Cumulocity (Mid: 71, RC:0)
Jan 25 08:33:32 03d75d9284e7 systemd[1]: tedge-agent.service: Scheduled restart job, restart counter is at 1.
Case: Successful
In the succesful case, the 503,c8y_SoftwareUpdate
message is being published before the tedge-agent process exits.
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.583704052Z INFO tedge_agent::software_manager::actor: Current running version: 0.0.1
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBLISH from tedge-agent#te/device/main// (d0, q1, r1, m17, 'te/device/main///cmd/software_update/c8y-mapper-2024-01-25T16:10:05.977029922Z', ... (442 bytes))
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.623880614Z INFO tedge_agent::tedge_operation_converter::actor: Waiting successful software_update operation to be cleared
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to tedge-mapper-c8y (d0, q1, r0, m63, 'te/device/main///cmd/software_update/c8y-mapper-2024-01-25T16:10:05.977029922Z', ... (442 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to tedge-agent#te/device/main// (d0, q1, r0, m11, 'te/device/main///cmd/software_update/c8y-mapper-2024-01-25T16:10:05.977029922Z', ... (442 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to auto-32D5F4C4-6D06-1993-F568-0E105D61F996 (d0, q0, r0, m0, 'te/device/main///cmd/software_update/c8y-mapper-2024-01-25T16:10:05.977029922Z', ... (442 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBACK to tedge-agent#te/device/main// (m17, rc0)
Jan 25 16:11:08 9036ddba237f mqtt-logger[623]: {"timestamp":1706199068.643524206,"message":{"tst":"2024-01-25T16:11:08.643678+0000","topic":"te/device/main///cmd/software_update/c8y-mapper-2024-01-25T16:10:05.977029922Z","qos":0,"retain":0,"payloadlen":442,"payload":"{\"status\":\"successful\",\"updateList\":[{\"modules\":[{\"action\":\"install\",\"name\":\"tedge\",\"version\":\"0.13.2~405+gb83d18b\"},{\"action\":\"install\",\"name\":\"tedge-mapper\",\"version\":\"0.13.2~405+gb83d18b\"},{\"action\":\"install\",\"name\":\"tedge-agent\",\"version\":\"0.13.2~405+gb83d18b\"},{\"action\":\"install\",\"name\":\"tedge-watchdog\",\"version\":\"0.13.2~405+gb83d18b\"},{\"action\":\"install\",\"name\":\"tedge-apt-plugin\",\"version\":\"0.13.2~405+gb83d18b\"}],\"type\":\"default\"}]}"},"payload_hex":"7b22737461747573223a227375636365737366756c222c227570646174654c697374223a5b7b226d6f64756c6573223a5b7b22616374696f6e223a22696e7374616c6c222c226e616d65223a227465646765222c2276657273696f6e223a22302e31332e327e3430352b6762383364313862227d2c7b22616374696f6e223a22696e7374616c6c222c226e616d65223a2274656467652d6d6170706572222c2276657273696f6e223a22302e31332e327e3430352b6762383364313862227d2c7b22616374696f6e223a22696e7374616c6c222c226e616d65223a2274656467652d6167656e74222c2276657273696f6e223a22302e31332e327e3430352b6762383364313862227d2c7b22616374696f6e223a22696e7374616c6c222c226e616d65223a2274656467652d7761746368646f67222c2276657273696f6e223a22302e31332e327e3430352b6762383364313862227d2c7b22616374696f6e223a22696e7374616c6c222c226e616d65223a2274656467652d6170742d706c7567696e222c2276657273696f6e223a22302e31332e327e3430352b6762383364313862227d5d2c2274797065223a2264656661756c74227d5d7d"}
Jan 25 16:11:08 9036ddba237f mqtt-logger[623]: {"timestamp":1706199068.647093299,"message":{"tst":"2024-01-25T16:11:08.647150+0000","topic":"c8y/s/us","qos":0,"retain":0,"payloadlen":22,"payload":"503,c8y_SoftwareUpdate"},"payload_hex":"3530332c6338795f536f667477617265557064617465"}
Jan 25 16:11:08 9036ddba237f mqtt-logger[623]: {"timestamp":1706199068.647259799,"message":{"tst":"2024-01-25T16:11:08.647278+0000","topic":"te/device/main///cmd/software_update/c8y-mapper-2024-01-25T16:10:05.977029922Z","qos":0,"retain":0,"payloadlen":0,"payload":null},"payload_hex":""}
Jan 25 16:11:08 9036ddba237f mqtt-logger[623]: {"timestamp":1706199068.647297716,"message":{"tst":"2024-01-25T16:11:08.647344+0000","topic":"te/device/main///cmd/software_list/c8y-mapper-2024-01-25T16:11:08.64443525Z","qos":0,"retain":0,"payloadlen":17,"payload":"{\"status\":\"init\"}"},"payload_hex":"7b22737461747573223a22696e6974227d"}
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBACK from tedge-mapper-c8y (Mid: 63, RC:0)
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBACK from tedge-agent#te/device/main// (Mid: 11, RC:0)
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBLISH from tedge-mapper-c8y (d0, q1, r0, m23, 'c8y/s/us', ... (22 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to Cumulocity (d0, q1, r0, m70, 's/us', ... (22 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to auto-32D5F4C4-6D06-1993-F568-0E105D61F996 (d0, q0, r0, m0, 'c8y/s/us', ... (22 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBACK to tedge-mapper-c8y (m23, rc0)
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBLISH from tedge-mapper-c8y (d0, q1, r1, m24, 'te/device/main///cmd/software_update/c8y-mapper-2024-01-25T16:10:05.977029922Z', ... (0 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to tedge-mapper-c8y (d0, q1, r0, m65, 'te/device/main///cmd/software_update/c8y-mapper-2024-01-25T16:10:05.977029922Z', ... (0 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to tedge-agent#te/device/main// (d0, q1, r0, m12, 'te/device/main///cmd/software_update/c8y-mapper-2024-01-25T16:10:05.977029922Z', ... (0 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to auto-32D5F4C4-6D06-1993-F568-0E105D61F996 (d0, q0, r0, m0, 'te/device/main///cmd/software_update/c8y-mapper-2024-01-25T16:10:05.977029922Z', ... (0 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBACK to tedge-mapper-c8y (m24, rc0)
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBLISH from tedge-mapper-c8y (d0, q1, r1, m25, 'te/device/main///cmd/software_list/c8y-mapper-2024-01-25T16:11:08.64443525Z', ... (17 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to tedge-mapper-c8y (d0, q1, r0, m67, 'te/device/main///cmd/software_list/c8y-mapper-2024-01-25T16:11:08.64443525Z', ... (17 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to tedge-agent#te/device/main// (d0, q1, r0, m13, 'te/device/main///cmd/software_list/c8y-mapper-2024-01-25T16:11:08.64443525Z', ... (17 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to auto-32D5F4C4-6D06-1993-F568-0E105D61F996 (d0, q0, r0, m0, 'te/device/main///cmd/software_list/c8y-mapper-2024-01-25T16:11:08.64443525Z', ... (17 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBACK to tedge-mapper-c8y (m25, rc0)
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBACK from tedge-agent#te/device/main// (Mid: 12, RC:0)
Jan 25 16:11:08 9036ddba237f mqtt-logger[623]: {"timestamp":1706199068.744399050,"message":{"tst":"2024-01-25T16:11:08.744445+0000","topic":"te/device/main/service/tedge-agent/status/health","qos":0,"retain":0,"payloadlen":27,"payload":"{\"pid\":714,\"status\":\"down\"}"},"payload_hex":"7b22706964223a3731342c22737461747573223a22646f776e227d"}
Jan 25 16:11:08 9036ddba237f mqtt-logger[623]: {"timestamp":1706199068.744492759,"message":{"tst":"2024-01-25T16:11:08.744497+0000","topic":"tedge/commands/req/software/list","qos":0,"retain":0,"payloadlen":49,"payload":"{\"id\":\"c8y-mapper-2024-01-25T16:11:08.64443525Z\"}"},"payload_hex":"7b226964223a226338792d6d61707065722d323032342d30312d32355431363a31313a30382e36343434333532355a227d"}
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.684253645Z INFO tedge_agent::software_manager::actor: Installed binary version: 0.13.2~405+gb83d18b
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.684292478Z INFO tedge_agent::software_manager::actor: Self update detected. Requesting shutdown...
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.684300728Z ERROR tedge_agent::software_manager::actor: Tedge-agent is no more running the latest-version => a restart is required
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.684424437Z ERROR Runtime: Actor SoftwareManagerActor-11 has finished unsuccessfully: ActorError(NotRunningLatestVersion)
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.684528895Z INFO Runtime: Shutting down on error: Tedge-agent is not running the latest version
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.684645104Z INFO Runtime: Actor has finished: FsWatcher-5
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.684850271Z INFO tedge_agent::file_transfer_server::actor: Shutdown
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.684872188Z INFO Runtime: Actor has finished: HttpFileTransferServer-1
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.684930272Z INFO Runtime: Actor has finished: RestartManagerActor-10
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.685032897Z INFO Runtime: Actor has finished: HealthMonitorActor-14
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.685053897Z INFO Runtime: Actor has finished: FileCacheActor-2
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.685611565Z INFO Runtime: Actor has finished: Signal-Handler-3
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.685648023Z INFO Runtime: Actor has finished: Downloader-6
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.685671315Z INFO Runtime: Actor has finished: Script-12
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.685690232Z INFO Runtime: Actor has finished: LogManager-9
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBACK from tedge-agent#te/device/main// (Mid: 13, RC:0)
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.685710357Z INFO Runtime: Actor has finished: Uploader-7
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.711210881Z INFO Runtime: Actor has finished: TedgetoTeConverter-0
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.711261381Z INFO Runtime: Actor has finished: ConfigManager-8
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.724579749Z INFO Runtime: Actor has finished: MQTT-4
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.724754999Z INFO mqtt_channel::connection: MQTT connection closed
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBACK from tedge-mapper-c8y (Mid: 65, RC:0)
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBACK from tedge-mapper-c8y (Mid: 67, RC:0)
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBACK from Cumulocity (Mid: 70, RC:0)
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBLISH from tedge-agent#te/device/main// (d0, q1, r1, m18, 'te/device/main/service/tedge-agent/status/health', ... (27 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to tedge-mapper-c8y (d0, q1, r0, m69, 'te/device/main/service/tedge-agent/status/health', ... (27 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to auto-32D5F4C4-6D06-1993-F568-0E105D61F996 (d0, q0, r0, m0, 'te/device/main/service/tedge-agent/status/health', ... (27 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBACK to tedge-agent#te/device/main// (m18, rc0)
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received DISCONNECT from tedge-agent#te/device/main//
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Client tedge-agent#te/device/main// disconnected.
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBLISH from tedge-mapper-c8y (d0, q1, r0, m26, 'tedge/commands/req/software/list', ... (49 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to auto-32D5F4C4-6D06-1993-F568-0E105D61F996 (d0, q0, r0, m0, 'tedge/commands/req/software/list', ... (49 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBACK to tedge-mapper-c8y (m26, rc0)
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBACK from tedge-mapper-c8y (Mid: 69, RC:0)
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.750192399Z INFO Runtime: Actor has finished: TedgeOperationConverter-13
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.750259232Z INFO Runtime: All actors have finished
Jan 25 16:11:08 9036ddba237f tedge-agent[714]: 2024-01-25T16:11:08.750389399Z ERROR sm-agent: tedge_actors::runtime: Aborted due to Tedge-agent is not running the latest version
Jan 25 16:11:08 9036ddba237f systemd[1]: tedge-agent.service: Main process exited, code=exited, status=1/FAILURE
Jan 25 16:11:08 9036ddba237f systemd[1]: tedge-agent.service: Failed with result 'exit-code'.
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBLISH from tedge-mapper-c8y (d0, q1, r0, m27, 'c8y/s/us/TST_composite_selfish_aluminum:device:main:service:tedge-agent', ... (8 bytes))
Jan 25 16:11:08 9036ddba237f mqtt-logger[623]: {"timestamp":1706199068.773179833,"message":{"tst":"2024-01-25T16:11:08.773222+0000","topic":"c8y/s/us/TST_composite_selfish_aluminum:device:main:service:tedge-agent","qos":0,"retain":0,"payloadlen":8,"payload":"104,down"},"payload_hex":"3130342c646f776e"}
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to Cumulocity (d0, q1, r0, m71, 's/us/TST_composite_selfish_aluminum:device:main:service:tedge-agent', ... (8 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBLISH to auto-32D5F4C4-6D06-1993-F568-0E105D61F996 (d0, q0, r0, m0, 'c8y/s/us/TST_composite_selfish_aluminum:device:main:service:tedge-agent', ... (8 bytes))
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Sending PUBACK to tedge-mapper-c8y (m27, rc0)
Jan 25 16:11:08 9036ddba237f mosquitto[1380]: 1706199068: Received PUBACK from Cumulocity (Mid: 71, RC:0)
Jan 25 16:11:08 9036ddba237f systemd[1]: tedge-agent.service: Consumed 13.648s CPU time.
Jan 25 16:11:13 9036ddba237f systemd[1]: tedge-agent.service: Scheduled restart job, restart counter is at 1.
The issue is due to a race condition between actors.
software_manager
checks the tedge-agent version and triggers a process restart if a new version have been installed.software_manager
actor to forward the operation status to the tedge_operation converter
which publishes this status over MQTT, the recipient being the c8y mapper.Once the operation status has been sent by the software_manager
to the tedge_operation converter
, both threads are executed concurrently. The bug happens when the shutdown is effective before the status got a chance to be published over MQTT (being in the pending queue of the tedge_operation converter
or of the tedge_mqtt_ext
actor.
A proper fix would be to trigger the agent restart only when received the acknowledgment that the operation status has been actually published over MQTT. However, the current design / code base is not equipped to handle that easily.
A way to mitigate the issue is, on shutdown of the MQTT actor, to publish all the messages already queued for publishing.
However, this is not enough as the race condition can occur only between the software_manager
and the tedge_operation converter
(this is the case in the failing case reported by this ticket).
A quick way to reduce the likelihood of the issue is to delay a bit the agent restart when a self-update has been detected.
QA has thoroughly checked the bug and here are the results:
Describe the bug
There is a flaky system tests:
Failed runs are seen here:
To Reproduce
Expected behavior
Screenshots
Environment (please complete the following information):
Additional context