thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0
221 stars 54 forks source link

tedge-mapper-c8y token initialization is blocked sporadically when using QoS 1 #2412

Closed reubenmiller closed 11 months ago

reubenmiller commented 11 months ago

Describe the bug

Strange behaviour of the tedge-mapper-c8y leads to sporadic initialization failure where it is blocked waiting for a jwt (token) from Cumulocity IoT. The behaviour was observed in the tedge-demo-container running tedge 0.13.0.

When the main device container is starting up for the first time, the tedge-mapper-c8y sometimes gets stuck in the initialization because it is unable to successfully request a jwt from Cumulocity. The mapper continuously repeats trying to request a token and is continuously unsuccessful. This blocks all of the tedge-mapper-c8y function.

After doing some debugging it was found that when the tedge-mapper-c8y is stuck in the initialization stage, a manual jwt can still be requested as along as the QoS is set to 0 (and not 1 which is what the tedge-mapper-c8y is using).

Some additional observations:

A test script was written to manually check if requesting a token works or not after a reboot (see "to reproduce" section for details). The following are the results, which show that there seems to be some unexplained negative side-effects when the tedge-mapper-c8y is running and using qos 1.

Case 1: With tedge-mapper-c8y enabled and publishing with QoS 1

Case 2: With tedge-mapper-c8y disabled and publishing with QoS 1

Case 3: With tedge-mapper-c8y enabled and publishing with QoS 0

To Reproduce

Note: The error state (not receiving a token response) only occurs sporadically, these instructions are used to highlight how to identify when it is in the error state. For best results use the tedge-demo-container for testing this scenario:

It is assumed that the following is satisfied before:

  1. Open a new console and subscribe to the Cumulocity topic

    tedge mqtt sub 'c8y/#'
  2. Open a different console and request a token using QoS 1

    tedge mqtt pub 'c8y/s/uat' -q 1

    Observe in the first console that a token is not received on the expected c8y/s/dat topic:

    [c8y/s/uat] 
    [c8y/s/uat] 
    [c8y/s/uat] 
    [c8y/s/uat] 
    [c8y/s/uat] 
    [c8y/s/uat] 
    [c8y/s/uat]
  3. In the same console, publish the same token request message but with Qos 0

    tedge mqtt pub 'c8y/s/uat' '' -q 0

    Observe in the subscription console that a token is now received!

    [c8y/s/dat] 71,<token>

To help with debugging a script was created, though your mileage might vary as it only has been tested in the demo container. It will need modification if it is to run on a real device in order to wait until the "system is up" before running the test.

Note: Please review the script yourself before trying it out as it needs sudo rights to trigger a restart and install a service.

wget -O - https://gist.githubusercontent.com/reubenmiller/b476aec48bad01395e05d2586f88a59b/raw/e7377524237b0d899f78a44878ff0e7468a6fdae/debug-qos-issue.sh > debug-qos-issue.sh
sudo chmod a+x debug-qos-issue.sh
sudo ./debug-qos-issue.sh clean

# initiate tests
sudo ./debug-qos-issue.sh

# wait for a few minutes until the device has finished rebooting ~20 times

# check the results
sudo ./debug-qos-issue.sh

Expected behavior

The tedge-mapper-c8y should be able to reliably request the jwt from Cumulocity.

Screenshots

tedge-mapper-c8y logs

These logs show how the error manifests in the tedge-mapper-c8y:

Nov 04 21:44:00 17a2666b257d tedge-mapper[78]: 2023-11-04T21:44:00.263629762Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=localhost:1883, session_name=None
Nov 04 21:44:00 17a2666b257d tedge-mapper[78]: 2023-11-04T21:44:00.267709303Z  INFO mqtt_channel::connection: MQTT connection established
Nov 04 21:44:00 17a2666b257d tedge-mapper[78]: 2023-11-04T21:44:00.314698303Z  INFO c8y_api::http_proxy: JWT token requested
Nov 04 21:44:20 17a2666b257d tedge-mapper[78]: 2023-11-04T21:44:20.324596521Z  INFO c8y_api::http_proxy: JWT token requested
Nov 04 21:44:40 17a2666b257d tedge-mapper[78]: 2023-11-04T21:44:40.329401572Z  INFO c8y_api::http_proxy: JWT token requested
Nov 04 21:45:00 17a2666b257d tedge-mapper[78]: 2023-11-04T21:45:00.342133665Z ERROR c8y_api::http_proxy: Fail to retrieve JWT token after 3 attempts
Nov 04 21:45:00 17a2666b257d tedge-mapper[78]: 2023-11-04T21:45:00.343687957Z  INFO mqtt_channel::connection: MQTT connection closed
Nov 04 21:45:00 17a2666b257d tedge-mapper[78]: 2023-11-04T21:45:00.343733124Z ERROR c8y_http_proxy::actor: An error occurred while retrieving internal Id, operation will retry in 20 seconds
Nov 04 21:45:00 17a2666b257d tedge-mapper[78]:  Error: CustomError("JWT token not available")

mosquitto

The mosquitto logs from the same time period when the agent is in the initialization loop waiting for a token.

Nov 04 21:44:00 17a2666b257d mosquitto[67]: 1699134240: New connection from 127.0.0.1:51158 on port 1883.
Nov 04 21:44:00 17a2666b257d mosquitto[67]: 1699134240: New client connected from 127.0.0.1:51158 as vqhromalin (p2, c1, k60).
Nov 04 21:44:00 17a2666b257d mosquitto[67]: 1699134240: No will message specified.
Nov 04 21:44:00 17a2666b257d mosquitto[67]: 1699134240: Sending CONNACK to vqhromalin (0, 0)
Nov 04 21:44:00 17a2666b257d mosquitto[67]: 1699134240: Received SUBSCRIBE from vqhromalin
Nov 04 21:44:00 17a2666b257d mosquitto[67]: 1699134240:         c8y/s/dat (QoS 1)
Nov 04 21:44:00 17a2666b257d mosquitto[67]: 1699134240: vqhromalin 1 c8y/s/dat
Nov 04 21:44:00 17a2666b257d mosquitto[67]: 1699134240: Sending SUBACK to vqhromalin
Nov 04 21:44:00 17a2666b257d mosquitto[67]: 1699134240: Received PUBLISH from vqhromalin (d0, q1, r0, m2, 'c8y/s/uat', ... (0 bytes))
Nov 04 21:44:00 17a2666b257d mosquitto[67]: 1699134240: Sending PUBLISH to auto-CD3C78F7-90B7-44C4-72A6-99E816F2C481 (d0, q0, r0, m0, 'c8y/s/uat', ... (0 bytes))
Nov 04 21:44:00 17a2666b257d mosquitto[67]: 1699134240: Sending PUBACK to vqhromalin (m2, rc0)
Nov 04 21:44:18 17a2666b257d mosquitto[67]: 1699134258: Sending PINGREQ to Cumulocity
Nov 04 21:44:18 17a2666b257d mosquitto[67]: 1699134258: Received PINGRESP from Cumulocity
Nov 04 21:44:18 17a2666b257d mosquitto[67]: 1699134258: Received PINGREQ from auto-CD3C78F7-90B7-44C4-72A6-99E816F2C481
Nov 04 21:44:18 17a2666b257d mosquitto[67]: 1699134258: Sending PINGRESP to auto-CD3C78F7-90B7-44C4-72A6-99E816F2C481
Nov 04 21:44:19 17a2666b257d mosquitto[67]: 1699134259: Received PINGREQ from last_will_c8y_mapper
Nov 04 21:44:19 17a2666b257d mosquitto[67]: 1699134259: Sending PINGRESP to last_will_c8y_mapper
Nov 04 21:44:19 17a2666b257d mosquitto[67]: 1699134259: Received PINGREQ from tedge-mapper-c8y
Nov 04 21:44:19 17a2666b257d mosquitto[67]: 1699134259: Sending PINGRESP to tedge-mapper-c8y
Nov 04 21:44:19 17a2666b257d mosquitto[67]: 1699134259: Received PINGREQ from tedge-sub-101
Nov 04 21:44:19 17a2666b257d mosquitto[67]: 1699134259: Sending PINGRESP to tedge-sub-101
Nov 04 21:44:20 17a2666b257d mosquitto[67]: 1699134260: Received PUBLISH from vqhromalin (d0, q1, r0, m3, 'c8y/s/uat', ... (0 bytes))
Nov 04 21:44:20 17a2666b257d mosquitto[67]: 1699134260: Sending PUBLISH to auto-CD3C78F7-90B7-44C4-72A6-99E816F2C481 (d0, q0, r0, m0, 'c8y/s/uat', ... (0 bytes))
Nov 04 21:44:20 17a2666b257d mosquitto[67]: 1699134260: Sending PUBACK to vqhromalin (m3, rc0)
Nov 04 21:44:40 17a2666b257d mosquitto[67]: 1699134280: Received PUBLISH from vqhromalin (d0, q1, r0, m4, 'c8y/s/uat', ... (0 bytes))
Nov 04 21:44:40 17a2666b257d mosquitto[67]: 1699134280: Sending PUBLISH to auto-CD3C78F7-90B7-44C4-72A6-99E816F2C481 (d0, q0, r0, m0, 'c8y/s/uat', ... (0 bytes))
Nov 04 21:44:40 17a2666b257d mosquitto[67]: 1699134280: Sending PUBACK to vqhromalin (m4, rc0)
Nov 04 21:45:00 17a2666b257d mosquitto[67]: 1699134300: Received PINGREQ from vqhromalin
Nov 04 21:45:00 17a2666b257d mosquitto[67]: 1699134300: Sending PINGRESP to vqhromalin
Nov 04 21:45:00 17a2666b257d mosquitto[67]: 1699134300: Received DISCONNECT from vqhromalin
Nov 04 21:45:00 17a2666b257d mosquitto[67]: 1699134300: Client vqhromalin disconnected.

Environment (please complete the following information):

Property Value
OS [incl. version] Debian GNU/Linux 11 (bullseye)
Hardware [incl. revision] unknown
System-Architecture Linux 17a2666b257d 5.15.68-0-virt #1-Alpine SMP Fri, 16 Sep 2022 06:29:31 +0000 aarch64 GNU/Linux
thin-edge.io version tedge 0.13.0
mosquitto version 2.0.11

Additional context

reubenmiller commented 11 months ago

QA has thoroughly checked the feature and here are the results:

Detailed results: The test script provided in the issue was run in the new version and no errors detected:

sudo ./debug-qos-issue.sh
Max number of runs reached (disabling service)
20 passed, 0 failed, 20 total

And no errors detected in the tedge-mapper-c8y logs either:

$ journalctl -u tedge-mapper-c8y -n 10000 | grep JWT
Nov 11 09:21:59 bbf0c2a0fe36 tedge-mapper[783]: 2023-11-11T09:21:59.939622358Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:22:00 bbf0c2a0fe36 tedge-mapper[783]: 2023-11-11T09:22:00.322978399Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:22:08 bbf0c2a0fe36 tedge-mapper[1031]: 2023-11-11T09:22:08.012651861Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:22:08 bbf0c2a0fe36 tedge-mapper[1031]: 2023-11-11T09:22:08.03971257Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:24:36 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:24:36.416902099Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:24:36 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:24:36.787357474Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:24:54 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:24:54.205046191Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:24:54 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:24:54.659474524Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:25:11 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:25:11.849651657Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:25:12 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:25:12.765703741Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:25:29 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:25:29.547639708Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:25:30 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:25:30.153399249Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:25:47 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:25:47.356950258Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:25:47 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:25:47.489894216Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:26:05 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:26:05.280921683Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:26:06 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:26:06.356814725Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:26:22 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:26:22.76605565Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:26:23 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:26:23.177114358Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:26:40 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:26:40.448079741Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:26:40 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:26:40.978993283Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:28:28 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:28:28.528151876Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:28:29 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:28:29.129787001Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:28:46 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:28:46.191898593Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:28:47 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:28:47.013378385Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:29:04 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:29:04.114964477Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:29:04 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:29:04.660293602Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:29:22 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:29:22.229899568Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:29:23 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:29:23.120602486Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:29:39 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:29:39.886122077Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:29:40 bbf0c2a0fe36 tedge-mapper[62]: 2023-11-11T09:29:40.44406816Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:29:57 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:29:57.781517835Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:29:58 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:29:58.485614586Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:30:15 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:30:15.812048177Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:30:16 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:30:16.544919053Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:30:33 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:30:33.660222019Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:30:34 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:30:34.115250686Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:30:51 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:30:51.370409194Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:30:51 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:30:51.99424982Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:31:09 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:31:09.039822453Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:31:09 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:31:09.65310262Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:31:26 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:31:26.902802545Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:31:27 bbf0c2a0fe36 tedge-mapper[61]: 2023-11-11T09:31:27.40605692Z  INFO c8y_api::http_proxy: JWT token received
Nov 11 09:31:44 bbf0c2a0fe36 tedge-mapper[78]: 2023-11-11T09:31:44.641433428Z  INFO c8y_api::http_proxy: JWT token requested
Nov 11 09:31:45 bbf0c2a0fe36 tedge-mapper[78]: 2023-11-11T09:31:45.33213272Z  INFO c8y_api::http_proxy: JWT token received