thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0
222 stars 55 forks source link

tedge-mapper-c8y init loop blocked by JWT token request that is never fulfilled #1972

Closed reubenmiller closed 1 year ago

reubenmiller commented 1 year ago

Describe the bug

Sometimes on device startup, the tedge-mapper-c8y is stuck in the initialization loop requesting a JWT token. This blocks some of the communication going out from the device to Cumulocity, as the mapper requires a JWT to retreive information such as the internal device id of digital twin in Cumulocity.

To Reproduce

Note: This bug is very difficult to reproduce because it does not occur every time!

The following instructions use the tedge-demo-container to setup thin-edge.io.

  1. Download the docker compose file

    curl -LSs https://raw.githubusercontent.com/thin-edge/tedge-demo-container/main/demos/docker-compose/device/docker-compose.yaml > docker-compose.yaml
  2. Start the demo container

    docker compose up -d
  3. Bootstrap the device (it will prompt for input)

    docker compose exec tedge bootstrap.sh
  4. Open the UI to the device after it has been successfully bootstrapped

  5. Send a "Restart Device" operation via the Cumulocity UI to the device

  6. After the container has restarted, observer the journald logs for the tedge-mapper-c8y process

    docker compose exec tedge journalctl -fu tedge-mapper-c8y --boot -n 100

Once in this never ending retry loop, restarting the tedge-mapper-c8y multiple times (~5 times) has not affect. However restarting mosquitto then the tedge-mapper-c8y then worked.

Expected behavior

The JWT connection should be reliable to ensure that the tedge-mapper-c8y service is able to function reliable communication with Cumulocity.

Screenshots

Checking the log of the mapper showed that it was in a continous loop to request the JWT from Cumulocity (via the MQTT bridge).

The log snippet was obtained using the following command:

journalctl -fu tedge-mapper-c8y -n 100

Log output

May 15 19:05:27 7866d7609994 tedge-mapper[85]: 2023-05-15T19:05:27.840851311Z  INFO C8Y-REST => JWT: send ()
May 15 19:05:27 7866d7609994 tedge-mapper[85]: 2023-05-15T19:05:27.840977227Z  INFO C8YJwtRetriever: recv Some((0, ()))
May 15 19:05:37 7866d7609994 tedge-mapper[85]: 2023-05-15T19:05:37.844703565Z  INFO C8YJwtRetriever: send (0, Err(NoJwtReceived))
May 15 19:05:37 7866d7609994 tedge-mapper[85]: 2023-05-15T19:05:37.844843315Z  INFO C8Y-REST => JWT: recv Some(Err(NoJwtReceived))
May 15 19:05:37 7866d7609994 tedge-mapper[85]: 2023-05-15T19:05:37.844865274Z ERROR c8y_http_proxy::actor: An error occurred while retrieving internal Id, operation will retry in 60 seconds
May 15 19:05:37 7866d7609994 tedge-mapper[85]:  Error: CustomError("JWT token not available")
May 15 19:06:37 7866d7609994 tedge-mapper[85]: 2023-05-15T19:06:37.846357677Z  INFO C8Y-REST => JWT: send ()
May 15 19:06:37 7866d7609994 tedge-mapper[85]: 2023-05-15T19:06:37.846452136Z  INFO C8YJwtRetriever: recv Some((0, ()))
May 15 19:06:47 7866d7609994 tedge-mapper[85]: 2023-05-15T19:06:47.84999789Z  INFO C8YJwtRetriever: send (0, Err(NoJwtReceived))
May 15 19:06:47 7866d7609994 tedge-mapper[85]: 2023-05-15T19:06:47.850156015Z  INFO C8Y-REST => JWT: recv Some(Err(NoJwtReceived))
May 15 19:06:47 7866d7609994 tedge-mapper[85]: 2023-05-15T19:06:47.85018314Z ERROR c8y_http_proxy::actor: An error occurred while retrieving internal Id, operation will retry in 60 seconds
May 15 19:06:47 7866d7609994 tedge-mapper[85]:  Error: CustomError("JWT token not available")
May 15 19:07:47 7866d7609994 tedge-mapper[85]: 2023-05-15T19:07:47.851883086Z  INFO C8Y-REST => JWT: send ()
May 15 19:07:47 7866d7609994 tedge-mapper[85]: 2023-05-15T19:07:47.852106419Z  INFO C8YJwtRetriever: recv Some((0, ()))
May 15 19:07:57 7866d7609994 tedge-mapper[85]: 2023-05-15T19:07:57.856532465Z  INFO C8YJwtRetriever: send (0, Err(NoJwtReceived))
May 15 19:07:57 7866d7609994 tedge-mapper[85]: 2023-05-15T19:07:57.856588965Z  INFO C8Y-REST => JWT: recv Some(Err(NoJwtReceived))
May 15 19:07:57 7866d7609994 tedge-mapper[85]: 2023-05-15T19:07:57.856612215Z ERROR c8y_http_proxy::actor: An error occurred while retrieving internal Id, operation will retry in 60 seconds
May 15 19:07:57 7866d7609994 tedge-mapper[85]:  Error: CustomError("JWT token not available")

However I could still manually request a Cumulocity token via publishing to the MQTT topic c8y/s/uat using the following commands:

tedge mqtt pub 'c8y/s/uat' ''

The output of another console running tedge mqtt sub '#' shows that the token was being returned successfully.

[c8y/s/uat]
[c8y/s/dat] 71,{token_value}

Environment (please complete the following information):

Additional context

Additional logs for tedge-mapper-c8y are available here:

tedge-mapper-c8y.log

The mosquitto logs seem to show that there are clients sending the c8y/s/dat requests.

journalctl -f -u mosquitto -n 100
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Client tedge-pub-17726 closed its connection.
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: New connection from 127.0.0.1:43592 on port 1883.
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: New client connected from 127.0.0.1:43592 as qwtayntcqk (p2, c1, k60).
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: No will message specified.
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Sending CONNACK to qwtayntcqk (0, 0)
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: New connection from 127.0.0.1:43598 on port 1883.
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Received SUBSCRIBE from qwtayntcqk
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051:         c8y/s/dat (QoS 1)
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: qwtayntcqk 1 c8y/s/dat
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Sending SUBACK to qwtayntcqk
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: New client connected from 127.0.0.1:43598 as hbymzdlnqv (p2, c1, k60).
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: No will message specified.
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Sending CONNACK to hbymzdlnqv (0, 0)
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Received SUBSCRIBE from hbymzdlnqv
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051:         c8y/s/dat (QoS 1)
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: hbymzdlnqv 1 c8y/s/dat
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Sending SUBACK to hbymzdlnqv
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: New connection from 127.0.0.1:43614 on port 1883.
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Received PUBLISH from qwtayntcqk (d0, q1, r0, m2, 'c8y/s/uat', ... (0 bytes))
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Sending PUBLISH to auto-2192673A-03E9-00EB-FB37-9479195710D8 (d0, q0, r0, m0, 'c8y/s/uat', ... (0 bytes))
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Sending PUBACK to qwtayntcqk (m2, rc0)
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Received PUBLISH from hbymzdlnqv (d0, q1, r0, m2, 'c8y/s/uat', ... (0 bytes))
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Sending PUBLISH to auto-2192673A-03E9-00EB-FB37-9479195710D8 (d0, q0, r0, m0, 'c8y/s/uat', ... (0 bytes))
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Sending PUBACK to hbymzdlnqv (m2, rc0)
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: New client connected from 127.0.0.1:43614 as audrvuszvk (p2, c1, k60).
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: No will message specified.
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Sending CONNACK to audrvuszvk (0, 0)
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Received SUBSCRIBE from audrvuszvk
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051:         c8y/s/dat (QoS 1)
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: audrvuszvk 1 c8y/s/dat
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Sending SUBACK to audrvuszvk
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Received PUBLISH from audrvuszvk (d0, q1, r0, m2, 'c8y/s/uat', ... (0 bytes))
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Sending PUBLISH to auto-2192673A-03E9-00EB-FB37-9479195710D8 (d0, q0, r0, m0, 'c8y/s/uat', ... (0 bytes))
May 15 20:20:51 9b469091f733 mosquitto[66]: 1684182051: Sending PUBACK to audrvuszvk (m2, rc0)
reubenmiller commented 1 year ago

I just checked the other plugins and they has the following state and logs:

plugin JWT Successfully received
c8y-log-plugin No
c8y-configuration-plugin No
c8y-firmware-plugin Yes

Logs

# journalctl -fu c8y-log-plugin
-- Journal begins at Mon 2023-05-15 19:40:47 UTC. --
May 15 20:12:51 9b469091f733 c8y-log-plugin[77]: 2023-05-15T20:12:51.755599156Z  INFO C8YJwtRetriever: send (0, Err(NoJwtReceived))
May 15 20:12:51 9b469091f733 c8y-log-plugin[77]: 2023-05-15T20:12:51.755842156Z  INFO C8Y-REST => JWT: recv Some(Err(NoJwtReceived))
May 15 20:12:51 9b469091f733 c8y-log-plugin[77]: 2023-05-15T20:12:51.756224364Z ERROR c8y_http_proxy::actor: An error occurred while retrieving internal Id, operation will retry in 60 seconds
May 15 20:12:51 9b469091f733 c8y-log-plugin[77]:  Error: CustomError("JWT token not available")
May 15 20:13:51 9b469091f733 c8y-log-plugin[77]: 2023-05-15T20:13:51.759196976Z  INFO C8Y-REST => JWT: send ()
May 15 20:13:51 9b469091f733 c8y-log-plugin[77]: 2023-05-15T20:13:51.759262559Z  INFO C8YJwtRetriever: recv Some((0, ()))
May 15 20:14:01 9b469091f733 c8y-log-plugin[77]: 2023-05-15T20:14:01.763226689Z  INFO C8YJwtRetriever: send (0, Err(NoJwtReceived))
May 15 20:14:01 9b469091f733 c8y-log-plugin[77]: 2023-05-15T20:14:01.763307814Z  INFO C8Y-REST => JWT: recv Some(Err(NoJwtReceived))
May 15 20:14:01 9b469091f733 c8y-log-plugin[77]: 2023-05-15T20:14:01.763322106Z ERROR c8y_http_proxy::actor: An error occurred while retrieving internal Id, operation will retry in 60 seconds
May 15 20:14:01 9b469091f733 c8y-log-plugin[77]:  Error: CustomError("JWT token not available")

root@9b469091f733:/setup# journalctl -fu c8y-configuration-plugin
-- Journal begins at Mon 2023-05-15 19:40:47 UTC. --
May 15 20:12:51 9b469091f733 c8y-configuration-plugin[71]: 2023-05-15T20:12:51.754704031Z  INFO C8YJwtRetriever: send (0, Err(NoJwtReceived))
May 15 20:12:51 9b469091f733 c8y-configuration-plugin[71]: 2023-05-15T20:12:51.754985656Z  INFO C8Y-REST => JWT: recv Some(Err(NoJwtReceived))
May 15 20:12:51 9b469091f733 c8y-configuration-plugin[71]: 2023-05-15T20:12:51.755052406Z ERROR c8y_http_proxy::actor: An error occurred while retrieving internal Id, operation will retry in 60 seconds
May 15 20:12:51 9b469091f733 c8y-configuration-plugin[71]:  Error: CustomError("JWT token not available")
May 15 20:13:51 9b469091f733 c8y-configuration-plugin[71]: 2023-05-15T20:13:51.759324392Z  INFO C8Y-REST => JWT: send ()
May 15 20:13:51 9b469091f733 c8y-configuration-plugin[71]: 2023-05-15T20:13:51.759404309Z  INFO C8YJwtRetriever: recv Some((0, ()))
May 15 20:14:01 9b469091f733 c8y-configuration-plugin[71]: 2023-05-15T20:14:01.763276814Z  INFO C8YJwtRetriever: send (0, Err(NoJwtReceived))
May 15 20:14:01 9b469091f733 c8y-configuration-plugin[71]: 2023-05-15T20:14:01.763337772Z  INFO C8Y-REST => JWT: recv Some(Err(NoJwtReceived))
May 15 20:14:01 9b469091f733 c8y-configuration-plugin[71]: 2023-05-15T20:14:01.763350731Z ERROR c8y_http_proxy::actor: An error occurred while retrieving internal Id, operation will retry in 60 seconds
May 15 20:14:01 9b469091f733 c8y-configuration-plugin[71]:  Error: CustomError("JWT token not available")

root@9b469091f733:/setup# journalctl -fu c8y-firmware-plugin
-- Journal begins at Mon 2023-05-15 19:40:47 UTC. --
May 15 19:41:11 9b469091f733 c8y-firmware-plugin[72]: 2023-05-15T19:41:11.510676374Z  INFO Runtime: Running MQTT-1
May 15 19:41:11 9b469091f733 c8y-firmware-plugin[72]: 2023-05-15T19:41:11.511735083Z  INFO Runtime: Running C8YJwtRetriever-2
May 15 19:41:11 9b469091f733 c8y-firmware-plugin[72]: 2023-05-15T19:41:11.511741999Z  INFO Runtime: Running Downloader-3
May 15 19:41:11 9b469091f733 c8y-firmware-plugin[72]: 2023-05-15T19:41:11.511747874Z  INFO Runtime: Running FirmwareManager-4
May 15 19:41:11 9b469091f733 c8y-firmware-plugin[72]: 2023-05-15T19:41:11.511752166Z  INFO Runtime: Running Timer-5
May 15 19:41:11 9b469091f733 c8y-firmware-plugin[72]: 2023-05-15T19:41:11.511761833Z  INFO Runtime: Running HealthMonitorActor-6
May 15 19:41:11 9b469091f733 c8y-firmware-plugin[72]: 2023-05-15T19:41:11.511780624Z  INFO c8y-firmware-plugin: send Message { topic: Topic { name: "tedge/health/c8y-firmware-plugin" }, payload: "{"pid":72,"status":"up","time":1684179671}", qos: AtLeastOnce, retain: true }
May 15 19:41:11 9b469091f733 c8y-firmware-plugin[72]: 2023-05-15T19:41:11.512025333Z  INFO c8y_firmware_manager::actor: Ready to serve firmware requests.
May 15 19:41:11 9b469091f733 c8y-firmware-plugin[72]: 2023-05-15T19:41:11.534127916Z  INFO MQTT: recv Ok(Some(Message { topic: Topic { name: "tedge/health/c8y-firmware-plugin" }, payload: "{"pid":72,"status":"up","time":1684179671}", qos: AtLeastOnce, retain: true }))
May 15 19:41:11 9b469091f733 c8y-firmware-plugin[72]: 2023-05-15T19:41:11.534150999Z  INFO MQTT: recv Ok(Some(Message { topic: Topic { name: "c8y/s/us" }, payload: "500", qos: AtLeastOnce, retain: false }))
reubenmiller commented 1 year ago

After more debugging there is are some indications that the empty message is not being published to c8y/s/uat.

I was able to manually publish a message via the command line to simulate the message and it was successful, though for it to work I did have to wait to send it at the correct time (when the service under test was actually subscribed to the c8y/s/dat topic)

I used the following command to simulate an empty string.

mosquitto_pub -t c8y/s/uat -m ''
reubenmiller commented 1 year ago

Some logging and retry logic regarding republishing to the c8y/s/uat topic has been added. Closing for now. If it is viewed in the wild then it can be reopened.