thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0

tedge-agent fails to process any pending operation when there are too many. #2639

Closed didier-wenzek closed 8 months ago

didier-wenzek commented 9 months ago

tedge-agent fails to process any pending operation when there are too many.

  1. Stop the agent
  2. Stop and restart the mapper several times (I did it 12 times)
  3. Observe that each restart of the mapper created a te/device/main///cmd/software_list/c8y-mapper-2024-XXX request
  4. Start the agent
  5. Observe that the agent is not processing the pending software list requests
  6. Send a health check request to all the daemons with tedge mqtt pub te/device/main///cmd/health/check ''
  7. Observe that the mapper is responding, but not the agent.

To unblock the situation, all the pending requests must be cleared and the agent restarted. This can be done as follows:

$ mosquitto_sub -F "%t" -t "te/device/main///cmd/software_list/+" | xargs -n 1 mosquitto_pub -r -m '' -t 
$ sudo systemctl restart tedge-agent

This issue is due to a cycle of actors trying to send messages to each other while their message boxes are full. The software_manager and tedge_operator_converter send messages to each other. When the tedge_operator_converter message box is full of pending requests received from MQTT, the software_manager can no longer send operation statuses back to the tedge_operator_converter and is therefore stuck.
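
The cycle described above can be illustrated with a small, single-threaded model. This is only a sketch under stated assumptions, not the agent's actual code: the box capacity (4), the round-based scheduling, the rule that MQTT refills the converter's box whenever there is room, and the helper names try_send and simulate are all hypothetical. Only the two actor names come from this issue.

```python
import queue
from collections import deque

CAPACITY = 4  # hypothetical message-box size, not the real thin-edge value


def try_send(box, msg):
    """Non-blocking send: return False when the target box is full."""
    try:
        box.put_nowait(msg)
        return True
    except queue.Full:
        return False


def simulate(pending_requests, max_rounds=1000):
    """Model the converter <-> software_manager cycle with bounded boxes."""
    converter_box = queue.Queue(maxsize=CAPACITY)  # tedge_operator_converter
    manager_box = queue.Queue(maxsize=CAPACITY)    # software_manager
    mqtt_backlog = deque(("request", i) for i in range(pending_requests))
    converter_out = None  # message the converter is blocked on sending
    manager_out = None    # message the manager is blocked on sending
    done = 0

    for _ in range(max_rounds):
        # Assumption: MQTT pushes pending requests whenever there is room.
        while mqtt_backlog and try_send(converter_box, mqtt_backlog[0]):
            mqtt_backlog.popleft()

        progressed = False

        # software_manager: take a request, then report a status back.
        if manager_out is None and not manager_box.empty():
            _, op = manager_box.get_nowait()
            manager_out = ("status", op)
            progressed = True
        if manager_out is not None and try_send(converter_box, manager_out):
            manager_out = None
            progressed = True

        # converter: forward requests to the manager, consume statuses.
        if converter_out is None and not converter_box.empty():
            kind, op = converter_box.get_nowait()
            if kind == "request":
                converter_out = ("request", op)
            else:
                done += 1
            progressed = True
        if converter_out is not None and try_send(manager_box, converter_out):
            converter_out = None
            progressed = True

        if done == pending_requests:
            return "completed"
        if not progressed:
            # Both actors hold a message they cannot deliver: the cycle is stuck.
            return "deadlocked"
    return "timeout"
```

In this model, simulate(2) returns "completed", while simulate(12) returns "deadlocked": the manager holds a status it cannot deliver to the full converter box, and the converter holds a request it cannot deliver to the full manager box. The point at which the model locks up depends entirely on the assumed capacity and scheduling, so it says nothing about the real agent's limits.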

gligorisaev commented 8 months ago

QA has thoroughly checked the bug and here are the results: