Open reubenmiller opened 4 months ago
The current theory is that the open file limit was exhausted, which caused mosquitto to become unresponsive; however, it has been difficult to reproduce this scenario exactly.
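To check this theory on an affected device, the number of descriptors held by the broker can be compared with its soft limit. The following is a minimal sketch that reads the standard Linux `/proc` files; the helper names are hypothetical and it assumes a single process named `mosquitto`.

```shell
# Sketch only: the helper names below are hypothetical, not part of mosquitto or thin-edge.io.

# Read /proc/<pid>/limits-style text on stdin and print the soft "Max open files" limit.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }'
}

# Print the number of file descriptors a PID currently holds open.
# Reading another user's /proc/<pid>/fd normally requires root.
open_fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# Print how much of the limit is in use, as an integer percentage.
# $1 = open descriptors, $2 = soft limit
fd_usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Example (assumes a single process named "mosquitto" is running):
#   pid=$(pidof -s mosquitto)
#   open=$(open_fd_count "$pid")
#   limit=$(soft_nofile_limit < "/proc/$pid/limits")
#   echo "mosquitto: $open of $limit descriptors ($(fd_usage_percent "$open" "$limit")%)"
```

A value close to 100% while the broker is unresponsive would support the theory; a low value would suggest looking elsewhere.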
Here are some mosquitto tickets:
There is also a theory that the high CPU behaviour might be due to an older version of mosquitto (e.g. < 2.0.18).
On Debian bookworm, mosquitto 2.0.18 (or newer) can be installed via the bookworm-backports repo (see instructions below):
Edit the apt sources list, /etc/apt/sources.list, and add the backports entries:
# Backports
deb http://deb.debian.org/debian bookworm-backports main contrib non-free non-free-firmware
deb-src http://deb.debian.org/debian bookworm-backports main contrib non-free non-free-firmware
Update mosquitto to the latest available version
apt-get update
apt-get install mosquitto -t bookworm-backports
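After the upgrade it may be worth confirming that the installed package really is at or above 2.0.18. A minimal sketch using `dpkg-query` and `dpkg --compare-versions` follows; the two helper function names are hypothetical.

```shell
# Sketch only: helper names are hypothetical; dpkg and dpkg-query are the standard Debian tools.

# Print the installed version of a package (empty if it is not installed).
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null
}

# Succeed when version $1 is greater than or equal to $2, using Debian's ordering rules.
version_at_least() {
    dpkg --compare-versions "$1" ge "$2"
}

# Example:
#   v=$(installed_version mosquitto)
#   if version_at_least "$v" 2.0.18; then
#       echo "mosquitto $v is new enough"
#   else
#       echo "mosquitto ${v:-<not installed>} is older than 2.0.18"
#   fi
```

Using dpkg's own comparison avoids mistakes with Debian revision suffixes, which a plain string comparison would not order correctly.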
The memory used by the thin-edge.io components has levelled out (but the memory has not been released) after restarting the mosquitto service.
Describe the bug
On a device, the local mosquitto MQTT broker was observed to be unresponsive, and the mosquitto process was consuming 100% of the CPU time of one core.
mosquitto was reporting the following error, and after some time no more logs were being written:
The local MQTT broker was non-functional which resulted in all of the thin-edge.io components failing to connect to the broker with the following network connection error:
Manually subscribing to the local MQTT broker on localhost:1883 also failed with a network timeout error.
The mosquitto broker could only be revived by restarting the service, using:
Afterwards all of the services started functioning again.
Secondary symptoms
The following were some secondary symptoms which were observed when the device was in this state.
To Reproduce
It has not yet been possible to reproduce this situation; however, there seems to be some correlation between the Cumulocity IoT update occurring and this mosquitto high CPU behaviour.
Expected behavior
Screenshots
Debian GNU/Linux 12 (bookworm)
Raspberry Pi 4 Model B Rev 1.5
Linux rackfslot1 6.1.0-rpi6-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.58-1+rpt2 (2023-10-27) aarch64 GNU/Linux
tedge 0.13.2~141+g1ef77c9
mosquitto 2.0.11
Additional context
log files
The following log files were collected from two devices: one where mosquitto was using 100% CPU, and another where the device resumed the c8y-bridge connection after the Cumulocity IoT update.
Mitigation strategy
A mitigation strategy would be to use a service like monit to detect sustained high CPU usage by the mosquitto broker and restart the broker when that happens.
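As an illustration of the idea (a plain shell stand-in, not monit itself), the sketch below samples the broker's CPU time from `/proc/<pid>/stat` and restarts it after a run of consecutive high readings. The thresholds, the helper names, and the systemd unit name `mosquitto` are all assumptions.

```shell
# Sketch only: thresholds, helper names and the systemd unit name are assumptions.
THRESHOLD=90      # percent of one core treated as "high"
INTERVAL=5        # seconds between samples
MAX_STRIKES=12    # consecutive high samples (about one minute) before restarting

# Print utime + stime (in clock ticks) for a PID from /proc/<pid>/stat.
# Everything up to the last ") " is removed first, so utime and stime
# (fields 14 and 15 of the full line) become fields 12 and 13.
cpu_ticks() {
    sed 's/^.*) //' "/proc/$1/stat" 2>/dev/null | awk '{ print $12 + $13 }'
}

# Percent of one core used. $1 = tick delta, $2 = interval in seconds, $3 = ticks per second
cpu_percent() {
    echo $(( $1 * 100 / ($2 * $3) ))
}

# Next value of the consecutive-high counter. $1 = current count, $2 = CPU percent, $3 = threshold
next_strikes() {
    if [ "$2" -ge "$3" ]; then echo $(( $1 + 1 )); else echo 0; fi
}

watch_mosquitto() {
    hz=$(getconf CLK_TCK)
    strikes=0
    while :; do
        pid=$(pidof -s mosquitto) || { sleep "$INTERVAL"; continue; }
        before=$(cpu_ticks "$pid")
        sleep "$INTERVAL"
        after=$(cpu_ticks "$pid")
        # Skip the sample if the process disappeared in between.
        [ -n "$before" ] && [ -n "$after" ] || { strikes=0; continue; }
        pct=$(cpu_percent $(( after - before )) "$INTERVAL" "$hz")
        strikes=$(next_strikes "$strikes" "$pct" "$THRESHOLD")
        if [ "$strikes" -ge "$MAX_STRIKES" ]; then
            systemctl restart mosquitto   # assumes a systemd unit named "mosquitto"
            strikes=0
        fi
    done
}

# Run as root, for example from a small systemd service:
#   watch_mosquitto
```

Requiring several consecutive high samples, rather than a single one, avoids restarting the broker on short legitimate bursts of activity.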