Closed reubenmiller closed 10 months ago
I suspect that some interaction with the request for the Cumulocity token is causing the high file descriptor counts / exhaustion, as the high counts are mostly seen on thin-edge.io components which require a JWT from Cumulocity.
It would be interesting to find out the following (though this is just one theory...you should not rule out other potential sources)
The large number of file descriptors has not been reproducible yet. It seems that the specific scenario can only be triggered by replicating this behaviour from the server side (which is technically not feasible at this time). We're trying to set up some long-term testing farms to monitor such long-term behaviour and provide more clues in the future.
However, we are confident that there is no leakage of file descriptors under normal operation, and that the problem is limited to the case where the cloud MQTT broker rejects the client connection 1-2 seconds after it has successfully connected. So the risk of this happening should be low.
Closing as we have not been able to reproduce this, though in general use the following versions:
Describe the bug
Potential file descriptor leak when the mosquitto bridge to Cumulocity IoT connects and is then rejected by the server (with "Not Authorized").
The mosquitto bridge to Cumulocity IoT would actually be connected for a brief moment (as indicated by the health topic), but it would be disconnected seconds later.
The root cause of the bridge connect/disconnect cycle is that Cumulocity IoT had some issues after an upgrade when clean session is set to False.
To Reproduce
There has been no easy way to reproduce this scenario, as the situation occurred due to a Cumulocity IoT platform upgrade combined with the fact that the mosquitto bridge was not using clean session True. This would result in the bridge being briefly connected (and the `)
Expected behavior
In error states where the mosquitto bridge connection is not stable, none of the thin-edge.io components should cause the file descriptor limit (max open files) to be reached, as this affects the recovery of the device. For example, if the old sessions had been manually deleted on the Cumulocity IoT server side, the mosquitto bridge would have been able to reconnect; however, it could not because the max file descriptor limit had been reached.
Screenshots
The file descriptor counts per process were recorded on a device where mosquitto stopped functioning because it had exceeded the max open files limit (as controlled by systemd).
The fd counts and the cmdline per process were then merged to show the unusually high file descriptor counts for mosquitto and some of the thin-edge.io components.
The `ulimit -a` output shows that the limit is set to 1024 files:
Environment (please complete the following information):
Additional context
The following commands were used to count the open file descriptors per process, and then display the command behind each process (e.g. the /proc/N/cmdline file).
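The original commands are not reproduced in this report. As a rough equivalent, here is a minimal Python sketch of the same idea, assuming a Linux `/proc` filesystem: it counts the entries in `/proc/N/fd` for each process and pairs each count with the contents of `/proc/N/cmdline`. The function names (`count_open_fds`, `read_cmdline`, `fd_report`) and the `proc_root` parameter are hypothetical and only for illustration; they are not part of thin-edge.io or mosquitto.

```python
import os
from pathlib import Path


def count_open_fds(pid, proc_root="/proc"):
    """Return the number of open file descriptors for pid, or None if unreadable."""
    try:
        return len(os.listdir(Path(proc_root) / str(pid) / "fd"))
    except OSError:
        # The process has exited, or we lack permission to read its fd directory.
        return None


def read_cmdline(pid, proc_root="/proc"):
    """Return the command line of pid as a single space-separated string."""
    try:
        raw = (Path(proc_root) / str(pid) / "cmdline").read_bytes()
    except OSError:
        return ""
    # Arguments in /proc/N/cmdline are separated by NUL bytes.
    return " ".join(part.decode(errors="replace") for part in raw.split(b"\0") if part)


def fd_report(proc_root="/proc"):
    """Return (fd_count, pid, cmdline) tuples sorted by fd count, highest first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # Skip non-process entries such as "self" or "meminfo".
        pid = int(entry)
        count = count_open_fds(pid, proc_root)
        if count is not None:
            rows.append((count, pid, read_cmdline(pid, proc_root)))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    # Print the ten processes holding the most open file descriptors.
    for count, pid, cmdline in fd_report()[:10]:
        print(f"{count:6d}  {pid:7d}  {cmdline}")
```

Reading the `fd` directory of processes owned by other users generally requires root, so the script would need to run with elevated privileges to see mosquitto and the thin-edge.io services. Processes that cannot be read are skipped rather than reported with a count of zero.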