sbtinstruments / aiomqtt

The idiomatic asyncio MQTT client, wrapped around paho-mqtt
https://sbtinstruments.github.io/aiomqtt
BSD 3-Clause "New" or "Revised" License
392 stars 71 forks source link

MQTT broker reports a disconnection/reconnection... but no aiomqtt.MqttError is raised #283

Closed f18m closed 3 months ago

f18m commented 4 months ago

Hi, [first of all thanks for this great library! I love it!]

I created a small project using aiomqtt. The software is meant to run 24/7 on a Raspberry PI. The MQTT broker is a Mosquitto broker (version 2.0.11) running on the same Raspberry PI. I'm using aiomqtt version 2.0.0

My issue is that the software runs fine for several hours. Then typically between 2 and 4 times / day, I see some disconnection reported by the mosquitto broker but nothing is "signalled" on my application. For example today at 2:54am I found in mosquitto.log:

2024-03-05T02:54:31: Outgoing messages are being dropped for client 6oWlj6DunsNKB3a3cjBthG.
2024-03-05T02:54:32: Client 6oWlj6DunsNKB3a3cjBthG closed its connection.
2024-03-05T02:54:33: New connection from 192.168.1.5:46764 on port 1883.
2024-03-05T02:54:33: New client connected from 192.168.1.5:46764 as 6oWlj6DunsNKB3a3cjBthG (p2, c1, k60).

(Note that the IP of the raspberry where both MQTT and my app run is 192.168.1.5.. so I'm pretty sure it's my app based on aiomqtt the one that has "closed its connection"... my app creates 4 instances of aiomqtt.Client so I just don't know which one of those instances is the "guilty one")

On my Python software every time aiomqtt.MqttError is catched I print a message... searching for that message reveals nothing in my application. My theory is that aiomqtt.Client has somehow reconnected (in about 2sec) without reporting anything to application code... is that possible?

Btw my application is open-source and available at https://github.com/f18m/ha-alarm-raspy2mqtt/blob/main/raspy2mqtt/raspy2mqtt.py. The main code can be summarized as:

        try:
            # Use a task group to manage and await all (endless) tasks
            async with asyncio.TaskGroup() as tg:
                tg.create_task(...task1 that is going to create an aiomqtt.Client and publish...)
                tg.create_task(...task2 that is going to create an aiomqtt.Client and publish...)
                ...

        except* aiomqtt.MqttError as err:
            print(f"Connection lost: {err.exceptions}; reconnecting in {reconnection_interval_sec} seconds ...")
            g_stats['num_connections_lost'] += 1
            await asyncio.sleep(reconnection_interval_sec)

I cannot find the print "connection lost" neither I see the g_stats['num_connections_lost'] being incremented... what am I missing?

empicano commented 4 months ago

Hi Francesco,

That's nice to hear that you like aiomqtt 😊

Let's try to solve this together! You say that you have four aiomqtt.Client instances, but don't know which one is the guilty one. Your mosquitto logs include the client identifier "6oWlj6DunsNKB3a3cjBthG", which I assume was automatically generated. You can set custom identifiers for each client instance by setting the client's identifier parameter on initialization. This way we know which one failed 👍

Apart from that, the asyncio documentation on TaskGroups states that "The first time any of the tasks belonging to the group fails with an exception other than asyncio.CancelledError, the remaining tasks in the group are cancelled". Going by your pseudo code, I would assume that when one client errors, all others should disconnect as well. As that doesn't seem to be the case, the exception is probably handled somewhere inside the task. By default, aiomqtt doesn't reconnect. You can however configure reconnection as described in the documentation.

Let me know if that helps 🙂

f18m commented 4 months ago

hi @empicano , thanks for your answer!

You can set custom identifiers for each client instance by setting the client's identifier parameter on initialization. This way we know which one failed 👍

Nice, I will use it!

[...] Going by your pseudo code, I would assume that when one client errors, all others should disconnect as well. Right... that was my design criteria... the idea is that a client will get an error presumably in case of some network issue, and it's likely the network issue will impact all clients at the same time, since they run from the same application, same compute node.

So that's why I catch the aiomqtt.MqttError outside all tasks.

As that doesn't seem to be the case, the exception is probably handled somewhere inside the task.

The thing is: in my (short) software there 1 single "except aiomqtt.MqttError"... nor there is any "catch all" except statement. So if I see a log of a (unexpected) client disconnection in the mosquitto logs, I was expecting that my "except aiomqtt.MqttError" code block was executed... I still cannot figure out why it's not.

By default, aiomqtt doesn't reconnect. You can however configure reconnection as described in the documentation.

Ok good to know. Indeed my code was written using that reconnection page as reference...

thanks I will update if/when this problem happens again. Today I had to stop and restart my application several times so I cannot say from mosquitto logs if there is any unexpected disconnection.

f18m commented 3 months ago

Hi @empicano , hi all,

Thanks to the suggestion of using a custom-defined identifier, I've been able to understand that when I see in mosquitto broker a line like that:

2024-03-05T02:54:33: New connection from 192.168.1.5:46764 on port 1883.
2024-03-05T02:54:33: New client connected from 192.168.1.5:46764 as 6oWlj6DunsNKB3a3cjBthG (p2, c1, k60).

it does NOT mean that the client IP address is 192.168.1.5. 192.168.1.5 is the IP address of the Mosquitto broker. Actually I discovered these client connections do not come from my own software (which now uses human-friendly identifiers for clients) but rather from Home Assistant. So the problem is not in my application written using aiomqtt.

I'm closing this ticket then. Thanks for your support!!

empicano commented 3 months ago

Great detective work, and thanks for reporting back! 🚀