tve closed this issue 2 years ago
Thank you for your comments. I will review this module in due course.
The new uasyncio is due for imminent release. I expect to be busy with that for some time, given that almost all my published code makes heavy use of it. My tutorial is also likely to need a substantial update.
Sorry for the ridiculous delay in my addressing this. I may be missing something, but I'm unconvinced by this proposed change.
[Heavily edited]
I envisage three situations in which timeouts will occur.
For a race to occur, the communication issue causing the broker to miss pings must repair itself at about the keep-alive time. If the fault persists for longer, the behaviour of client and broker seems correct: the broker sends the LW and the client keeps trying to reconnect. On the other hand, if the outage clears quickly, communication is re-established transparently.
Assume this worst-case duration, and that all communication was lost. At some point in time the client successfully reconnects. From the broker's point of view, an outage has occurred. If it "thinks" the outage is too long, it sends the LW; otherwise it does not. If the client manages to reconnect quickly enough, the LW may not be sent. There is indeed an uncertainty where a physical outage lasts roughly as long as the keep-alive value. Crucially, this uncertainty seems inevitable regardless of client timings.
With your proposed change, an outage of this exact duration would mean that, by the time the outage cleared, the client would already be making reconnect attempts just as the broker was deciding it was time to send the LW. There is still a race, because reconnection is an iterative process. I think marginal timing is inevitable when outage duration ~= keep-alive time.
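To put numbers on it (all values invented purely for illustration, nothing here comes from the module):

```python
KEEPALIVE = 60          # keep-alive sent to the broker in CONNECT
CLIENT_TIMEOUT = 30     # shortened client-side timeout, as per the proposal

# t =  0   broker receives the client's last ping; outage begins
# t = 30   client times out and starts iterating reconnect attempts (all fail)
# t = 60   outage clears; the next reconnect attempt may now succeed
# t = 60+  broker concludes the client is dead and sends the LW
#
# Whether the LW fires depends on whether a reconnect attempt lands before
# the broker's deadline. That race exists whatever CLIENT_TIMEOUT is,
# because reconnection is an iterative process.
```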
One way to enforce deterministic behaviour is as follows. The client still sends four pings in the keep-alive period. If no response has been received, either the pings or the acks have been lost. The client now does not "know" whether the broker has detected the problem, so it forces an outage. This must last longer than 1.5x the keep-alive time: only then can the client be sure that the broker has detected the outage, sent the LW and closed the connection. The client then starts a new connection attempt.
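A rough sketch of what I mean (this is not code from mqtt_as; the `Link` class and its `ping()`/`close()`/`reconnect()` methods are placeholders invented for illustration):

```python
import uasyncio as asyncio
import time

KEEPALIVE = 60  # seconds, as sent to the broker in CONNECT

class Link:
    # Placeholder for the connection plumbing: the methods do nothing here.
    def __init__(self):
        self.last_rx = time.time()  # updated whenever any packet arrives

    async def ping(self):
        pass  # send PINGREQ

    def close(self):
        pass  # drop the socket

    async def reconnect(self):
        pass  # establish a fresh connection

async def keep_alive(link):
    while True:
        await asyncio.sleep(KEEPALIVE // 4)  # four pings per keep-alive period
        await link.ping()
        if time.time() - link.last_rx > KEEPALIVE:
            # Nothing heard for a full keep-alive period: the pings or the acks
            # are being lost and we can't know whether the broker has noticed.
            # Force an outage long enough that the broker is certain to have
            # sent the LW and closed the connection, then start afresh.
            link.close()
            await asyncio.sleep(KEEPALIVE * 3 // 2 + 1)  # a little over 1.5x keep-alive
            await link.reconnect()
```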
This is decidedly slow. I'm not convinced that the overall issue is a practical problem.
Another nit ;-)
Unless the user explicitly sets `config['ping_interval']`, the timeout at which the `_keep_alive` coro starts a reconnection process is the same as the keep-alive value set at the MQTT protocol level. This produces a bit of a race condition as to who times out first. Unless I'm missing something, the two effects of the keep-alive mechanism at the MQTT level are for the broker to send the last will message when it discovers that the keep-alive timed out, and for the broker to close the connection when 1.5x the keep-alive times out. Nothing else.
So when the `_keep_alive` coro discovers a time-out, there's a bit of a race condition between the client reconnecting and the broker discovering the time-out and triggering the last will message. If no last will message is set, the MQTT keep-alive mechanism is really pretty irrelevant given the client-side timeout. IMHO, for proper last will operation the coro's time-out should be shorter than the MQTT keep-alive time, which can be set explicitly using the `ping_interval` config param, but that's not exactly obvious to someone who just naively sets the last will message.
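A concrete example of what I mean (key names as in mqtt_as's config dict, values purely illustrative; check the README for exact semantics):

```python
from mqtt_as import MQTTClient, config

config['server'] = 'test.mosquitto.org'                    # example broker
config['will'] = ('status/mydevice', 'offline', False, 0)  # last will to protect
config['keepalive'] = 60       # keep-alive sent to the broker in CONNECT
config['ping_interval'] = 20   # client pings (and, per the above, times out) well inside that window

client = MQTTClient(config)
```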