robhogan / react-native-paho-mqtt

react-native-paho-mqtt
91 stars 32 forks source link

Connection lost but client returns that it is still connected #12

Open sebirdman opened 6 years ago

sebirdman commented 6 years ago

After putting my application in the background for a minute (on android) and resuming i notice that the Client.isConnected() call returns true, even when the trace ping logs are not showing responses. (and my other clients do not receive anything sent from the disconnected client)

This problem seems to be tied to this: https://github.com/rh389/react-native-paho-mqtt/commit/8185b64928c5d0f87111b1e99dfbc423163d270d

1) _socketSend (the only place our pinger seems to be reset) is never called (as far as i can see) 2) our pinger will never "fail" which i think breaks the "keep alive" function of mqtt outlined here: https://www.hivemq.com/blog/mqtt-essentials-part-10-alive-client-take-over 3) if i set the keep alive time to a shorter time (default for this library is 60 seconds - amazon iot says the shortest they allow is 5 seconds) then this client will be disconnected after that time without properly reporting that it is disconnected.

sebirdman commented 6 years ago

I've created a PR that starts to fix this problem. After a ping request fails to receive a response, it'll disconnect. The only issue is that this is n keepAlive seconds after the application has been resumed, ideally we would send some kind of ping request out on resume (or just consider ourselves disconnected if we know we've missed a ping request, since this is what the mqtt spec seems to say will happen)

robhogan commented 6 years ago

Thanks for looking into this. IIRC, I removed this functionality originally because it was a bit off-spec (using a second timer to ping only when we hadn't heard from the server) and conflated keepalive with response timeout (as you point out). Normally you'd expect a ping response within a second or two, whereas the keepalive interval is often several minutes, so using the same value for both doesn't sit right.

Technically the current implementation is spec compliant, since the spec only requires the server to issue a disconnection if it hasn't heard from the client - it doesn't place any firm obligation on the client (it only makes a fairly vague recommendation):

If the Keep Alive value is non-zero and the Server does not receive a Control Packet from the Client within one and a half times the Keep Alive time period, it MUST disconnect the Network Connection to the Client as if the network had failed [MQTT-3.1.2-24].

If a Client does not receive a PINGRESP Packet within a reasonable amount of time after it has sent a PINGREQ, it SHOULD close the Network Connection to the Server.`

_socketSend (the only place our pinger seems to be reset) is never called (as far as i can see)

Good catch - that function is redundant and should be removed. The pinger is reset elsewhere though - in process_queue.

if i set the keep alive time to a shorter time (default for this library is 60 seconds - amazon iot says the shortest they allow is 5 seconds) then this client will be disconnected after that time without properly reporting that it is disconnected.

I'm not sure what you mean by this. The keepalive interval is specified by the client in the connect request, and the server should respect that, so there shouldn't be a mismatch of expectations.

sebirdman commented 6 years ago

It does seem off spec to have the second timer pinging only when we haven't heard from the server. Not sure what the proper solution to this really should be then (as you point out in the merge request, there might be cases where the pinger being reset causes a disconnect from the server)

The pinger is reset elsewhere though - in process_queue.

ahhh there it is!

I'm not sure what you mean by this. The keepalive interval is specified by the client in the connect request, and the server should respect that, so there shouldn't be a mismatch of expectations.

What I mean't by this is that it seems that the server is disconnecting us after the keepalive time, as it should, but that the pinger wasn't alerting us to the disconnect after the keepalive time is up.

Maybe a good solution would be to hold onto the last time the pinger ran, check to see before we ping if it's > keepalive time and then assume that the server has issued a disconnect?

eg:

lastPingTime = 1513972970
currentTime = 1513977970

if (currentTime - lastAliveTime > keepAliveTime) disconnect()
robhogan commented 6 years ago

Incidentally, are you actually seeing unnoticed disconnections in practice? I would've thought the underlying websocket layer would pick up disconnections regardless of what we're doing, and that should trigger a call of _disconnected (assuming the socket isn't being held open by a reverse proxy after the MQTT server has died, I guess)

sebirdman commented 6 years ago

I'm seeing unnoticed disconnections when the application goes into the background for any time > keepAlive on Android 8.0. (Might be other android versions as well as iOS but i haven't checked yet)

Looks like you're right, QoS 0 requires no ACK. With QoS > 0 an ACK is expected. If we don't receive this ACK what are we supposed to do?

robhogan commented 6 years ago

Well, MQTT says we do nothing, the message is just silently lost. QoS=0 means that's not a problem to you.

We don't want that to be happening when we could be connected though, and we don't want to have our subscriptions interrupted without us knowing about it. My first thought was to inspect socket.readyState if socket.onclose isn't being called reliably, but I want to be careful about conflating connections at the socket layer with the those at the MQTT layer. Perhaps pinging the server is the way to go... I'll give it some thought.

MattyK14 commented 6 years ago

The disconnections on Android 8.0 likely have something to do with this? But then what causes the client to not see a disconnection?