parse-community / ParseLiveQuery-Android

Parse LiveQuery client for Android.
Other
84 stars 32 forks source link

Multiple subscribes before client connects leads to unstable websocket clients #37

Closed jhansche closed 7 years ago

jhansche commented 7 years ago

Starting in 3b0041dc20c7bc87c453e8e2e74f3cbe856d40b6, the following sequence of events leads to the websocket sending duplicate subscribe events:

  1. client = getClient(..)
  2. client.subscribe(query1)
  3. client.subscribe(query2)

After (1), the client is not yet connected. During (2), the subscription becomes "pending", and a call to reconnect() is started. As of that point in time, the socket is still not connected. During (3), the subscription is still added to the pending subscriptions list, but because the reconnect() from (2) is actually asynchronous, it probably hasn't started yet (or may be partially started), so it also tries to call reconnect() again.

That leads to the first reconnect call (2) becoming useless because the (3) call disconnects and reconnects it as well.

However, the race condition leads to more trouble, as the connection attempt of (2) still succeeds, and therefore we still attempt to send the pending subscriptions. Then the connection attempt of (3) also succeeds, and the pending subscriptions are sent again.

We should be able to use the state of the websocket client to determine whether we can start connecting immediately, or we need to initiate the process of disconnecting (or in some cases, we may not need to disconnect/reconnect at all -- i.e., if we're in the CONNECTING state, we should probably just wait for that to finish connecting (or fail))

mmimeault commented 7 years ago

Good catch, it's totally a big issue. We need to fix that ASAP, it will cause trouble on most systems.

jhansche commented 7 years ago

I think the bigger issue is the linked change where I made disconnect() async.

But there are also some other assumptions made and corners cut. I think the issues will actually be resolved when we simplify the roles of each method.

jhansche commented 7 years ago

In fact, now that we're using OkHttp for websockets, I think we can revert that disconnectAsync() change, and that might solve the issues.

rogerhu commented 7 years ago

Did you find that you needed to put disconnect on a separate thread after OkHttp?

jhansche commented 7 years ago

No, that change was before the OkHttp change merged. And I'm not even sure how to reproduce the issue, just saw it in one of our betas via crashlytics.

But the problem is that close() by itself (at least on tube sock) tries to send a close negotiation to the server before disconnecting. So doing that on the main thread results in a NetworkOnMainThreadException.

OkHttp likely would not have that issue, because ALL network traffic is handled on a separate thread for the OkHttpClient. I'll try some more tests with the async-disconnect commit reverted, and see if that fixes the weirdness I was seeing.

jhansche commented 7 years ago

I've confirmed that with OkHttpClient, we no longer need the disconnectAsync() change. I'll be submitting a PR to revert that change.