zulip / zulip-mobile

Zulip mobile apps for Android and iOS.
https://zulip.com/apps/
Apache License 2.0
1.29k stars 651 forks source link

Data doesn't load if the network is turned on after coming to the chat view (private chat) #2287

Open pulkonet opened 6 years ago

pulkonet commented 6 years ago

Steps of reproduction:

pulkonet commented 6 years ago

Same behaviour is shown by all the chat screens.

borisyankov commented 6 years ago

Great find! Working on that issue is a priority. If you want to try to figure out how to fix it, you are more than welcome!

pulkonet commented 6 years ago

@zulipbot claim

roberthoenig commented 6 years ago

@zulipbot claim

borisyankov commented 6 years ago

@roberthoenig not sure how involved would be to implement this... or more correctly I suspect it will require a lot of rethinking and reworking. Still, not a bad idea to examine the issue and share your thoughts.

roberthoenig commented 6 years ago

I've done some heavier debugging on this. Here are the results:

TL;DR:

To load the data, our code call the React Native fetch API to make a GET request to the Zulip server. If the user disconnected and reconnected, this request never makes it out of the RN app. The issue probably lies within RN itself, specifically its networking code. To further debug this, we will need to get a source compilation of RN running to observe what part is failing inside.

Full version

Reproduction

This issue can be deterministically reproduced on a development server with the following steps:

  1. Start the dev server.
  2. On your computer, connect to the dev server and log in with hamlet.
  3. On your phone, connect to the dev server and log in with cordelia.
  4. Disconnect your phone from the internet.
  5. On your computer, send some messages to the stream #test.
  6. Connect your phone to the internet.
  7. On your phone, try to narrow to the stream #test.
  8. Bug: The stream #test won't load. Instead, it will show a stream skeleton like in the picture of the bug report.

If the app was behaving correctly, it would load the new messages from the stream #test and show them.

What's happening?

Here are the important things going on in the app when following the steps above:

Disconnect your phone from the internet.

An action

action     {type: "APP_ONLINE", isOnline: false}

is triggered.

Connect your phone to the internet.

An action

action     {type: "APP_ONLINE", isOnline: true}

is triggered.

On your phone, try to narrow to the stream #test.

The action SWITCH_NARROW is triggered, followed by the action MESSAGE_FETCH_START. The app is now trying to fetch the messages from the server. This happens in fetchApi.js. The function calls roughly go apiGet -> apiCall -> apiFetch -> fetchWithAuth -> fetch. The last function fetch is part of React Native's Fetch API. It is supposed to mimic the Javascript Fetch API.

The fetch call returns a promise that never gets resolved (or, more precisely, only after a long time, as we will discover later). This is why the stream won't show up.

If we try the same thing without previously dis- and reconnecting, the fetch call will resolve as expected and return a list of new messages. The action MESSAGE_FETCH_COMPLETE is triggered.

Monitoring

To debug this issue, we need to monitor the activity on the app, the server, and the route in between.

The app

To monitor the app network request I chose to print-debug all calls to fetch. I did this in two ways:

The dev server

For monitoring the dev server, I just looked at the logs it is constantly spitting out. They look like this:

2018-06-06 10:08:21.156 INFO [zr] 10.0.0.4 POST 200 20ms (db: 3ms/4q) /json/users/me/presence (iago@zulip.com via website) 2018-06-06 10:08:33.153 INFO [zr] 10.0.0.4 POST 200 18ms (db: 3ms/4q) /json/users/me/presence (iago@zulip.com via website)

The route in between

To see if any traffic is silently sneaking past my eyes, I used Wireshark to monitor all traffic going in and out of my system. Both the app and my dev server are hosted on my computer, so traffic between them goes over the Loopback Interface. To monitor the Looopback Interface-, I started Wireshark with

sudo wireshark -i lo

There is a lot of traffic going on, even on the Loopback Interface. Therefore, I filtered all traffic to see only http and http2 packets.

Debugging

Let's reproduce the issue again, but this time with all the monitoring we just discussed. A few milliseconds after the MESSAGE_FETCH_START action is triggered, fetch(url, allParams) is called with the following parameters:

It returns an empty promise.

At the same time, on the server side, the request that got sent by the app does not appear in the logs. The only request that does appear is

2018-06-06 10:26:08.245 INFO [zr] 10.0.0.4        GET     404  27ms (db: 3ms/2q) /favicon.ico (unauth via ?)
2018-06-06 10:26:08.245 WARN [django.server] "GET /favicon.ico HTTP/1.1" 404 24995

This appears also when the app hasn't previously been dis- and reconnected. Furthermore, this request does not appear in the app logs.

On Wireshark, the request that got sent by the app does not appear in the logs. Again, only the 404 favicon.ico request is shown.

Then, nothing interesting happens for a while. After 2-15 minutes, however, the app request is logged. The server handles it normally, returning the requested messages. MESSAGE_FETCH_COMPLETE is triggered and the app shows the requested stream. The promise returned by fetch resolves into a response:

Miscellaneous trial & error

After the above detailed observation of the request flow, I went on trying out some things:

fetch-ing another URL.

I added another fetch call directly before the original one to fetch some definitely available resource:

let response = fetch("https://facebook.github.io/react-native/movies.json").catch(() => console.log("errrrrrrror"));

Just as the original fetch call, this fetch call does not get resolved until 2-15 minutes have passed. When it gets resolved, it gets resolved at the same time.

Caching

I played with the HTTP caching options and changed it to the following values as described in this post:

This had no effect on the issue.

Using XMLHttpRequest instead of fetch

This had no effect on the issue. fetch and all third pary AJAX libraries build on XMLHttpRequest, so switching to any of those will probably not do anything. A document that might be of interest here is the XMLHttpRequest standard.

Exiting the stream and reopening it.

This lead to all messages being loaded straight away!!. Inside the app, what happened is just that every action got triggered again. I then tried to just manually duplicate every fetch call, without success (the issue remained).

Debugging the network status

It's useful to know what connection state the RN app is in after dis- and reconnecting. To find out, I used the RN NetInfo object. The results weren't interesting. Before dis- and reconnecting, the app is in the state

type: cellular, effectiveType: 4g

On disconnect, the state changes to

type: none, effectiveType: unknown

On reconnecting, the state changes back

type: cellular, effectiveType: 4g

Note: To disconnect and reconnect the app on my Android emulator, I use the commands adb shell svc data disable and adb shell svc data enable.

Looking for similar issue reports in React Native.

There weren't any. The closest was https://github.com/facebook/react-native/issues/6679.

Conclusion

It looks like the issue lies within the RN networking implementation: After dis- and reconnecting and calling fetch, it looks like the RN code is stuck internally for a while, until it manages to send out the request after several minutes.

The next step in debugging this would be to get a source compilation of RN running to observe what part is failing inside. A good entry point into the code is the send function in XMLHttpRequest.js. I think @gnprice is currently examining how to build RN from the source code.

akashnimare commented 6 years ago

I'm running v13.4.88 on my iPhone and couldn't reproduce this issue.

It looks like the issue lies within the RN networking implementation:

Maybe first check how it behaves in other RN based chat apps. This works fine for me in other apps.

roberthoenig commented 6 years ago

I'm running v13.4.88 on my iPhone and couldn't reproduce this issue.

Interesting! This actually brings facebook/react-native#6679 close to attention; users there report that it works on Android but not iOS.

roberthoenig commented 6 years ago

Also, I should have made it more clear that all the debugging I've done was on an Android emulator.

roberthoenig commented 6 years ago

^^ see the above issue report on React Native.

chrisbobbe commented 4 years ago

Quoting my comment from https://github.com/zulip/zulip-mobile/issues/3800#issuecomment-589882875:

I'm curious whether we might still be affected by the upstream issues with fetch that surfaced in https://github.com/zulip/react-native/pull/2 (June 2018). That's a PR to our fork of react-native, which we'd actually abandoned in 4b95b124bf2b2fa4c7698c3a013cebffb3d43c57 (March 2017), in favor of using the published react-native. So, apparently the issues addressed in that PR were serious enough that we considered going back to our react-native fork after almost a year without it.

The PR is marked as fixing https://github.com/zulip/zulip-mobile/issues/2287, https://github.com/zulip/zulip-mobile/issues/2310, and https://github.com/zulip/zulip-mobile/issues/2315, and all three are still open and without activity since 2018.

@roberthoenig and @gnprice did a ton of really valuable debugging work that I haven't fully absorbed, but I'll look deeper if you'd like me to.

Robert's facebook/react-native issue, https://github.com/facebook/react-native/issues/19709, is still open, and people are commenting to un-stale it.

Robert's square/okhttp issue, https://github.com/square/okhttp/issues/4079, is also still open, and Robert has commented that the issue may be in okhttp, not react-native. An okhttp maintainer first commented that it might be an Android bug, but a subsequent comment from the same maintainer seemed to suggest a solution exists within okhttp.