Open pulkonet opened 6 years ago
Same behaviour is shown by all the chat screens.
Great find! Working on that issue is a priority. If you want to try to figure out how to fix it, you are more than welcome!
@zulipbot claim
@zulipbot claim
@roberthoenig not sure how involved would be to implement this... or more correctly I suspect it will require a lot of rethinking and reworking. Still, not a bad idea to examine the issue and share your thoughts.
I've done some heavier debugging on this. Here are the results:
To load the data, our code call the React Native fetch
API to make a GET request to the Zulip server. If the user disconnected and reconnected, this request never makes it out of the RN app. The issue probably lies within RN itself, specifically its networking code. To further debug this, we will need to get a source compilation of RN running to observe what part is failing inside.
This issue can be deterministically reproduced on a development server with the following steps:
If the app was behaving correctly, it would load the new messages from the stream #test and show them.
Here are the important things going on in the app when following the steps above:
Disconnect your phone from the internet.
An action
action {type: "APP_ONLINE", isOnline: false}
is triggered.
Connect your phone to the internet.
An action
action {type: "APP_ONLINE", isOnline: true}
is triggered.
On your phone, try to narrow to the stream #test.
The action SWITCH_NARROW
is triggered, followed by the action MESSAGE_FETCH_START
.
The app is now trying to fetch the messages from the server. This happens in fetchApi.js
.
The function calls roughly go apiGet
-> apiCall
-> apiFetch
-> fetchWithAuth
-> fetch
.
The last function fetch
is part of React Native's Fetch API. It is supposed to mimic the Javascript Fetch API.
The fetch
call returns a promise that never gets resolved (or, more precisely, only after a long time, as we will discover later). This is why the stream won't show up.
If we try the same thing without previously dis- and reconnecting, the fetch
call will resolve as expected
and return a list of new messages. The action MESSAGE_FETCH_COMPLETE
is triggered.
To debug this issue, we need to monitor the activity on the app, the server, and the route in between.
To monitor the app network request I chose to print-debug all calls to fetch
. I did this in two ways:
Adding console.log
statements around the fetch
call in fetchAuth
. This reliably showed the
arguments that fetch
is being called with.
Add the following code at the beginning of ZulipMobile.js
to overwrite fetch
:
global._fetch = fetch;
global.fetch = function (uri, options, ...args) {
return global._fetch(uri, options, ...args).then((response) => {
console.log('Fetch', { request: { uri, options, ...args }, response });
return response;
});
};
This sometimes crashed for me and didn't give me any new data, so I removed it.
A third approach was outlined in this post. Basically, just add
XMLHttpRequest = GLOBAL.originalXMLHttpRequest ?
GLOBAL.originalXMLHttpRequest : GLOBAL.XMLHttpRequest;
at the beginning of ZulipMobile.js
. However, this crashed for me, so I removed it and don't know what
it'd tell me.
For monitoring the dev server, I just looked at the logs it is constantly spitting out. They look like this:
2018-06-06 10:08:21.156 INFO [zr] 10.0.0.4 POST 200 20ms (db: 3ms/4q) /json/users/me/presence (iago@zulip.com via website) 2018-06-06 10:08:33.153 INFO [zr] 10.0.0.4 POST 200 18ms (db: 3ms/4q) /json/users/me/presence (iago@zulip.com via website)
To see if any traffic is silently sneaking past my eyes, I used Wireshark to monitor all traffic going in and out of my system. Both the app and my dev server are hosted on my computer, so traffic between them goes over the Loopback Interface. To monitor the Looopback Interface-, I started Wireshark with
sudo wireshark -i lo
There is a lot of traffic going on, even on the Loopback Interface. Therefore, I filtered all traffic to see
only http
and http2
packets.
Let's reproduce the issue again, but this time with all the monitoring we just discussed.
A few milliseconds after the MESSAGE_FETCH_START
action is triggered, fetch(url, allParams)
is called with the following parameters:
allParams
:
{headers: {Authorization:"Basic Y29yZGVsaWFAenVsaXAuY29tOjY4ZTlTSGJjNHF2QU9oNkd3VllmT1EzUFRaWWdGRnBO"
Content-Type: "application/x-www-form-urlencoded; charset=utf-8"
User-Agent: "ZulipMobile/1.0.13 (Android 6.0)"},
method: "get"}
It returns an empty promise.
At the same time, on the server side, the request that got sent by the app does not appear in the logs. The only request that does appear is
2018-06-06 10:26:08.245 INFO [zr] 10.0.0.4 GET 404 27ms (db: 3ms/2q) /favicon.ico (unauth via ?)
2018-06-06 10:26:08.245 WARN [django.server] "GET /favicon.ico HTTP/1.1" 404 24995
This appears also when the app hasn't previously been dis- and reconnected. Furthermore, this request does not appear in the app logs.
On Wireshark, the request that got sent by the app does not appear in the logs. Again, only the 404 favicon.ico
request is shown.
Then, nothing interesting happens for a while. After 2-15 minutes, however, the app request is logged. The
server handles it normally, returning the requested messages. MESSAGE_FETCH_COMPLETE
is triggered and the
app shows the requested stream. The promise returned by fetch
resolves into a response:
After the above detailed observation of the request flow, I went on trying out some things:
fetch
-ing another URL.I added another fetch
call directly before the original one to fetch some definitely available resource:
let response = fetch("https://facebook.github.io/react-native/movies.json").catch(() => console.log("errrrrrrror"));
Just as the original fetch call, this fetch call does not get resolved until 2-15 minutes have passed. When it gets resolved, it gets resolved at the same time.
I played with the HTTP caching options and changed it to the following values as described in this post:
This had no effect on the issue.
This had no effect on the issue. fetch
and all third pary AJAX libraries build on XMLHttpRequest,
so switching to any of those will probably not do anything. A document that might be of interest
here is the XMLHttpRequest standard.
This lead to all messages being loaded straight away!!. Inside the app, what happened
is just that every action got triggered again. I then tried
to just manually duplicate every fetch
call, without success (the issue remained).
It's useful to know what connection state the RN app is in after dis- and reconnecting. To find out,
I used the RN NetInfo object
. The results
weren't interesting. Before dis- and reconnecting, the app is in the state
type: cellular, effectiveType: 4g
On disconnect, the state changes to
type: none, effectiveType: unknown
On reconnecting, the state changes back
type: cellular, effectiveType: 4g
Note: To disconnect and reconnect the app on my Android emulator, I use the commands
adb shell svc data disable
and adb shell svc data enable
.
There weren't any. The closest was https://github.com/facebook/react-native/issues/6679.
It looks like the issue lies within the RN networking implementation: After dis- and reconnecting
and calling fetch
, it looks like the RN code is stuck internally for a while, until it manages
to send out the request after several minutes.
The next step in debugging this would be to get a source compilation of RN running to observe what part is failing inside. A good entry point into the code is the send function in XMLHttpRequest.js
. I think @gnprice is currently examining how to build RN from the source code.
I'm running v13.4.88
on my iPhone and couldn't reproduce this issue.
It looks like the issue lies within the RN networking implementation:
Maybe first check how it behaves in other RN based chat apps. This works fine for me in other apps.
I'm running v13.4.88 on my iPhone and couldn't reproduce this issue.
Interesting! This actually brings facebook/react-native#6679 close to attention; users there report that it works on Android but not iOS.
Also, I should have made it more clear that all the debugging I've done was on an Android emulator.
^^ see the above issue report on React Native.
Quoting my comment from https://github.com/zulip/zulip-mobile/issues/3800#issuecomment-589882875:
I'm curious whether we might still be affected by the upstream issues with
fetch
that surfaced in https://github.com/zulip/react-native/pull/2 (June 2018). That's a PR to our fork of react-native, which we'd actually abandoned in 4b95b124bf2b2fa4c7698c3a013cebffb3d43c57 (March 2017), in favor of using the published react-native. So, apparently the issues addressed in that PR were serious enough that we considered going back to our react-native fork after almost a year without it.The PR is marked as fixing https://github.com/zulip/zulip-mobile/issues/2287, https://github.com/zulip/zulip-mobile/issues/2310, and https://github.com/zulip/zulip-mobile/issues/2315, and all three are still open and without activity since 2018.
@roberthoenig and @gnprice did a ton of really valuable debugging work that I haven't fully absorbed, but I'll look deeper if you'd like me to.
Robert's facebook/react-native issue, https://github.com/facebook/react-native/issues/19709, is still open, and people are commenting to un-stale it.
Robert's square/okhttp issue, https://github.com/square/okhttp/issues/4079, is also still open, and Robert has commented that the issue may be in okhttp, not react-native. An okhttp maintainer first commented that it might be an Android bug, but a subsequent comment from the same maintainer seemed to suggest a solution exists within okhttp.
Steps of reproduction: