Open bengtan opened 4 years ago
Background notes:
LocationStore/RNBGL significant changes:
4.30.0 uses RNBGL 3.3.0 (PR 4296) 4.32.0 uses RNBGL 3.3.2 (PR 4360) 4.32.1 has dynamic upload rate (4274) 4.32.2 has dynamic upload rate fixes (PR 4435)
There's two things we can do:
I'm gonna put in some more debugging so it's clearer when the phone's connectivity status changes. At the moment, it's not clear, and I don't 100% trust the lines in the RNBGL log which indicate connectivity changes.
In the meantime, it would be useful to know which version of the app this issue started happening.
@mstidham:
Presuming that this is not hard to reproduce, can you please test to see if this issue happens for the following versions:
@bengtan I was able to reproduce this on Staging with versions listed above and have sent you logs from the watcher and stuck user. I did however experience a little different results than previously reported. Samsung10 was stuck for an extended time and I opened the app on that device and everything updated and appeared fine. However Miranda did not see an update for Samsung10 on the iOS device, Miranda had app open entire time and was still in the same user session. Miranda then arrived back home connected to WiFi with no change and then did a kill/reload on the iOS device. Samsung10 updated and appeared fine after being stuck for over 1 hour. So basically according to the geolocation console there was never a location update when Samsung10 opened the app in Woodward, Oklahoma. A kill/reload was required by the watching user to receive the new location data points.
Accounts: Miranda (iOS) Version: 4.30.1 (watcher) & Samsung10 (android) Version: 4.30.2 (stuck user)
Samsung10 stuck point
Samsung10 unstuck point according to the console (the data points showing on console are actually before the kill/reload on iOS device).
This is the point Miranda did the kill/reload on iOS device
@mstidham:
Thanks for reproducing this and capturing it.
I was surprised it happened with 4.30.x. I was tentatively presuming that it wouldn't happen because 4.30.x is the 'previous' App Store version and we didn't encounter this issue back then. So it's notable that this issue existed with 4.30.x and possibly earlier.
Samsung10 was stuck for an extended time and I opened the app on that device and everything updated and appeared fine. However Miranda did not see an update for Samsung10 on the iOS device, Miranda had app open entire time and was still in the same user session. ... So basically according to the geolocation console there was never a location update when Samsung10 opened the app in Woodward, Oklahoma.
Yes, it is a bit different this time, but I don't know if it's significant or not.
I did see in the server side logs that the Samsung10 was opened (in Woodward, as you say?) and logged into the wocky server. However, for some reason, RNBGL didn't upload any location points whilst the app was open (This is a new symptom).
I'm in agreement with you as to what happened, though I'm not able to explain why.
A kill/reload was required by the watching user to receive the new location data points.
Or maybe, if you didn't do the kill/reload and waited a few more seconds (maybe 10 seconds?), the watching user might have eventually received the new location data points anyway?
(Maybe not important)
Anyway, I've prepared some extra debugging.
Could you please install codepush StagingBeng 4595-debug (fca871c8)
and test again, and then get me the logs?
Staging Version: 4.34.2 (4.34.2-StagingBeng-v124)
Logs have been sent from Watching and Stuck users. This time was the typical stuck situation. User Samsung10 (android) arrived at Home (WiFi) and then never left that location. I waited until I arrived in Woodward and did kill/reload on Miranda device with no change. I then opened the app with Samsung10 and everything updated on Watcher and Stuck users devices. A bugsnag was also generated near the time Samsung10 got stuck and I created ticket #4600.
Samsung10 stuck point
Samsung Unstuck point
@mstidham:
Thank you very much for the new debug log. This one clarifies things quite a bit, so much so that I now have enough evidence to go hassle the RNBGL developer.
I'll post an upstream RNBGL ticket and post the link here later.
In the meantime, some more comments ...
The following is not a bug or a question, it's just a FYI.
In the screenshot 'Samsung Unstuck point', you have annotated two red arrows pointing to 'Unstuck point according to geo console' (which I'll call Point B) and 'Actual location that Samsung opened the app' (which I'll call Point C).
The reason for the difference between the two points is as follows:
If the phone is offline, the app retains the 20 most recent data points. When it later goes online, it uploads all the offline data points.
What happens is that the app re-establishes connectivity at the location/time of Point C. When this happens, there are 20 offline data points, and they are (queued and) uploaded. Point B just happens to be the first/oldest of the offline data points.
If you look in the map, there are about 20 points/map-markers between B and C. I say 'about' because this is not very precise and the map markers could be off by 1 or 2.
(BTW, I like the newfound precision that we are now debugging with ie. identifying individual data points (and timestamps) when significant things happen.)
As mentioned above, I'm going to post an RNBGL ticket and wait for the developer's response.
In the meantime, could you do the following please?
Could you please test with an iPhone being the stuck user? I think this issue occurs with the iPhone too, and I'd like to verify. (We've been mostly testing with android being the stuck user's phone.)
Please use the same codepush so I can get the same level of debug information.
Upstream ticket:
Plugin does not retry/recover after network or connectivity error 941 https://github.com/transistorsoft/react-native-background-geolocation/issues/941
Progress!
The RNBGL maintainer concedes it's a (recent?) RNBGL bug.
I'll continue corresponding with him. (Maybe try to come up with a work-around?)
Diagnosis:
There was a recent feature/change whereby, if RNBGL attempts to upload location and the server doesn't respond (ie. the internet connection is down), RNBGL will back-off and try again 10 location data points later. However, the back-off is implemented by internally increasing autoSyncThreshold
by 10.
We are affected by this as follows:
maxRecordsToPersist
=20).autoSyncThreshold
becomes 30.Problem is ... maxRecordsToPersist
is 20 so the number of offline data points will never reach 30 so RNBGL never retries. (... Until an app restart or something which resets it's state machine.)
AFAICT, this bug was introduced in this commit: https://github.com/transistorsoft/react-native-background-geolocation-android/commit/b66c7fd5 and first released in RNBGL 3.3.0. This suggests it first appeared in our app at version 4.30.0.
Fix:
The RNBGL developer has implemented a fix and we'll have to try it out.
It is a native change, but I'm thinking of building it as 4.34.3 (since the native<->javascript interface hasn't changed) instead of 4.35.0.
@mstidham:
I've released tinyrobotStaging 4.34.3 with the supposed fix. This is an experimental version. Depending on how testing goes, we'll see what to do next.
Please test 4.34.3. It's probably easier to try it on android first.
Testing notes:
Please do whatever you've been doing for the last few days in your testing. However ...
If the phone/app has been stuck for 1 or 2 minutes, don't open the app yet. Give it time to auto-recover (which the fix is supposed to do).
Instead, keep moving around for the next 5-10 minutes. Preferably, head to a place where you know there is good cellular reception. (Alternatively, go to somewhere with wifi.)
After at least 5-10 minutes (and in a place with good cellular reception or wifi), if the app is still stuck, don't open the app yet.
Instead, open the phone and verify that the phone does have internet access by using some other app.
Then, after confirming that the phone does have internet access, then, open the app.
The rationale is as follows: The logs indicate that the phone does sometimes lose cellular connection for about 1 to 2 minutes. In which case, the app will be 'stuck' for 1 to 2 minutes. This is fine. It's a legitimate error and it's expected behaviour.
What is not expected is if ... the phone moves to a place with good cellular or wifi and doesn't eventually recover. (This is why I'd like you to wait at least 5-10 minutes before opening the app.)
If the app does eventually recover without you having to open it, that's a good thing.
More testing notes:
4.34.3 has a new (beta) version of RNBGL. No unexpected behaviour is expected, but please keep this in mind.
The ideal test result would be the app getting stuck for a minute or two due to the internet connection being down, and then auto-recovering soon after. But there's no easy way to see if that's the case without looking at the log file.
Also, after you're done testing, please send me the debug log regardless of whether you reproduce it or not. There may be some interesting stuff in there.
Cross-posting https://hippware.slack.com/archives/C2V6L53TQ/p1578538350018600?thread_ts=1578524758.005500&cid=C2V6L53TQ
This instance is fine. The app/RNBGL eventually recovered. I could see in the logs that it was continuing to retry uploading location until it eventually succeeded.
Cross-posting https://hippware.slack.com/archives/C2V6L53TQ/p1578538938018800?thread_ts=1578534892.012500&cid=C2V6L53TQ
This instance is fine too. I can see in the logs that RNBGL is continually trying to retry uploading location.
Yes, the app got 'stuck' for about 30 minutes, but during that 30 minutes, the phone OS indicated that the internet was connected, and RNBGL was attempting to upload location once every 10-40 seconds, but the attempts all failed (with timeout).
And it eventually recovered.
According to the logs, RNBGL recovered about 1 min 30 seconds before the app was re-opened. AFAICT, it recovered on its own.
(Or ... maybe RNBGL didn't recover on it's own but need something to trigger it (phone unlock? switch to wifi?) I don't see anything in the logs to indicate yay or nay.)
So ... there's some weirdness with why the server was unreachable for 30 mins on the cellular connection, but other than that, this isn't a reproduction of 'location uploading being stuck'.
Please note that 4.34.3 has the fix for this, but 4.35.0 does NOT.
Don't test this with 4.35.x until further notice.
cc @mstidham
QA has been reporting that, in testing, sometimes phones get 'stuck' where 'stuck' means: The app (ie. RNBGL) stops uploading location data points and stays that way until it is re-started/re-foregrounded.
The distinguishing characteristic seems to be that an app restart/foreground is needed in order to recover.
Analysis of RNBGL debug log files reveals that there are errors related to network/connectivity issues. However, this may not be the full picture because RNBGL should automatically recover once network connectivity is restored. Instead, it seems that RNBGL only recovers when the app is restarted/foregrounded.
This was first reported for Staging 4.34.0:
jakeallysmom (Staging) - 20191212 https://hippware.slack.com/archives/C2V6L53TQ/p1576159767303200
Other documented instances:
Pixel (Staging) and Samsung10 (Staging) - 20191219 https://hippware.slack.com/archives/C2V6L53TQ/p1576766408020500
Don (Staging) - 20191219 https://hippware.slack.com/archives/C2V6L53TQ/p1576780921022700