Open gnprice opened 4 years ago
Filtering the Sentry records to include only iOS results (for which we have coherent stack traces) suggests that this is happening in multiple different places, including but probably not limited to:
dispatch(logout(...))
in fetchActions.js
(3 occurrences)connect
ing the StreamListCard
after...
handleOrientationChange
in AppBootHandlers.js
)handleAppStateChange
in AppBootHandlers.js
)doInitialFetch
resulting from a BAD_EVENT_QUEUE_ID
(v. 26.15.138)setTimeout
callback in handleTopicPress
in UnreadCards.js
Fascinating, thanks!
Do you have links to those records in Sentry? (Naturally they'll work only for members of the core team with Sentry access.) I think that'd be helpful for seeing the context of your observations.
Looking at a couple of these in more detail:
dispatch(logout(...))
infetchActions.js
(3 occurrences)
This is particularly interesting because logout()
itself is totally trivial (returns the object literal {type: 'LOGOUT'}
, so it must be something called by the dispatch
. The reducers aren't going to call selectors like getAuth
... perhaps it's some React method invoked via react-redux
? I don't feel like I actually have a complete handle on what might get called by dispatch
(and what might get put on a queue, so that dispatch
causes it to run but is gone from the stack before it does).
- from a
setTimeout
callback inhandleTopicPress
inUnreadCards.js
That'd be this:
handleTopicPress = (stream: string, topic: string) => {
setTimeout(() => this.props.dispatch(doNarrow(topicNarrow(stream, topic))));
};
topicNarrow
is trivial; as is doNarrow
itself, but it returns a thunk action, so when passed to dispatch
some code runs:
export const doNarrow = (narrow: Narrow, anchor: number = FIRST_UNREAD_ANCHOR) => (
dispatch: Dispatch,
getState: GetState,
) => {
const state = getState();
if (!isNarrowValid(narrow)(state) || !getIsHydrated(state)) {
return;
}
dispatch({ type: DO_NARROW, narrow });
dispatch(fetchMessagesInNarrow(narrow, anchor));
dispatch(navigateToChat(narrow));
};
Implicitly that's all assuming that we're still logged into the same account that the narrow
value is intended for. Which we would surely be, if not for a delay like that setTimeout
. It's got to be a tight race -- there's no delay on that setTimeout
-- but it sure is a race.
Thinking about this a bit, I think the root issue here is in that handleTopicPress
method -- it sets a timeout callback, creating a yield point, and it implicitly assumes that nothing important changes while it's yielded.
The timeouts were inserted in 31d83eb8154f, to fix #2794:
unreads screen: Don't block UI when pressing a topic or stream.
This would typically cause us to completely lose the visual feedback
we should show for the touch.
So maybe the root root cause is that something inside doNarrow
is synchronous that should be async. The timeout was basically a crudely-targeted way of making it async.
I've filed #4135 to describe one of the bugs in this cluster, for which we have a reliable repro and a specific tentative diagnosis.
One recurring error we see in the crash logs looks like this:
E.g. it's one of the types of errors in this "cluster" on the Play Console.
The error message comes from
getAuth
:So apparently some code is assuming there's an active, logged-in account when there isn't. Fixing the bug means finding where that code is, and either making it handle the alternative case (e.g. with
tryGetAuth
) or preventing getting there in that case. Probably the first step is to repro it.In this Sentry record, it looks from the breadcrumbs like the user was using one account, and tried to either switch to another or add another (there's a
/server_settings
call for one org after a bunch of activity in a different one); and then this hit 10 seconds later. So that may be a clue.Here's a Sentry search for more examples, which may provide further clues.