zulip / zulip-mobile

Zulip mobile apps for Android and iOS.
https://zulip.com/apps/
Apache License 2.0
1.29k stars 653 forks source link

Crash with "Active account not logged in" #3706

Open gnprice opened 4 years ago

gnprice commented 4 years ago

One recurring error we see in the crash logs looks like this:

com.facebook.react.common.JavascriptException: Error: Error: Error: Error: Active account not logged in
This error is located at:
in Connect(n)
in s
in RCTView
in RCTView
in u
[...]
in c, stack:
h@572:1357
<unknown>@1742:2080
w@388:786
run@372:3903
onStateChange@372:2816
<unknown>@-1
notify@376:741
notifyNestedSubs@376:395
<unknown>@-1
onStateChange@372:2971
[...]
at com.facebook.react.modules.core.ExceptionsManagerModule.showOrThrowError(ExceptionsManagerModule.java
  at com.facebook.react.modules.core.ExceptionsManagerModule.reportFatalException (ExceptionsManagerModule.java:38)
  at java.lang.reflect.Method.invoke (Native Method)
  at com.facebook.react.bridge.JavaMethodWrapper.invoke (JavaMethodWrapper.java:372)
[...]

E.g. it's one of the types of errors in this "cluster" on the Play Console.

The error message comes from getAuth:

/**
 * The auth object for the active, logged-in account; throws if none.
 *
 * For use in all the normal-use screens and codepaths of the app, which
 * assume there is an active, logged-in account.
 *
 * See:
 *  * `tryGetAuth` for the meaning of "active, logged-in".
 *  * `tryGetAuth` again, for use where there might not be such an account.
 */
export const getAuth = (state: GlobalState): Auth => {

So apparently some code is assuming there's an active, logged-in account when there isn't. Fixing the bug means finding where that code is, and either making it handle the alternative case (e.g. with tryGetAuth) or preventing getting there in that case. Probably the first step is to repro it.

In this Sentry record, it looks from the breadcrumbs like the user was using one account, and tried to either switch to another or add another (there's a /server_settings call for one org after a bunch of activity in a different one); and then this hit 10 seconds later. So that may be a clue.

Here's a Sentry search for more examples, which may provide further clues.

rk-for-zulip commented 4 years ago

Filtering the Sentry records to include only iOS results (for which we have coherent stack traces) suggests that this is happening in multiple different places, including but probably not limited to:

gnprice commented 4 years ago

Fascinating, thanks!

Do you have links to those records in Sentry? (Naturally they'll work only for members of the core team with Sentry access.) I think that'd be helpful for seeing the context of your observations.

gnprice commented 4 years ago

Looking at a couple of these in more detail:

  • dispatch(logout(...)) in fetchActions.js (3 occurrences)

This is particularly interesting because logout() itself is totally trivial (returns the object literal {type: 'LOGOUT'}, so it must be something called by the dispatch. The reducers aren't going to call selectors like getAuth... perhaps it's some React method invoked via react-redux? I don't feel like I actually have a complete handle on what might get called by dispatch (and what might get put on a queue, so that dispatch causes it to run but is gone from the stack before it does).


  • from a setTimeout callback in handleTopicPress in UnreadCards.js

That'd be this:

  handleTopicPress = (stream: string, topic: string) => {
    setTimeout(() => this.props.dispatch(doNarrow(topicNarrow(stream, topic))));
  };

topicNarrow is trivial; as is doNarrow itself, but it returns a thunk action, so when passed to dispatch some code runs:

export const doNarrow = (narrow: Narrow, anchor: number = FIRST_UNREAD_ANCHOR) => (
  dispatch: Dispatch,
  getState: GetState,
) => {
  const state = getState();

  if (!isNarrowValid(narrow)(state) || !getIsHydrated(state)) {
    return;
  }

  dispatch({ type: DO_NARROW, narrow });
  dispatch(fetchMessagesInNarrow(narrow, anchor));
  dispatch(navigateToChat(narrow));
};

Implicitly that's all assuming that we're still logged into the same account that the narrow value is intended for. Which we would surely be, if not for a delay like that setTimeout. It's got to be a tight race -- there's no delay on that setTimeout -- but it sure is a race.

Thinking about this a bit, I think the root issue here is in that handleTopicPress method -- it sets a timeout callback, creating a yield point, and it implicitly assumes that nothing important changes while it's yielded.

The timeouts were inserted in 31d83eb8154f, to fix #2794:

unreads screen: Don't block UI when pressing a topic or stream.

This would typically cause us to completely lose the visual feedback
we should show for the touch.

So maybe the root root cause is that something inside doNarrow is synchronous that should be async. The timeout was basically a crudely-targeted way of making it async.

gnprice commented 4 years ago

I've filed #4135 to describe one of the bugs in this cluster, for which we have a reliable repro and a specific tentative diagnosis.