launchdarkly / android-client-sdk

LaunchDarkly Client-side SDK for Android
Other
45 stars 23 forks source link

Polling alarms can repeat in background indefinitely if app crashes #188

Closed byencho closed 1 year ago

byencho commented 1 year ago

Is this a support request? No

Describe the bug Under certain conditions (described below) the alarm set for foreground polling is not cancelled when the app is closed and this can result in that polling period being used to wake the app up in the background.

To reproduce There is a particular setup required here:

When the app is first foregrounded, the PollingUpdater will register a recurring alarm using the pollingIntervalMillis (which in this case is 5 minutes). Under normal circumstances, when the app moves to the background this alarm will be cancelled (as expected). This is because the ForegroundChecker will trigger onBecameBackground inside ConnectivityManager:

            @Override
            public void onBecameBackground() {
                synchronized (ConnectivityManager.this) {
                    if (isInternetConnected(application) && !setOffline &&
                            connectionInformation.getConnectionMode() != backgroundMode) {
                        throttler.cancel();
                        attemptTransition(backgroundMode);
                    }
                }
            }

The attemptTransition(backgroundMode) will then disable all polling since background polling is disabled:

    private synchronized void attemptTransition(ConnectionMode nextState) {
        ...
        switch (nextState) {
            case SHUTDOWN:
            case BACKGROUND_DISABLED:
            case SET_OFFLINE:
            case OFFLINE:
                initialized = true;
                callInitCallback();
                stopPolling(); // <------------- Here
                stopStreaming();
                break;

However, none of this code runs if the app shuts down in the foreground unexpectedly for any reason (such as an app crash). This leaves the 5 minute alarm in place and it wakes the app back up in the background. However, because the process is running in the background and is initialized in "offline" mode, the alarm also is not cancelled and is therefore allowed to repeat. Alarms are always typically cancelled when the SDK is initialized when ConnectivityManager.startUp is called, but this logic is not run in offline mode:

    synchronized boolean startUp(LDUtil.ResultCallback<Void> onCompleteListener) {
        initialized = false;
        if (setOffline) {
            initialized = true;
            updateConnectionMode(ConnectionMode.SET_OFFLINE);
            voidSuccess(onCompleteListener);
            return false; // <------ Stopped here in offline mode
        }

        boolean connected = isInternetConnected(application);

        if (!connected) {
            initialized = true;
            updateConnectionMode(ConnectionMode.OFFLINE);
            voidSuccess(onCompleteListener);
            return false;
        }

        initCallback = onCompleteListener;
        eventProcessor.start();
        throttler.attemptRun(); // <------- This would reset the alarms if allowed to run.
        return true;
    }

Expected behavior Polling alarms are always cancelled and reset each time the SDK is initialized (even in offline mode).

More generally, if alarms were not used to implement foreground polling then it would not be possible for foreground polling to accidentally trigger an app to wake up in the background in the first place.

Logs N/A

SDK version 3.1.4 (and also confirmed with current version 3.2.0)

Language version, developer tools Kotlin / Java

OS/platform Android 13

Additional context N/A

eli-darkly commented 1 year ago

Thanks for reporting this. The description is clear, and we'll be looking into it. (filed internally as 169505)

eli-darkly commented 1 year ago

More generally, if alarms were not used to implement foreground polling then it would not be possible for foreground polling to accidentally trigger an app to wake up in the background in the first place

Using AlarmManager is definitely not ideal. It's possible that the reason we've been using that, rather than WorkManager (which seems to me like a better fit for this), is simply that the core Android SDK code is pretty old and we hadn't revisited that functionality since WorkManager became available. I do think we have made an effort to avoid relying on things that come from Jetpack rather than the core API, on the general principle of minimizing dependencies that an app might not want or that might conflict with the version the app is using, but that might not be so bad if WorkManager is not very big and if we're only using basic APIs from it that have been stable.

byencho commented 1 year ago

I do wonder, though, if WorkManager would potentially have the same problem here : work scheduled while the app runs in the foreground might not be properly cancelled under certain conditions if it requires a hook into when the app is backgrounded, resulting in an accidental app wakeup to handle "foreground polling". Just out of curiosity, could foreground polling just be handled by code that only runs locally in the app process (like just some simple loop on a background thread)? This would make foreground and background polling a little different, but there would be no risk of failing to cancel something the OS is managing.

eli-darkly commented 1 year ago

The possibility that this was all over-engineered from the start, and could be replaced by simpler Java scheduling mechanisms, is certainly something we need to consider. Since again the core SDK code is quite old and has been a bit neglected as development focused on other feature enhancements, it could be that we didn't completely think this through the first time round. I'm not aware of any OS limitations that would prevent a task from doing network requests in the background if it were just running on an ordinary worker thread, but maybe I'm missing something.

byencho commented 1 year ago

Well I think the AlarmManager stuff still makes sense for background polling because you need some way to wake the app up to perform the periodic requests. This edge case just seems to come in with treating foreground and background polling the same. The foreground polling doesn't require the app to be woken up (since it must be running already to be foregrounded) so I would think the polling for this could be done in a more direct manner that doesn't involve sending Intents to the OS and would therefore not be subject to any bugs related with failing to cancel the alarm.

eli-darkly commented 1 year ago

Well I think the AlarmManager stuff still makes sense for background polling because you need some way to wake the app up to perform the periodic requests

Is that really the case, though? That was my general impression, but... for instance, the Android "Guide to background work" describes three strategies for executing code while the app is in the background: WorkManager, AlarmManager, and just using an ordinary Java worker thread or Kotlin coroutine. And then it breaks things down further in terms of whether you need the scheduling to persist across app restarts and device restarts or not, and says that if you do, then you need to use WorkManager or AlarmManager (with WorkManager always being preferable unless you actually need to wake the whole device up from sleep). But it does seem to be saying that if you don't need it to persist across app/device restarts, then ordinary worker thread scheduling is OK. At least, I don't know how else to interpret the fact that they're discussing that as an option, on a page that is entirely about background work.

byencho commented 1 year ago

Well this all comes down to what you mean by "background" here and they are sorting lumping a couple of things together:

  1. When the app process is running but there is no foreground UI the user is interacting with.
  2. When the app process is not running at all.

You can use worker threads for "background" work in the first case, but you need something like WorkManager or AlarmManager in the second because you need your app process to start running again before anything can happen. That is all wrapped up in the idea of persisting across app restarts. So then it all just comes down to what you are trying to achieve with "background polling" in the SDK. From the looks of the code and documentation, you appear to want to solve both cases. So you still need something like AlarmManager / WorkManager. And it does sound like they are recommending WorkManager over AlarmManager so that might be something you'd want to look into.

This is all very different for normal polling in the foreground, though. By definition the app process is already running and visible to the user, so normal worker threads would be sufficient there.

eli-darkly commented 1 year ago

I don't think the intention with "background polling" in the SDK is to keep doing it when the app is not running at all. If the app gets started again, the SDK is going to get started again and will acquire flags again at that time; having woken the app up before that point to do background polls wouldn't have made the flags any fresher. The SDK doesn't do anything with flag values unless the application actually requests those values. The point of background polling would be to make not-entirely-stale flag values available to the application in case the application is actually doing things while it's in the background.

And, the intended normal usage of the SDK is that the application explicitly shuts it down if the application knows it is quitting, so the only time polling would persist would be after a crash, which seems even less useful.

The reason I'm having to guess about the original rationale, rather than being able to say "we originally did it this way because ____", is that again this core logic is from very early iterations of the SDK project, when it was basically at proof of concept stage and LD did not yet have a large mobile user base, by developers who were not part of the project later on. I think it's entirely possible that some of the decisions they made weren't as fully thought through as they would be if we created the SDK today, and weren't revisited during later rewrites; either way, they weren't well documented. We've been doing a larger rewrite of the SDK as part of an upcoming feature release, and trying to ensure that we're really doing things in optimal ways.

By the way, when you say "from the looks of the code and documentation"— is there a particular piece of documentation you can point out that was giving you that impression? I'd like to make sure that what we're saying in docs is really in line with the actual behavior of the SDK.

byencho commented 1 year ago

@eli-darkly I suppose the documentation is a bit ambiguous because it just says "background" without specifically defining it (similar to how the Google docs use it in two different ways):

        /**
         * Sets how often the client will poll for flag updates when your application is in the background.
         * <p>
         * The default value is {@link LDConfig#DEFAULT_BACKGROUND_POLLING_INTERVAL_MILLIS}.
         *
         * @param backgroundPollingIntervalMillis the feature flag polling interval when in the background,
         *                                        in milliseconds
         * @return the builder
         */
        public LDConfig.Builder backgroundPollingIntervalMillis(int backgroundPollingIntervalMillis) {
            this.backgroundPollingIntervalMillis = backgroundPollingIntervalMillis;
            return this;
        }

I think looking at the code, though, that fact that AlarmManager was used and that default minimum background polling period was 15 minutes led me to believe that the intention of "background polling" was to wake the app up and fetch new values even if the app wasn't running.

If that is not the intention of background polling and that was only meant for when the app was actually running, then there is an additional bug here that needs fixing as well: the PendingIntent for the alarm for background polling is basically never cancelled right? So if you background the app, background polling is set up using the AlarmManager. Let's say you are using the default value of 15 minutes. Then whenever you or the OS kills your app, your app will keep being woken up by the AlarmManager every 15 minutes.

eli-darkly commented 1 year ago

the PendingIntent for the alarm for background polling is basically never cancelled right?

It is cancelled here... assuming, again, an orderly shutdown where the app has a chance to call LDClient.close(). If the app dies without doing that, then indeed the alarm won't be cancelled... which, as I understood it, was exactly the bug you were talking about in the first place, and I entirely agree that that's undesirable.

byencho commented 1 year ago

@eli-darkly So I know that alarm is cancelled when moving into the background, but I don't know of any other "orderly shutdown" hooks once its already in the background. So I'm not sure when the background alarm would ever be cancelled. So yes, this would produce a bug like the one I originally mentioned. But in this case it's far more predictable: for my problem, background polling is disabled and there is an edge case that leads foreground polling to continue in the background, but if background polling is not disabled this new bug should happen almost 100% of the time.

eli-darkly commented 1 year ago

I'm not sure what you mean by "orderly shutdown hooks". I used the phrase orderly shutdown in this context: "an orderly shutdown where the app has a chance to call LDClient.close()". There is no "hook" involved in the sense of something happening automatically— what happens is that close() calls various other methods that eventually lead to calling the method I linked to, PollingUpdater.stop(). Again, if LDClient.close() was called. So I'm not sure we are disagreeing on anything, but the code path I'm referring to does exist and there is a well-defined way for an application to explicitly shut down the SDK.

It could be that the behavior you're talking about does happen "almost 100% of the time", but I'm somewhat skeptical that no other customer would have noticed it before if so.

eli-darkly commented 1 year ago

In any case, I'm not sure how productive it is to debate that point since we have already established 1. we should fix it to cancel any previously-created AlarmManager alarms after an app restart if we continue to use AlarmManager, and 2. it may be that we should not continue to use AlarmManager. The general issue of "creating an AlarmManager alarm has the potential to leave it still firing if the app exited unexpectedly" is undeniable since by definition, if an app exits unexpectedly then it did not have the chance to do any kind of cleanup.

byencho commented 1 year ago

Sure, I was just hoping to shed some light on the idea that this issue may actually be bigger than the narrow case I originally posted and could potentially encompass the current behavior of "background" polling as well. So that is worth looking into for its own purposes. In turns of no one else reporting the issue, it's possible they either thought like I did that "background" polling was actually intended to wake the app up or simply didn't notice because its not something most teams need to check.

eli-darkly commented 1 year ago

We've released version 3.2.1 to address this. As described in the release notes, we're not dropping the use of AlarmManager yet in this patch, so it is still possible for an app to get woken up by an obsolete polling alarm after a crash; the current fix is that, in such a case, the SDK will immediately cancel that obsolete alarm. So, in the scenario you described where the SDK is configured offline at startup, it will not be setting a new alarm at that point and Android will not repeatedly wake up the app.

Please let us know if this patch works for you when you've had a chance to test with it.

byencho commented 1 year ago

Thanks for the update @eli-darkly! I will try to test this in the next day or so and let you know.

byencho commented 1 year ago

@eli-darkly I've done some testing and I can confirm it fixes the issue related to the foreground polling period being used in the background. Thanks for your help! I will go ahead and close this issue now.

eli-darkly commented 1 year ago

On the larger issue of whether we should even be using AlarmManager (or WorkManager) at all, I am still leaning toward thinking we should not, and preliminary work so far has seemed to support the idea that a simpler worker thread/executor mechanism would work fine for both foreground and background polling. It wasn't ever our intention for applications to be restarted for a poll after being closed; that's why the SDK explicitly cancels its alarms as long as it knows it is quitting, but of course there is no guarantee of an orderly shutdown. We just need to make sure that there aren't any unforeseen consequences of changing the implementation, so that's why we did only a smaller fix in this release.

byencho commented 1 year ago

Yeah that makes sense. I would definitely worry a little about the current functionality being something some consumers might be relying on (intentionally or otherwise). I know there was a time where we had background polling enabled specifically because we wanted the app to fetch updated flags before the user would typically open the app. That way we would have reasonably fresh flag values right on app startup even before the SDK's initial sync would complete. This was important because many users don't use the app very regularly and we need to check some flag values right on app startup.