launchdarkly / android-client-sdk

LaunchDarkly Client-side SDK for Android
Other
45 stars 23 forks source link

Incorrect flag value returned after updating context #276

Closed hoda0013 closed 1 month ago

hoda0013 commented 1 month ago

Describe the bug

TL;DR I start my app and am initializing LDClient with a default, anonymous context. I then update that context and set the context's key to the user's unique ID and then request a feature flag and I get the expected flag value returned. Then I then kill my app, put the device in airplane mode and start the app again, I go through the exact same process where I initialize the LDClient with a default, anonymous context, then update the context with the same user id and then request a feature flag, but this time the flag value is not correct and seems to be returning the values I'd expect if I was still using the the default, anonymous context.

A more in depth explanation of how the app is setup:

Whenever our app starts up, in Application.onCreate() we immediately call LDClient.init(...) with a default, anonymous LDContext. This is because we want to always initialize the LDClient as soon as the app's process is started and we don't yet know if the app has a logged in user (we'd have to make an async db call to determine this).

Whenever we attempt to get a LD flag value, we check if the user is logged in. If they are and we haven't updated the LD context, we call LDClient.identify(newContext) and block until it returns. Then we use that context to get the flag value. This seems to generally work fine, but I'm running into an issue and either I'm misunderstanding how this should work or I've found a bug.

Here's the scenario:

My understanding at this point is that has cached that context and pulled down the correct flag values for that context and persisted them.

Now, we continue the scenario:

Initializing the LDClient in the application class:

    fun getDefaultLDContext(versionName: String): LDContext = LDContext.builder(
        ContextKind.DEFAULT,
        "anonymous"
    )
        .anonymous(true)
        .set(
            "version",
            versionName
        ) // Setting this because if you don't set any custom attributes and then try to update the context with custom attributes you get an NPE
        .build()  

    fun getLDConfig(): LDConfig = LDConfig.Builder()
        .mobileKey(PRODUCTION_MOBILE_KEY)
        .secondaryMobileKeys(
            mapOf(
                DEVELOPMENT_MOBILE_KEY_NAME to DEVELOPMENT_MOBILE_KEY,
                STAGING_MOBILE_KEY_NAME to STAGING_MOBILE_KEY
            )
        )
        .events(
            Components.sendEvents()
                .allAttributesPrivate(true)
        )
        .logLevel(LDLogLevel.DEBUG)
        .disableBackgroundUpdating(false)
        .build()

    override fun onCreate() {
        /**
         *  Initialize LD immediately before any dependency injection occurs and
         *  without blocking the thread, the client connects in the background and periodically
         *  checks for flag value updates
         */
        println("application onCreate()")

        if (BuildConfig.DEBUG) {
            Timber.plant(Timber.DebugTree())
        }
        val defaultLDContext = LaunchDarklyHelper.getDefaultLDContext(BuildConfig.VERSION_NAME)
        LDClient.init(
            this,
            LaunchDarklyHelper.getLDConfig(),
            defaultLDContext,
            0
        )
        println("LDClient init finished")
        super.onCreate()
}

Updating the context before requesting a flag. Using suspendCoroutine so that I can call .get() on the Future and not block the calling thread. Once the context has been updated, then we resume the coroutine.

    private suspend fun build(session: Session): ClientEnvironment {
        return mutex.withLock {
            val appEnv = appEnvironmentService.getAppEnvironment()
            println("FF appEnv: $appEnv")
            val ldClient =
                launchDarklyHelper.getClientForEnvironment(appEnv)
            println("FF ldClient: $ldClient")
            if (this.session != session) {
                println("FF session ${this.session }!= session $session")
                this.session = session
                suspendCoroutine {
                    println("FF updating ld context")
                    launchDarklyHelper.updateLDContext(session, ldClient)
                    println("FF finished updating LD Context")
                    it.resume(ClientEnvironment(ldClient, appEnv))
                }
            } else {
                ClientEnvironment(ldClient, appEnv)
            }
        }
    }

    fun updateLDContext(
        session: Session,
        ldClient: LDClient
    ) {
        val updatedContext =
            LDContext.builderFromContext(getDefaultLDContext(BuildConfig.VERSION_NAME))
                .anonymous(false)
                .key(session.userId)
                .set(
                    "role",
                    session.role.name
                )
                .set(
                    "email",
                    session.emailAddress
                )
                .set(
                    "company_id",
                    session.company.id
                )
                .set(
                    "reseller_id",
                    session.reseller.id
                )
                .build()

        ldClient.identify(updatedContext)
            .get()
    }

To reproduce

Expected behavior If I have accessed flags with my updated context, those flags should be cached and returned, even if the device is in airplane mode. If I do this exact same test and just don't put the device in airplane mode, everything works as expected.

Logs If applicable, add any log output related to your problem.

SDK version 4.3.1

Language version, developer tools Android Studio, Kotlin 1.8.10

OS/platform Android

Additional context Not sure if this matter, but, I'm seeing this happen when running an Android WorkManager job after the app has been killed. When the job triggers, first the application class starts, LDClient has init() called with the default, anonymous context, then the WorkManager job starts and updates the LDContext and requests a flag value.

tanderson-ld commented 1 month ago

Hi @hoda0013, and thank you for reaching out and reporting this issue you're seeing. Also thank you for a really extensive set of details, that is always helpful.

Your understanding that it should be caching the flag data in persistence is correct and that it should be able to serve those cached flags after you switch to the updatedContext.

Do you see either of these logs from the code below when you switch to the updatedContext? Source

    public void switchToContext(@NonNull LDContext context) {
        synchronized (lock) {
            if (context.equals(currentContext)) {
                return;
            }
            currentContext = context;
        }

        EnvironmentData storedData = getStoredData(currentContext);
        if (storedData == null) {
            logger.debug("No stored flag data is available for this context");
            // here we return to not alter current in memory flag state as
            // current flag state is better than empty flag state in most
            // customer use cases.
            return;
        }

        logger.debug("Using stored flag data for this context");
        initDataInternal(context, storedData, false);
    }

Additional information in case you debug a bit. The SDK uses the context's canonical key / fully qualified key as the unique identifier of the context for caching. You can call myContext.getFullyQualifiedKey to see it. This is of the format kind1:key1:kind2:key2 where this is a multicontext with two kinds (kind1 and kind2) where each has a respective key. In your situation you have a single context with kind user (the default kind) and key anonymous (which you specified the literal string anonymous) so the canonical key will be user:anonymous. Then you switch to the key being the session.userId. Perhaps there is an issue in there and the canonical key is not actually changing and so the SDK loads the same flag data again.

hoda0013 commented 1 month ago

@tanderson-ld Thanks for the info!

The only time I see the No stored flag data... log is on the first install of my app, after that I'm always seeing that it has stored flag data.

I started logging out the fully qualified key. What I see is that when I'm using the default, anonymous context myContext.getFullyQualifiedKey returns anonymous

After I update the context (but before I call ldClient.identify()) myContext.getFullyQualifiedKey returns MY_USER_ID, where MY_USER_ID is the id of the user I set as the context key.

I ran through the test scenarios again and set the Launch Darkly log level to Debug.

In the following log here's what actions I took to set the test up:

When the WorkManager job runs I do see the Using stored flag data.. in the logs during Application onCreate():

 I  FF application onCreate()                                                                                                                                                      
 I  Initializing Client and waiting up to 0 for initialization to complete                                                                                                         
 I  Creating LaunchDarkly client. Version: 4.3.1                                                                                                                                   
 D  Using cache at: /data/data/com.REDACTED.debug/cache/com.launchdarkly.http-cache                                                                                  
 I  Creating LaunchDarkly client. Version: 4.3.1                                                                                                                                   
 D  Using cache at: /data/data/com.REDACTED.debug/cache/com.launchdarkly.http-cache                                                                                  
 I  Creating LaunchDarkly client. Version: 4.3.1                                                                                                                                   
 D  Using cache at: /data/data/com.REDACTED.debug/cache/com.launchdarkly.http-cache                                                                                  
 D  Using stored flag data for this context                                                                                                                                        
 D  Stored context index is now: [["0Zy1RD3xqXV0H-Ck_Fv364emxZkMvJp8Dk1Z8uon0Tk=",1722613475917],["Lxg6TmRJOvPzd_dF7aUCNjzT5-9uTSZtREdY3gqF_Mg=",1722613486703]]                   
 D  Using stored flag data for this context                                                                                                                                        
 D  Stored context index is now: [["0Zy1RD3xqXV0H-Ck_Fv364emxZkMvJp8Dk1Z8uon0Tk=",1722613475981],["Lxg6TmRJOvPzd_dF7aUCNjzT5-9uTSZtREdY3gqF_Mg=",1722613486725]]                   
 D  Using stored flag data for this context                                                                                                                                        
 D  Stored context index is now: [["0Zy1RD3xqXV0H-Ck_Fv364emxZkMvJp8Dk1Z8uon0Tk=",1722613475976],["Lxg6TmRJOvPzd_dF7aUCNjzT5-9uTSZtREdY3gqF_Mg=",1722613486743]]                   
 I  FF LDClient init finished                                                                                                                                                      

About 5 seconds later I update the LD Context with the logged in user's information then request the flag value from LD.

When I request the flags I see the following LD logs:

returning variation: {false,1,null} flagKey: batch-close-local-total-report context key: MY_USER_ID

returning variation: {false,1,null} flagKey: batch-close-dont-check-if-empty context key: MY_USER_ID

returning variation: {false,1,null} flagKey: single-threaded-processing context key: MY_USER_ID

However, batch-close-dont-check-if-empty should be true not false.

Is this helpful? Are there other logs you'd like to see that would help diagnose this more easily?

Also, In the LDConfig there's the offline(boolean offline) function. I'm never interacting with that function so it should just be using the default value.

One other thing, when I call ldClient.identify().get() I never see any LD logs. I just see the log I print right before and right after that call. Should I be seeing logs when I call indentify() like logs saying that the stored context is being updated, etc...?

Thanks!

tanderson-ld commented 1 month ago

You should see either of those two logs I mentioned when you call ldClient.identify(updatedContext).get().

Do you expect to be initializing 3 clients? I see Creating LaunchDarkly client. Version: 4.3.1 three times and you use getClientForEnvironment. Perhaps you have 3 environments? Just making sure I understand the current configuration.

Can you verify the context that you are passing in the identify call is valid? updatedContext.isValid() can be used.

I will spend some time in the next day or two to try to reproduce the issue on my side and let you know what I find.

hoda0013 commented 1 month ago

@tanderson-ld

I'm seeing those two logs when I call LDClient.init() but am not seeing them when calling LDCLient.identify(), even when the flags are working as expected.

In our LDConfig.Builder() we are setting a mobileKey() and two secondaryMobileKeys(). These correspond to production, development and staging environments. Is that why we'd be seeing the Creating LaunchDarkly client... message 3 times?

I will run another test to check if the context is valid and report back.

Thanks!

hoda0013 commented 1 month ago

@tanderson-ld I ran the test scenarios again and logged out isValid() for the anonymous and updated contexts and both returned true in all of the test scenarios I described.

hoda0013 commented 1 month ago

@tanderson-ld One more detail I should add. This is occurring on a Pax payment terminal Android device. We don't run this code on regular Android devices, only on Pax payment terminals so I have only observed this on a Pax device (but only because I have only run this code on a Pax device, it very well could be occurring on regular Android devices as well).

tanderson-ld commented 1 month ago

@hoda0013, I apologize for not realizing this, but this issue is already fixed in our latest version and it was specifically fixed in 5.1.1. For some reason I was under the impression you were using the latest SDK version when I read through the issue originally.

Are you able to update to the latest SDK version? Here is the 4.X to 5.X migration guide.. Since you are already using LDContext, I think it will be pretty straightforward for you.

hoda0013 commented 1 month ago

@tanderson-ld Thanks for the feedback. I'll take a look at the migration guide. I'll close this issue now since it was determined this has already been fixed.

Thanks!