mozilla-mobile / fenix

⚠️ Fenix (Firefox for Android) moved to a new repository. It is now developed and maintained as part of: https://github.com/mozilla-mobile/firefox-android
https://github.com/mozilla-mobile/firefox-android
Mozilla Public License 2.0
6.47k stars 1.28k forks source link

Verify that the Leanplum Device Identifier was properly migrated or generated #15392

Closed st3fan closed 3 years ago

st3fan commented 3 years ago

I am debugging a case where Leanplum is:

But what fails is opening deeplinks, specifically the following two:

Note that these work properly for me (@st3fan) but not for @cbonacuse

In the Leanplum console I see that both our test devices are up to date (Firefox 81.1.1) and both have a push token assigned.

Since this works well for me, end-to-end, that gives me confidence that the LeanplumMetricsService and fenix:// intent handling works as expected. However, we did make two changes relatively recently related to the uid={{User ID}} part of those URLs that I am trying to rule out:

It is important to rule these out because if push notification messages appear but the links are rejected that will most likely mean that the check for the device identifier fails.

This issue is to re-read / re-review those two patches and think about edge cases that could cause the caching or the check to fail. Keeping in mind the following scenarios:

cc to @boek and @pocmo

┆Issue is synchronized with this Jira Task

st3fan commented 3 years ago

Looking at the device identifier caching code:

    internal val deviceId by lazy {
        var deviceId = preferences.getString(DEVICE_ID_KEY, null)

        if (deviceId == null) {
            deviceId = deviceIdGenerator.invoke()
            preferences.edit().putString(DEVICE_ID_KEY, deviceId).apply()
        }

        deviceId
    }

I think there is very little that can go wrong here. Before this change, the DEVICE_ID_KEY preference did not exist. So the first thing that happens when a user installs this update is that we generate a new device id that then gets stored in the preferences.

It is looked up in start() where a simple call to Leanplum.setDeviceId(deviceId) is made, which I think is guaranteed to return the properly generated/cached device identifier.

st3fan commented 3 years ago

I saw a few uplift requests for the caching patch, but I confirmed that it is present on the releases/81.1.1 branch.

st3fan commented 3 years ago

Regarding migration from Fennec to Fenix - I don't think it matters what happens there because the deviceId computed property will always generate a new ID the very first time it runs. And that ID will always be passed to Leanplum.setDeviceId() - unless LP does some kind of internal caching.

st3fan commented 3 years ago

I have a log from the failing device now which shows:

DeepLinkIntentProcessor: Invalid deep link: fenix://open?url=https%3A%2F%2Fsupport.mozilla.org%2Fen- \
    US%2Fkb%2Fwhats-new-firefox-android&uid=872c42ec-0000-0000-0000-32b58f36153d

Looking at the code, this only happens if the User ID that Leanplum puts in the URL does not match the deviceID property, which takes the value from the preferences.

Why do those not match?

st3fan commented 3 years ago

@jonalmeida @boek I see that this code changed from:

private val preferences = application.getSharedPreferences(PREFERENCE_NAME, MODE_PRIVATE)

to

private val preferences = StrictMode.allowThreadDiskReads().resetPoliciesAfter {
    application.getSharedPreferences(PREFERENCE_NAME, MODE_PRIVATE)
}

Could that have any effect on how these preferences work?

st3fan commented 3 years ago

Why is one called User ID and the other deviceID - are those the same .. or?

st3fan commented 3 years ago

Reading the Leanplum SDK code - I think the deviceID we generate is not always used by leanplum as the ID that we think it is. I don't really understand this code or how Leanplum works well enough, but it is clear that Device ID and User ID are two different concepts.

We've made assumption that setting the Device ID results in a User ID with the same value. But looking at this code, I can't help to think that it is more complicated than that. There are multiple paths for the logic that determines that. On top of that, in the deeplinks we ask the Leanplum Backend to substitute {{User ID}} - so that is definitely confusing: why do we set the Device ID but then target the User ID.

jonalmeida commented 3 years ago

@jonalmeida @boek I see that this code changed from:

private val preferences = application.getSharedPreferences(PREFERENCE_NAME, MODE_PRIVATE)

to

private val preferences = StrictMode.allowThreadDiskReads().resetPoliciesAfter {
    application.getSharedPreferences(PREFERENCE_NAME, MODE_PRIVATE)
}

Could that have any effect on how these preferences work?

No, this wrapping block only tells StrictMode to not report this as a disk read violation on the main UI thread.

st3fan commented 3 years ago

Reading the Leanplum SDK code - I think the deviceID we generate is not always used by leanplum as the ID that we think it is. I don't really understand this code or how Leanplum works well enough, but it is clear that Device ID and User ID are two different concepts.

We've made assumption that setting the Device ID results in a User ID with the same value. But looking at this code, I can't help to think that it is more complicated than that. There are multiple paths for the logic that determines that. On top of that, in the deeplinks we ask the Leanplum Backend to substitute {{User ID}} - so that is definitely confusing: why do we set the Device ID but then target the User ID.

I just sent myself a deeplink that looks like this:

fenix://settings?uid={{Device ID}}

And that worked. I've asked @cbonacuse to try to same to see if that makes a difference.

st3fan commented 3 years ago

And that worked. I've asked @cbonacuse to try to same to see if that makes a difference.

No change. Same results: notifications appear but links do not work.

pocmo commented 3 years ago

so that is definitely confusing: why do we set the Device ID but then target the User ID.

I thought this was confusing too, when adding the deep link check..., but this was a 1:1 port from Fennec:

pocmo commented 3 years ago

Regardless of the outcome of this issue, we may want to collect telemetry on deeplink rejections?

st3fan commented 3 years ago

One thing we are pretty certain of: this happens on migrated installs. I reproduced this state by installing Fennec 68.11, then upgrading to Fenix 81.1.1 from the Play Store.

st3fan commented 3 years ago

I uplifted #15584 to Fenix 81.1.2 and I rooted my Pixel 2XL to get easier access to the preferences of Firefox.

The following was tested on a device where I first installed Fennec 68.11 and then upgraded/migrated to Fenix 81.1.2.

I registered my device in the Leanplum console as a test device, using the Push Token that is printed during app startup. According to Leanplum my device is 4318eae1-829d-4066-8b1d-0a20372d3823. Let's consider this the source of truth because this is the deviceId that the SDK uses to identify me and what it uses to successfully communicate.

Here we go, at app startup, the following is now printed:

Starting Leanplum with device id: dba5cb2c-96e7-4fcf-87cf-0c1dbc5c1a2d
Started Leanplum with deviceId 4318eae1-829d-4066-8b1d-0a20372d3823 and userId 4318eae1-829d-4066-8b1d-0a20372d3823

And there is the problem. We initialize Leanplum with dba5cb2c-96e7-4fcf-87cf-0c1dbc5c1a2d - we pass that into Leanplum.setDeviceId() but when Leanplum has started, it has ignored the device Id that we asked it to use and used a new Id. (Spoiler: At the end you can read that this id not actually new but instead migrated from Fennec).

Opening a test notification prints:

Rejecting Leanplum deep link because uid 4318eae1-829d-4066-8b1d-0a20372d3823 does not
   match dba5cb2c-96e7-4fcf-87cf-0c1dbc5c1a2d

Now, this code works as expected on new installs. So we must hit some weird corner case where Leanplum is picking up the old config files from Fennec and ignoring the deviceId that we tell it to use.

Now that I can root shell into my device, it is easy to pick up /data/data/org.mozilla.firefox/shared_prefs/org.mozilla.gecko.BrowserApp.xml which has the following:

<?xml version='1.0' encoding='utf-8' standalone='yes' ?>
<map>
    <boolean name="android.not_a_preference.klar.package.installed" value="false" />
    <string name="android.not_a_preference.leanplum.device_id">4318eae1-829d-4066-8b1d-0a20372d3823</string>
    <boolean name="android.not_a_preference.focus.package.installed" value="false" />
</map>

We inherited the deviceId that was previously used in Fennec. And it did not pick up the new identifier that Fenix generated.

So what went wrong here?

I thouight we hit some kind of bug in the Leanplum SDK where it does not properly pick up a custom deviceId .. because:

Their documentation at https://docs.leanplum.com/reference#device-ids clearly says:

You can choose how the device ID will be set the first time start is called on that device by calling one of these before start: Leanplum.setDeviceId("customAndUniqueId") Sets the device ID to a custom ID. Make sure that your custom ID is unique per device.

But then it also says this:

The deviceId is set when Leanplum.start runs for the first time on that device. After this, it cannot be changed unless the user completely uninstalls and reinstalls your app.

And I think that is exactly what we are running into now. You cannot change the deviceId once it has been set, and on migrated builds the Leanplum SDK picks up the deviceId that was previously configured in Fennec, and will not allow us to override that.

A fix should be simple: If org.mozilla.gecko.BrowserApp.xml exists then pick up the android.not_a_preference.leanplum.device_id value and use that.

pocmo commented 3 years ago

And I think that is exactly what we are running into now. You cannot change the deviceId once it has been set, and on migrated builds the Leanplum SDK picks up the deviceId that was previously configured in Fennec, and will not allow us to override that.

A fix should be simple: If org.mozilla.gecko.BrowserApp.xml exists then pick up the android.not_a_preference.leanplum.device_id value and use that.

Ouch. Some time ago we tried to fix an issue in Fenix where we were generating a new ID on every startup. The good news is that Leanplum may have never used the new generated IDs. The bad news may be that our fix now generates and saves one that may not actually be used.

Two questions:

st3fan commented 3 years ago

Why do we generate an ID in the first place? Why do we not use whatever Leanplum generates?

Because we don't like the Leanplum way of generating persistent identifiers. They do two things: their preferred method is to use the device MAC address. If that fails they fall back to Secure.ANDROID_ID. Both are persistent - your MAC doesn't change (unless modern Android randomizes it, but I don't think it does) and Secure.ANDROID_ID is based on app signature + google user id + device properties.

So both are persistent, which makes sense for a tool like Leanplum, except that we prefer to use something that randomizes after each install.

liuche commented 3 years ago

@st3fan is this something that we can QA now?

stale[bot] commented 3 years ago

See: https://github.com/mozilla-mobile/fenix/issues/17373 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.