Identify possible sources of performance cliffs in WARM VIEW

mozilla-mobile / fenix

⚠️ Fenix (Firefox for Android) moved to a new repository. It is now developed and maintained as part of: https://github.com/mozilla-mobile/firefox-android

Mozilla Public License 2.0

Identify possible sources of performance cliffs in WARM VIEW #17555

Closed mcomella closed 3 years ago

mcomella commented 3 years ago

We should do local experiments to try to figure out when WARM VIEW may result in a performance cliff so that we can correlate that with the telemetry data. Here are a few cases we thought of that we should check:

Add-ons
Painfully slow network (stalling on a network request / library callback that depends on this)
Device busy doing other things (other apps)
- Maybe this is an obvious source of slowdown we shouldn't check?
Firefox Accounts
Locale, locale switching
Old profiles, larger profiles
Number of open tabs
Dispatchers overloaded – lower core devices affected more
Whatever else you can think of

We may need to modify the code to make accessing the start up time trivial: e.g. output a log with the time each time start up completes. It may help to come up with a quick script to run warm start ups too so we can get more than a single run.

mcomella commented 3 years ago

Results

desc	mean (ms)	median (ms)	max (ms)
normal	292.33	283	379
add-ons	364.27	356	525
busy (500% CPU)	555.53	494	1273
busy (800% CPU)	621.33	545	1726
2x busy 800%
FxA (blocked on https://github.com/mozilla-mobile/fenix/issues/17575)
lower power?

I had to test this on an emulator, rather than my G5, so it has separate results:

desc	mean (ms)	median (ms)	max (ms)
slow network	189	171	343

Notes:

Times are the time until the first Android frame is drawn, in milliseconds (taken from the "TotalTime" reported by adb shell am start -W, which is 100-200ms more than if we logcatted the onDraw call)
15 replicates
I did not stop the process between replicates
Each replicate generally gets faster, despite the process being alive (the code is probably JITed)
I wonder how combining some of these attributes (e.g. busy + add-ons) may exacerbate the results

mcomella commented 3 years ago

My script for running these tests is:

: > output.txt  # start with an empty results file
for i in $(seq 1 15); do
   # Launch the test page via a VIEW intent and record am's TotalTime (ms).
   adb shell am start -W -t 'text/html' -d 'https://mozilla-mobile.github.io/perf-tools/mozperftest-test-page.html' -a android.intent.action.VIEW org.mozilla.fenix/org.mozilla.fenix.IntentReceiverActivity | grep "TotalTime" | cut -d ' ' -f 2 | tee -a output.txt
   sleep 2
   # Press back to send the app to the background before the next replicate.
   adb shell input keyevent KEYCODE_BACK
   sleep 1
done
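
The mean, median, and max columns in the results tables can be recomputed from output.txt with something like the sketch below. The function name summarize is hypothetical (it is not part of the original tooling), and it assumes one duration per line; blank lines and stray carriage returns from adb are skipped.

```shell
# Print count, mean, median, and max for a file (or stdin) with one duration per line.
# Hypothetical helper: the name and output format are assumptions, not existing tooling.
summarize() {
  cat "$@" | tr -d '\r' | grep -E '^[0-9]+(\.[0-9]+)?$' | sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) { print "no samples"; exit 1 }
      # Input is sorted, so the median is the middle value (or the mean of the two middle values).
      median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
      printf "n=%d mean=%.2f median=%s max=%s\n", NR, sum / NR, median, v[NR]
    }'
}
```

After a run, `summarize output.txt` prints a single line such as `n=15 mean=... median=... max=...`.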

mcomella commented 3 years ago

I accidentally updated to the latest Nightly, so I started a new set of results:

desc	mean (ms)	median (ms)	max (ms)
normal	340.13	322.0	513.0
open tabs (100+ of example.com)	407.93	386.0	607.0
open tabs (alexa top 50-ish)	366.33	347.0	536.0
FxA (signed in maybe 15s before test so perhaps syncing; tiny profile)	344.8	350.0	458.0
busy (2x 800% CPU)	1254.8	470.0	9541.0
busy (2x 800% CPU) again, didn't restart process before running	480.8	466.0	656.0

Notes:

15 replicates
Clear data, load a page, hit back until background to run test
For > 100 tabs, the exact amount is unclear: it displays "∞". I attempted to create 500 but my method drops some; I'd guess it's likely closer to 100-200
For busy, the first 3 replicates were really significant (9541, 1678, 1776) but the remaining ones were less so. Replicates (in run order): [9541.0, 1678.0, 1776.0, 470.0, 857.0, 672.0, 452.0, 536.0, 404.0, 456.0, 558.0, 373.0, 359.0, 356.0, 334.0]
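
To put a number on how much the first three busy replicates dominate the mean, the values listed above can be re-averaged with and without them. This is only a recomputation of the reported numbers, not a new measurement, and the helper name mean_after_skipping is made up for illustration.

```shell
# Busy (2x 800% CPU) replicates in run order, copied from the note above.
replicates="9541 1678 1776 470 857 672 452 536 404 456 558 373 359 356 334"

# Mean of the replicates after skipping the first $1 of them (hypothetical helper).
mean_after_skipping() {
  printf '%s\n' $replicates | tail -n +"$(( $1 + 1 ))" |
    awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
}

echo "all 15 replicates: $(mean_after_skipping 0)"   # prints 1254.8
echo "last 12 replicates: $(mean_after_skipping 3)"  # prints 485.6
```

Dropping the first three replicates brings the mean from 1254.8 down to 485.6, which is close to the 480.8 reported for the run where the process was not restarted.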

Given how much longer the first loads are compared to subsequent ones (in particular open tabs and busy), if we're really looking for performance cliffs, perhaps we should be stopping and starting the process before measuring the next replicate.
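
A rough sketch of what stopping and starting the process before each replicate could look like is below. It is untested against a real device: the function name, the sleep durations, and the use of monkey to relaunch the app through its launcher intent are all assumptions, while the VIEW intent itself is the one from the test script earlier in this thread.

```shell
PKG=org.mozilla.fenix
URL='https://mozilla-mobile.github.io/perf-tools/mozperftest-test-page.html'

# One replicate: restart the process, background the app, then time a warm VIEW.
# Prints the TotalTime (ms) reported by am start -W.
measure_after_restart() {
  adb shell am force-stop "$PKG"            # kill the existing process
  # Relaunch via the launcher intent (assumption: monkey is available on the device).
  adb shell monkey -p "$PKG" -c android.intent.category.LAUNCHER 1 >/dev/null
  sleep 5                                   # assumed time for start up to settle
  adb shell input keyevent KEYCODE_BACK     # send the app to the background
  sleep 1
  adb shell am start -W -t 'text/html' -d "$URL" -a android.intent.action.VIEW \
    "$PKG/org.mozilla.fenix.IntentReceiverActivity" | grep "TotalTime" | cut -d ' ' -f 2
}

# Usage (needs a connected device):
#   for i in $(seq 1 15); do measure_after_restart | tee -a output.txt; done
```

With this shape every replicate is a "first load" for its process, so the numbers should be read separately from the earlier runs where the process stayed alive.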

It's curious that the normal start time increased from last time...

mcomella commented 3 years ago

I took a profile of WARM LINK (the page might be cached) when the device is under the 800% CPU background app load: https://share.firefox.dev/2NAGtc2

However, I noticed the flame graph and stack chart do not line up so I'm not sure how trustworthy it is (the former seems to be calculated from sample count while the latter is from runtime).

mcomella commented 3 years ago

When the device is under load, I'd expect the UI thread (in addition to the gecko thread) to be throttled for heat concerns – as such, I'm not sure that there'd be anything we can do about it. Perhaps we should bucket our start up time telemetry into "device under load" and "device not under load" so we can do separate analyses.

Then again, in practice, how often are Android devices under heavy load when they're starting apps?

mcomella commented 3 years ago

I filtered the Google Play Console slow warm start by device. For the G5, 18.3% are considered "slow warm starts" (> 2s). In my local testing (on an empty device), a first replicate seemed to be ~400-600ms (to be fair, I took a very limited number of samples) so we can probably trust that our telemetry data is pointing to a real problem.

However, Google's definition of WARM also includes having to restart the process if the system saved some state in a bundle. It's unclear how many of these cases that represents.

mcomella commented 3 years ago

I looked into GPlay console and Firebase Perf Monitoring to understand if they can help us identify perf cliffs. I wrote up a brief analysis: https://docs.google.com/document/d/1FWjM5gQgAlgm8d7m28gau7lHi-sKTDBhnAdvnwmpMr0/edit#

mcomella commented 3 years ago

I can't think of anything else super actionable to do here without data analysis to confirm that this is a real problem we're seeing.

mcomella commented 3 years ago

Let's repurpose this for now: with ecsmyth changing projects, this bug will now cover finding the performance cliffs in warm startup in general. We may want to file a follow-up bug for this.

mcomella commented 3 years ago

In our brief analysis, we didn't find any indication that there were perf cliffs or perf issues in WARM VIEW. As such, we've decided to focus on improving start up generally and adding simple telemetry, which might point us in specific directions.

Closing as there's nothing else to do here.