mozilla-mobile / perf-frontend-issues

A repository to hold issues related to front-end mobile application performance.

A51 backfill.py results have a sawtooth pattern between installations of the app #236

Open mcomella opened 2 years ago

mcomella commented 2 years ago

I added the A51 to our backfill.py system (https://github.com/mozilla-mobile/perf-frontend-issues/issues/234) and am in the process of validating it (https://github.com/mozilla-mobile/perf-frontend-issues/issues/235). When graphing the results, we unexpectedly see a sawtooth pattern:

(image: start-up duration graph alternating between fast and slow builds)

When running these tests, backfill.py installs a given build, runs the main test then the view test, and then installs the next day's Nightly; a sketch of this loop appears below. Each installation of the app appears to toggle between the fast behavior and the slow behavior: e.g., 6/12 will be fast for both main and view and then 6/13 will be slow for both.
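For context, the per-build loop is roughly the following (a minimal sketch, not the actual backfill.py code; the helper names, APK naming, and measure_start_up.py invocation are assumptions):

import datetime
import subprocess

def install(apk_path: str) -> None:
    # Each day's Nightly fully replaces the previous installation.
    subprocess.run(["adb", "uninstall", "org.mozilla.fenix"], check=False)
    subprocess.run(["adb", "install", apk_path], check=True)

def run_startup_test(test_name: str, output_path: str) -> None:
    # Stand-in for the start-up measurement; the real invocation lives in
    # mozilla-mobile/perf-tools (measure_start_up.py) and may differ.
    subprocess.run(["python3", "measure_start_up.py", "nightly", test_name, output_path],
                   check=True)

def backfill(dates: list[datetime.date]) -> None:
    for date in dates:
        install(f"nightly-{date.isoformat()}.apk")  # hypothetical APK naming
        run_startup_test("cold_main_first_frame", f"{date}_main.txt")
        run_startup_test("cold_view_nav_start", f"{date}_view.txt")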

This isn't caused by bimodal behavior in the replicates of a single run. For example, here's 6/12 (fast):

(image: 6/12 replicates, uniformly fast)

And 6/13 (slow):

(image: 6/13 replicates, uniformly slow)

We should investigate to determine the root cause of this behavior.


To elaborate on the extent of the sawtooth behavior, I ran the first of the month (1/1 to 5/1) in one batch and then the 15th of the month (1/15 to 5/15) in another batch. The 1/1–5/1 batch exhibits a sawtooth and the 1/15–5/15 batch exhibits an independent sawtooth, e.g. for main:

(image: main start-up medians for both batches, each with its own sawtooth)
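To make "exhibits a sawtooth" concrete: over a batch's per-day medians, the pattern is a strict alternation of direction, which a few lines can check (a sketch; the values are illustrative, not the actual results):

def is_sawtooth(medians: list[float]) -> bool:
    # True when consecutive differences keep flipping sign: down, up, down, ...
    diffs = [b - a for a, b in zip(medians, medians[1:])]
    return all(d1 * d2 < 0 for d1, d2 in zip(diffs, diffs[1:]))

print(is_sawtooth([730.0, 550.0, 728.0, 560.0]))  # True: slow, fast, slow, fast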

mcomella commented 2 years ago

After talking to fdotymoz, we're going to wait for the performance test team to implement the regression detection system in mozilla-central https://mozilla-hub.atlassian.net/browse/FXP-2163, see if they experience the same behavior, and then decide how it should be addressed.

mcomella commented 2 years ago

This could be an artifact of my specific device so a good first step might be to see if other folks can reproduce.

mcomella commented 1 year ago

I installed/uninstalled the 2022-11-08 Nightly four times and saw the same pattern on MAIN first frame:

==> sawtooth1.txt <==
{'max': 766.0,
 'mean': 728.0,
 'median': 720.0,
 'min': 701.0,
 'replicate_count': 8,
 'replicates': [765.0, 701.0, 719.0, 704.0, 717.0, 766.0, 721.0, 731.0],
 'stdev': 25.008569959687247}

==> sawtooth2.txt <==
{'max': 623.0,
 'mean': 552.0,
 'median': 540.0,
 'min': 532.0,
 'replicate_count': 8,
 'replicates': [623.0, 542.0, 553.0, 532.0, 559.0, 535.0, 538.0, 534.0],
 'stdev': 30.237157840738178}

==> sawtooth3.txt <==
{'max': 785.0,
 'mean': 729.5,
 'median': 723.0,
 'min': 708.0,
 'replicate_count': 8,
 'replicates': [785.0, 723.0, 727.0, 720.0, 721.0, 723.0, 708.0, 729.0],
 'stdev': 23.287028884890283}

==> sawtooth4.txt <==
{'max': 638.0,
 'mean': 566.625,
 'median': 558.5,
 'min': 547.0,
 'replicate_count': 8,
 'replicates': [638.0, 558.0, 553.0, 564.0, 554.0, 559.0, 560.0, 547.0],
 'stdev': 29.29620892099961}
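Since each sawtoothN.txt above is a pprint'd Python dict, the alternation is easy to confirm by extracting the medians (a small sketch over the files above):

import ast

for name in ("sawtooth1.txt", "sawtooth2.txt", "sawtooth3.txt", "sawtooth4.txt"):
    with open(name) as f:
        stats = ast.literal_eval(f.read())
    print(name, stats["median"])
# -> 720.0, 540.0, 723.0, 558.5: slow, fast, slow, fast on the same build.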

I got two profiles:

In my brief look, the Java code as a whole just executes faster in the fast runs; there isn't a specific part of the code that's slower. We've previously seen this "the Java code just executes faster" behavior on the G5, but usually it doesn't happen on the same build.

I wonder if this is a PGO-type issue: e.g., can we reproduce this on builds from the Play Store (which would have cloud profiles)?
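One low-effort way to probe the PGO hypothesis might be to compare the package's dexopt state across installs, since that's where an applied profile should be visible (a sketch; assuming Nightly's application id org.mozilla.fenix):

import subprocess

def dexopt_state(app_id: str = "org.mozilla.fenix") -> str:
    # "adb shell dumpsys package dexopt" reports each package's compiler
    # filter (e.g. speed-profile vs. verify), which hints whether a profile
    # was used during compilation.
    out = subprocess.run(["adb", "shell", "dumpsys", "package", "dexopt"],
                         capture_output=True, text=True, check=True).stdout
    lines = out.splitlines()
    start = next((i for i, line in enumerate(lines) if app_id in line), None)
    if start is None:
        return f"{app_id} not found in dexopt dump"
    return "\n".join(lines[start:start + 6])  # package header plus a few lines

print(dexopt_state())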

mcomella commented 1 year ago

I have additional debugging info: it seems this issue is specific to Firefox Nightly.

mcomella commented 1 year ago

jonalmeida ran my steps from https://github.com/mozilla-mobile/perf-frontend-issues/issues/236#issuecomment-1307953003 and was unable to reproduce the issue:

==> install7.txt <==
{'max': 785.0,
 'mean': 730.6,
 'median': 727.5,
 'min': 676.0,
 'replicate_count': 10,
 'replicates': [767.0, 759.0, 700.0, 676.0, 705.0, 752.0, 733.0, 707.0, 722.0,
                785.0]}

==> install8.txt <==
{'max': 659.0,
 'mean': 623.0,
 'median': 623.5,
 'min': 595.0,
 'replicate_count': 10,
 'replicates': [658.0, 637.0, 624.0, 659.0, 623.0, 624.0, 595.0, 602.0, 604.0,
                604.0]}

==> install9.txt <==
{'max': 624.0,
 'mean': 606.1,
 'median': 608.5,
 'min': 587.0,
 'replicate_count': 10,
 'replicates': [624.0, 618.0, 587.0, 608.0, 609.0, 602.0, 592.0, 603.0, 609.0,
                ...]}

Summary of the issue so far

I think the next step is to use perfetto on my device to get a more complete system trace and see if we can figure out what might be happening (a capture sketch is below). That being said, if this only affects my device, maybe it's not worth pursuing.
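A capture along those lines might look like this (a sketch; the atrace category list and on-device trace path are illustrative, see perfetto.dev for the authoritative reference):

import subprocess

DEVICE_TRACE = "/data/misc/perfetto-traces/trace.perfetto-trace"

def capture_trace(duration: str = "10s") -> None:
    # Record scheduling, frequency, and graphics events while reproducing a
    # cold start, then pull the trace to open in ui.perfetto.dev.
    subprocess.run(["adb", "shell", "perfetto", "-o", DEVICE_TRACE,
                    "-t", duration, "sched", "freq", "gfx", "view", "am"],
                   check=True)
    subprocess.run(["adb", "pull", DEVICE_TRACE, "trace.perfetto-trace"], check=True)

capture_trace()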

Another thing we can observe from jonalmeida's numbers compared to mine: jonalmeida's medians, in run order, are roughly 728, 624, and 609, while, removing the high outliers, mine are 540 and 559. It seems these devices aren't consistent when compared against each other, which makes it more difficult for devs to collaborate (i.e. there's more nuance involved).

Also, jonalmeida's numbers decrease with each subsequent run – is this a coincidence or true behavior? If it's true behavior, there's probably some caching going on, which makes it difficult to test performance even on jonalmeida's device (like how it's difficult for me to test on mine, but in a different way). In general, I'm a little concerned these A51 devices are too inconsistent to be good for local performance development. It might be helpful to get additional data points to better understand the issue.

mcomella commented 1 year ago

The sawtooth issue seems to affect only my device: jonalmeida's does not exhibit it.

Jesup had an interesting hypothesis on Element:

Perhaps this has to do with flash wear leveling/erasing/etc.; that is the sort of thing that could conceivably lead to sawtooths. Each install writes NNN bytes, which will eventually require erasing a block or blocks. This is a guess, note! But it might link to installs.

mcomella commented 1 year ago

I factory reset my A51 and can still reproduce the issue.

mcomella commented 1 year ago

FWIW, my A51's asset tag number is 51286, just in case someone else at Mozilla gets this device 😱