mozilla-mobile / perf-frontend-issues

A repository to hold issues related to front-end mobile application performance.

Validate mach perftest VIEW against FNPRMS #141

Closed mcomella closed 3 years ago

mcomella commented 4 years ago

acreskeyMoz has validated that mach perftest performance is roughly comparable to FNPRMS performance as we currently run it. We should additionally ensure they're comparable from a front-end perspective.

In order to do this, we'll need to:

Then the results should be comparable. If not, investigate why not (with acreskey)!


edit: For a summary of the issues we investigated in this bug, see the mid-way Summary.

mcomella commented 4 years ago

edit: I updated the numbers and struck through an invalid theory. I had made a manual adjustment to a FNPRMS measurement that I had forgotten about and needed to revert.

mach perftest in FNPRMS parser:

FNPRMS:

My current theory is that mach perftest is delayed because we wait for marionette to attach before continuing the test. FNPRMS with set-debug-app similarly attaches marionette but, since it can't connect to the remote server, I suspect it waits longer and times out.

N.B. the current Nightly logs in a different way, so FNPRMS parsing is broken: "3" needs to be changed to "2" on these lines: https://github.com/mozilla-mobile/FNPRMS/blob/3389f021f7e0cc91b9205a9972cebc507e32398f/times.py#L175-L181 Furthermore, there are additional changes to FNPRMS logging (e.g. app IDs) that I've made locally but that I don't think have landed upstream yet.
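As a purely hypothetical illustration of that kind of breakage (FNPRMS' real parser is in the times.py lines linked above): if a Nightly change adds or removes a whitespace-separated field in the logcat line being parsed, a hard-coded token index has to shift by one.

```python
# Hypothetical example only -- not FNPRMS' real log format or parser code.
old_line = "ExampleTag: fenix extra_field 1290"  # value at token index 3
new_line = "ExampleTag: fenix 1290"              # value at token index 2

def parse_ms(line: str, index: int) -> int:
    # A hard-coded token index like this is why a log format change turns
    # a "3" into a "2" in the parser.
    return int(line.split()[index])

assert parse_ms(old_line, 3) == parse_ms(new_line, 2) == 1290
```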


I additionally tried a few things and saw no significant impact:

mcomella commented 4 years ago

edit: Struck through b/c I used the wrong numbers; see above.

There is also this log from perftest:

1598906053729 mozdevice DEBUG execute_host_command: >> "shell:am start -W -n org.mozilla.fenix/org.mozilla.fenix.IntentReceiverActivity -t text/html -a android.intent.action.VIEW -d https://example.com --es args -marionette\ -profile\ /mnt/sdcard/org.mozilla.fenix-geckodriver-profile"

1598906054307 mozdevice DEBUG execute_host_command: << "Starting: Intent { act=android.intent.action.VIEW dat=https://example.com/... typ=text/html cmp=org.mozilla.fenix/.IntentReceiverActivity (has extras) }\nStatus: ok\nLaunchState: COLD\nActivity: org.mozilla.fenix/.HomeActivity\nTotalTime: 493\nWaitTime: 494\nComplete\n"

The discrepancy between FNPRMS and perftest is around 310ms – I wonder if this log's WaitTime of 494ms (units?) is related.
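For reference, a minimal sketch (Python driving adb, omitting the marionette/profile extras from the logged command) that reproduces this launch and extracts the two timings; `am start -W` reports TotalTime/WaitTime in milliseconds:

```python
import re
import subprocess

# Reproduce the VIEW launch from the mozdevice log above and parse the
# ActivityManager timings (reported in ms) out of the `am start -W` output.
cmd = [
    "adb", "shell", "am", "start", "-W",
    "-n", "org.mozilla.fenix/org.mozilla.fenix.IntentReceiverActivity",
    "-t", "text/html",
    "-a", "android.intent.action.VIEW",
    "-d", "https://example.com",
]
out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
print(dict(re.findall(r"(TotalTime|WaitTime):\s*(\d+)", out)))
# e.g. {'TotalTime': '493', 'WaitTime': '494'}
```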

mcomella commented 4 years ago

I did a comparison over 15 runs on my Pixel 2 with Nightly 200901 using mach perftest w/ the added delays and conditioned profiles through FNPRMS' log parser vs. FNPRMS with conditioned profiles:

The difference is within the noise, especially since this test loads a page and FNPRMS restores its own session instead of what's in the profile, so I feel FNPRMS and mach perftest are roughly equivalent for VIEW with the added changes. However, we still have additional investigation to do before replacing FNPRMS:

mcomella commented 4 years ago

I got numbers for the GS5 (limited to 4 runs b/c mach perftest overflows the logcat output):

These would probably be close enough with enough runs (I expect this device to be noisier than the P2, too).

acreskeyMoz commented 4 years ago
  * What delay should we actually add to mach perftest? Balance noise reduction + runtime

To determine the optimal number, I would push a few options to try, i.e. ./mach try fuzzy --full and then select the VIEW tests. We can then compare the results and see the impact.

  * How many iterations does mach perftest run in CI? How many should it run? Balance noise reduction + runtime

Right now it's 14 per test. https://searchfox.org/mozilla-central/rev/84922363f4014eae684aabc4f1d06380066494c5/taskcluster/ci/perftest/android.yml#61 We chose this some time ago as the results seemed to stabilize (sorry, I don't have the data handy). Because the additional iterations don't take much time compared to test setup, I think we should increase this if we can demonstrate that it produces more stable results. Already, the results are sufficiently stable to sheriff.

* More complex modifications to mach perftest

  * Why are conditioned profiles slower in mach perftest than no conditioned profiles?

That is a very good question, and I think it's likely to be something on the platform side. I'd like to take this on as a task.

  * What should we do about performance tuning?

This bug tracks what we saw with our current performance tuning: https://bugzilla.mozilla.org/show_bug.cgi?id=1649511 Greg did a nice job analyzing the noise. We can make changes to the perf tuning specifications and push them to try. But the fact that the current G5 tuning helps pageload might make this problematic to optimize for both cases.

  * The logcat is missing data on my GS5: it should take logcat more regularly if we want it all in the artifacts (currently, perftest results capture correctly but I can't send it to FNPRMS without the full logs)

I wonder if this is related to the --android-clear-logcat perftest option? https://searchfox.org/mozilla-central/rev/84922363f4014eae684aabc4f1d06380066494c5/python/mozperftest/mozperftest/tests/test_android.py#235

* set-debug-app & friends

  * What's the performance impact of `set-debug-app`, assuming we don't add any code that changes behavior (which currently happens)?
  * What's the performance impact of the code that `set-debug-app` inspires? Can we do better? There's a 756ms difference between FNPRMS without the flag & with the flag + conditioned profiles

Yes, I'm very concerned about this one in particular. It might be trickier to investigate without a rooted device. Let me know if I can help.

* MAIN: goes directly to onboarding instead of the homescreen ([mozilla-mobile/fenix#13470](https://github.com/mozilla-mobile/fenix/issues/13470) ?)

Let me run this locally and I'll see what I can find.

mcomella commented 4 years ago

edit: reduced action items in https://github.com/mozilla-mobile/perf-frontend-issues/issues/141#issuecomment-689174357

I regrouped the action items from https://github.com/mozilla-mobile/perf-frontend-issues/issues/141#issuecomment-685204286 into more actionable focus areas and bolded the ones I think we should address in this issue, before replacing FNPRMS VIEW with mach perftest:

Accuracy of results

some can be done later, as long as we can still catch regressions; note: results not comparable to other apps & may see perf characteristics different from real devices

Fix MAIN

want to get VIEW working first...

Noise reduction

can be done later as long as current noise is tolerable; what's the current delta between runs?

Testing artifacts

not that important right now

edit: acreskeyMoz also mentioned:

mcomella commented 4 years ago
  * The logcat is missing data on my GS5: it should take logcat more regularly if we want it all in the artifacts (currently, perftest results capture correctly but I can't send it to FNPRMS without the full logs)

I wonder if this is related to the --android-clear-logcat perftest option?

I'm assuming it's because the logcat logs are pulled at the end and my device has a small maximum logcat buffer – I had the same problem with FNPRMS on this device and had to rewrite the code to pull the logs between runs.
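A sketch of that workaround (plain adb via Python, nothing FNPRMS-specific): dump and clear the device's logcat ring buffer between iterations so earlier runs can't be evicted before the final pull.

```python
import subprocess

def pull_and_clear_logcat(out_path: str) -> None:
    """Append the current logcat buffer to out_path, then clear it."""
    # `adb logcat -d` dumps the buffer and exits; `-c` clears it so the next
    # iteration starts with an empty (small) ring buffer.
    log = subprocess.run(["adb", "logcat", "-d"], capture_output=True,
                         text=True, check=True).stdout
    with open(out_path, "a") as f:
        f.write(log)
    subprocess.run(["adb", "logcat", "-c"], check=True)
```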

acreskeyMoz commented 4 years ago
* [acreskey] MAIN goes to onboarding ([mozilla-mobile/fenix#13470](https://github.com/mozilla-mobile/fenix/issues/13470) ?)

Locally, on Pixel 3 and Moto G5, I'm seeing ./mach perftest skip onboarding and correctly measure MAIN results, e.g. [1437.0, 1381.0, 1345.0]

If I disable the performancetest intent arg, then I see it launch to onboarding. https://searchfox.org/mozilla-central/rev/2b250967a66886398e5e798371484fd018d88a22/testing/performance/hooks_android_main.py#16-17

So I think this looks like it's https://github.com/mozilla-mobile/fenix/issues/13470 Other developers (mattwoodrow, I believe) also had issues with the feature in local testing. We can verify by having someone who can reproduce the problem remove the conditions around the test.
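A hedged way to check this by hand: launch Fenix with the boolean "performancetest" intent extra (the arg the hook above injects; the extra name comes from this discussion) and watch whether onboarding is skipped. Targeting HomeActivity is my assumption here, based on the am-start log earlier in this thread.

```python
import subprocess

# Launch Fenix with the "performancetest" boolean extra; the target activity
# is an assumption -- adjust to match the MAIN test's actual launch target.
subprocess.run([
    "adb", "shell", "am", "start", "-W",
    "-n", "org.mozilla.fenix/.HomeActivity",
    "--ez", "performancetest", "true",
], check=True)
```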

mcomella commented 4 years ago

add small delays to get perftest close enough to FNPRMS

Referencing the P2, I don't think adding a delay is necessary with conditioned profiles, which seem to add a large delay between runs anyway that gives the device enough time to settle. This is fragile if we decide not to use conditioned profiles, however. Here are the numbers:

There's a lot of variance in these results (we are loading live pages). With a reduced delay between tests, I'd expect the results to slow down (as the device is heat throttled), but that's not what we're seeing, so I feel the consistency of the results I've seen locally and that we saw the other day also validates these numbers.


On the GS5, I see something similar – I'd expect the results to get longer without the delay, but they're roughly the same, vaguely within the noise:

An interesting pattern I saw is that each run on this device will increase in time:

PERFHERDER_DATA: {"suites": [{"name": "VIEW", "type": "pageload", "value": 14487782.1, "unit": "ms", "extraOptions": [], "lowerIsBetter": true, "alertThreshold": 2.0, "shouldAlert": false, "subtests": [{"name": "browserScripts.pageinfo.processLaunchToNavStart", "replicates": [14368011, 14393805, 14419708, 14449037, 14476338, 14501471, 14528300, 14554244, 14580482, 14606425], "lowerIsBetter": true, "value": 14487782.1, "unit": "ms", "shouldAlert": false}]}], "framework": {"name": "browsertime"}, "application": {"name": "firefox"}}

Second run:

PERFHERDER_DATA: {"suites": [{"name": "VIEW", "type": "pageload", "value": 15276387.25, "unit": "ms", "extraOptions": [], "lowerIsBetter": true, "alertThreshold": 2.0, "shouldAlert": false, "subtests": [{"name": "browserScripts.pageinfo.processLaunchToNavStart", "replicates": [15237875, 15263327, 15289057, 15315290], "lowerIsBetter": true, "value": 15276387.25, "unit": "ms", "shouldAlert": false}]}], "framework": {"name": "browsertime"}, "application": {"name": "firefox"}}
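A quick diff over the first run's replicates (data from the PERFHERDER_DATA above) makes the pattern visible: each measurement is roughly 26 seconds higher than the previous one.

```python
replicates = [14368011, 14393805, 14419708, 14449037, 14476338,
              14501471, 14528300, 14554244, 14580482, 14606425]
# Successive differences: every iteration climbs by ~25-29s.
print([b - a for a, b in zip(replicates, replicates[1:])])
# [25794, 25903, 29329, 27301, 25133, 26829, 25944, 26238, 25943]
```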

I added this to the action items above.

mcomella commented 3 years ago

perf impact of set-debug-app without code that leverages it?

I built a custom GV that returns false from isApplicationCurrentDebugApp, the method that gates reading the GV configuration YAML which modifies how GV runs (including the code that enables marionette). The FNPRMS VIEW results show that set-debug-app has no impact on application performance:

This implies the performance impact comes entirely from the custom code we run when set-debug-app is enabled for fenix.

edit: There may be a small impact from set-debug-app, but it seems negligible, and it's hard to measure in this test because it's a live page load (i.e. noisy). I re-ran the numbers after running clear-debug-app and got 1.41; I re-added debug-app and got 1.42.
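For reference, the two configurations A/B-tested here can be toggled with standard ActivityManager commands (sketch below; --persistent keeps the flag set until it is explicitly cleared):

```python
import subprocess

PACKAGE = "org.mozilla.fenix"

def enable_debug_app() -> None:
    # Mark fenix as the debug app; --persistent keeps the setting across
    # launches until clear-debug-app is run.
    subprocess.run(["adb", "shell", "am", "set-debug-app", "--persistent",
                    PACKAGE], check=True)

def disable_debug_app() -> None:
    subprocess.run(["adb", "shell", "am", "clear-debug-app"], check=True)
```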

mcomella commented 3 years ago

Our goal is to replace FNPRMS, which currently functions as a regression detection system. It is not currently used for comparing performance against other applications (including Fennec and Chrome), but it was in the past and will be again later. In the current use case, relative values are all that matter; in the latter, inactive use case, absolute values matter.

With these goals in mind, let's look at the remaining action items https://github.com/mozilla-mobile/perf-frontend-issues/issues/141#issuecomment-685849604 critically:

* [ ]  **[acreskey] conditioned profiles slower than no conditioned profiles**

I think we can do this later: conditioned profiles are used to reduce noise and unexpected changes between runs. For catching regressions, I do not think the default of having them enabled will make a significant difference vs. not using them.

* [ ]  **perf impact of our code leveraging set-debug-app? (756ms diff on P2)**

I think we can do this later: I do not suspect this code will add variation that makes catching regressions harder; it just seems like a constant negative offset to the absolute numbers. We may find false regressions if the performance of this code changes, though.

* [ ]  **investigate if we transmit adb data throughout test? if so, there is a perf impact**

Ideally, we'd look into this: this may introduce variation between runs, e.g. depending on the log statements that are called or just because adb can be expensive.

That being said, due to the logging issue I experienced on my GS5, I suspect we're not doing this and we may be okay to put off this investigation.

* [ ]  **Each run in a test run on the GS5 gets longer** (fixed by perf tuning?)

Ideally, we'd look into this: if each run gets longer on the G5 in CI, we're not getting unbiased results between runs. That being said, if every push has the same behavior, it's possible this is negligible.

* [ ]  **Validate current noise is tolerable**

Briefly investigate: I've been told the noise is acceptable to sheriff which might be good enough for us.

So reduced action items:

mcomella commented 3 years ago

Each run in a test run on the GS5 gets longer (fixed by perf tuning?)

I looked at the performance tests for a recent Treeherder revision (G5 treeherder, G5 perf data 1, G5 perf data 2, P2 treeherder, P2 perf data 1, P2 perf data 2).

A look at the specific perf data shows the results are not increasing and are not impacted by this problem.

mcomella commented 3 years ago

Validate current noise is tolerable

Using the results from the comment above, perftest's results seem just as noisy as FNPRMS', though we do fewer runs in mach perftest.

FNPRMS VIEW diff is 198ms in 10 runs (numbers from debug-app on, with no local modifications to the build, though it reproduces without debug-app too), while it's ~100ms max on the P2 between days (graph; I looked at mid-August numbers due to a recent regression).

I don't think it's worth investigating further but perhaps we want to increase the iteration count to match FNPRMS (we're at 14 on perftest & 25 on FNPRMS).


Another interesting tidbit: on the P2, I see 1538ms locally but this test is 1319ms in CI: I believe something must be configured differently. Possible causes:

acreskeyMoz commented 3 years ago

Another interesting tidbit: on the P2, I see 1538ms locally but this test is 1319ms in CI: I believe something must be configured differently. Possible causes:

* perf-tuning is enabled in CI

* conditioned profiles are disabled in CI

perf-tuning is enabled on P2 in CI (but not G5 since it introduced noise) https://searchfox.org/mozilla-central/rev/b2716c233e9b4398fc5923cbe150e7f83c7c6c5b/taskcluster/ci/perftest/android.yml#90

Conditioned profiles are enabled for both devices: https://searchfox.org/mozilla-central/rev/b2716c233e9b4398fc5923cbe150e7f83c7c6c5b/taskcluster/ci/perftest/android.yml#96-98

mcomella commented 3 years ago

I don't think it's worth investigating further but perhaps we want to increase the iteration count to match FNPRMS (we're at 14 on perftest & 25 on FNPRMS).

Spoke to sparky about noise today: we're concerned about increasing iterations because the tests already take 30-40min to run. However, the theory is that if we keep the iteration count low and run per-commit, we can see where regressions are introduced by looking at multiple runs, rather than getting each commit exactly right. We also have the ability to retrigger to get additional data, making accuracy on every run less important.

I think there's nothing to do here to reduce noise until we actually start looking for regressions.

mcomella commented 3 years ago

investigate if we transmit adb data throughout test? if so, there may be a perf impact

Sparky mentioned we run adb logcat at the end; I can't think of other adb commands we'd be running continually (the ones I can think of all just dump info), so I think we can stop investigating this for this MVP effort.

mcomella commented 3 years ago

Large variation in runtime between local and CI

perf-tuning is enabled on P2 in CI (but not G5 since it introduced noise) https://searchfox.org/mozilla-central/rev/b2716c233e9b4398fc5923cbe150e7f83c7c6c5b/taskcluster/ci/perftest/android.yml#90

I enabled perf-tuning locally and got times of 1624ms, compared to 1319ms on CI. Looking into it...

mcomella commented 3 years ago

I got 1521ms from running the latest nightly-simulation build; perhaps I will try to compare my args and builds against those run in CI to make sure I'm running in an identical situation. Then I suppose I can try comparing the logs.

acreskeyMoz commented 3 years ago

I got 1521ms from running the latest nightly-simulation build; perhaps I will try to compare my args and builds against those run in CI to make sure I'm running in an identical situation. Then I suppose I can try comparing the logs.

Is your local device rooted? If it's not, the perf-tuning will be skipped by the test harness.

Another thing I haven't looked at is the variance from one device to another (e.g. one Pixel 2 to another).

In CI, though, it's 14 iterations on one device from a pool, and we're not seeing a huge amount of noise from device to device.

acreskeyMoz commented 3 years ago
* [ ]  **[acreskey] conditioned profiles slower than no conditioned profiles**

With some work I made a geckodriver that automatically enables Fenix startup profiling.

The root cause of this discrepancy looks to be the scanning of the addons database for changes (which we only see with conditioned profiles).

I've logged this one and I'm following up with the addons folks: https://bugzilla.mozilla.org/show_bug.cgi?id=1664025

mcomella commented 3 years ago

Is your local device rooted?

No. I tried disabling perf tuning on CI instead but I get roughly the same result: 1324ms.

I tried to match the mach perftest arguments from CI but I still get a large value: 1553.8ms. The only one I haven't been able to match yet is --browsertime-geckodriver ${MOZ_FETCHES_DIR}/geckodriver because I don't know how, and I intentionally ignored --android-install-apk fenix_nightlysim_multicommit_arm64_v8a because I already have an APK installed on my device.

mcomella commented 3 years ago

I tried downloading the same build as the per-commit run in the try push above but still got similar results.

I tried using the --browsertime-geckodriver arg by getting a recent version of geckodriver linked by sparky but got the same results. Here's my full arg list:

#!/usr/bin/env zsh

# Invokes the mozperftest runner directly; the commented-out `./mach perftest`
# line is the equivalent entry point.
#./mach perftest \
python3 python/mozperftest/mozperftest/runner.py \
    --flavor mobile-browser \
    --android \
    --android-app-name org.mozilla.fenix \
    --perfherder-metrics processLaunchToNavStart \
    --android-activity org.mozilla.fenix.IntentReceiverActivity \
    --android-clear-logcat \
    --android-capture-logcat logcat \
    --android-perf-tuning \
    --hooks testing/performance/hooks_android_view.py \
    --perfherder \
    --perfherder-app fenix \
    --browsertime-iterations 10 \
    --browsertime-geckodriver /Users/mcomella/Downloads/geckodriver \
    --profile-conditioned \
    --profile-conditioned-scenario settled \
    --profile-conditioned-platform p2_aarch64-fenix.nightly \
    --output artifacts \
    testing/performance/perftest_android_view.js

The only leads I have left are:

mcomella commented 3 years ago

In addition to the leads above, sparky suggested:


I took two profiles to try to understand the root cause of why my perftest runs take longer than CI's perftest:

My thinking is that if I can identify where mach perftest is taking a long time compared to normal start-ups, it might give me hints as to why my runtime is longer than CI's.

mcomella commented 3 years ago

What I got from the profiles so far:

Noise-related?:


acreskey noted:

FYI, whimboo just landed a patch that defers the loading of a whole bunch of JSM imports in marionette. https://bugzilla.mozilla.org/show_bug.cgi?id=1660881#c9 So it will be worth checking to see if we see this in the CI VIEW tests.

It could be that some part of the 155ms I mention above has gone away in the next nightlies.

mcomella commented 3 years ago

I wonder if the addons DB scanning is also related here (200-300ms delay on acreskeyMoz's P2): https://github.com/mozilla-mobile/perf-frontend-issues/issues/141#issuecomment-689794541 I'm not sure if my normal start up was run with a conditioned profile or not.

mcomella commented 3 years ago

I wonder if the addons DB scanning is also related here (200-300ms delay on acreskeyMoz's P2)

I see a 390ms diff in checkForChanges between the two profiles (curiously, acreskey mentions a 200-300ms delay). Combined with marionette's 155ms, that's 545ms. I saw a 470ms diff from a normal start-up to the perftest start-up w/ conditioned profiles: it's possible the cause is these two items. The bulk of marionette occurs after navigation start, though, so never mind that last statement.

The problem I'm trying to understand is why my local runs take longer than CI. I wonder if it could be caused by either of these.

FYI, whimboo just landed a patch that defers the loading of a whole bunch of JSM imports in marionette. https://bugzilla.mozilla.org/show_bug.cgi?id=1660881#c9 So it will be worth checking to see if we see this in the CI VIEW tests.

Now that marionette may be fixed, maybe it's worth re-running and seeing if I still get such a large discrepancy. If it's fixed, I can probably say whether or not marionette caused it and whether it's more likely to be the checkForChanges code (which the discrepancy between the duration on my device and on acreskey's points to).

mcomella commented 3 years ago

(curiously, acreskey mentions a 200-300ms delay).

I think this actually came from my numbers of perftest conditioned vs non-conditioned, not acreskey's P2.

mcomella commented 3 years ago

Because it's Monday and I don't know where I am anymore...

Summary so far, targeting current problems:

Our goal is to replace FNPRMS for:

  1. (now) regression detection
  2. (later) to measure absolute performance against a baseline (Fennec, Chrome, etc.)

For replacing a regression detection system, we care about:

We've learned that the noise appears to be the same between FNPRMS and mach perftest for individual iterations (local measurements) https://github.com/mozilla-mobile/perf-frontend-issues/issues/141#issuecomment-689203275, but an aggregated run has more noise due to the reduced iteration count; we're choosing to do nothing about that at present https://github.com/mozilla-mobile/perf-frontend-issues/issues/141#issuecomment-689773683

Current problem: accurate representation of perf changes

We're still trying to ensure perf regressions or improvements are represented accurately. Here are some key measurements (local on P2 w/ 10 iterations on Nightly 200914 06:05, taken today 9/14, mc b21d31971a86, unless otherwise specified):

| Test | FNPRMS logcat measure | perftest measure |
| --- | --- | --- |
| FNPRMS raw | 1.2906s | |
| FNPRMS cond prof hack | 1.969s (14 days ago; today = crash) | |
| mach perftest cond prof | 2.0689s | 1626.9ms |
| mach perftest non-cond prof | 1.6363s | 1315.3ms |
| [CI - fenix commit 05857ba55] mach perftest cond prof | 1.6466s (logcat) | 1322.1ms (from this Treeherder job) |

From these numbers, we know:

And some concerns/questions:

mcomella commented 3 years ago

Broad approaches

So far, we've been trying to understand what mach perftest is doing differently from FNPRMS to theoretically verify it would accurately represent changes to the code. Instead of, or in addition to, this, we could:

Theories of cause of issues

Consider for later

mcomella commented 3 years ago

acreskeyMoz took numbers on the G5:

Running perftest view locally on my G5 (unrooted, not a personal device): ~3340ms (overall score for a run is ~3300ms to ~3400ms)

CI G5 (rooted): ~3150ms (reproduced over multiple runs) https://treeherder.mozilla.org/perf.html#/graphs?series=try,2611385,1,13&selected=2611385,1219705978

This generally lines up with what we're seeing on the P2.

mcomella commented 3 years ago

acreskey verified that network latency applied to the host machine and the device doesn't seem to impact test time:

This I can do right now quite easily: use my MacBook as the hotspot for the device and throttle my Mac via Network Link Conditioner.

acreskey: So far I don't see much if any impact from pretty severe throttling (i.e. painfully slow to navigate the web):

G5 to MacBook: 3321.85

G5 to MacBook @ 3G throttling (100ms delay up + 100ms down + minimal bandwidth ) 3381.857

I'll crank up the latency and see.

Was there some blocking network call in Firefox startup? Now running at 500ms round-trip latency and I'm seeing similar numbers. So that's good for reproducibility of environments, anyway.

Also, we're seeing different run time for different folks locally on the G5:

acreskey G5 (unrooted, not a personal device): ~3340
CI G5 (rooted): ~3150
mleclair G5 (unrooted, not a personal device): 3999, 3618, 3529.5 – "so that's a fresh install of the app, as per the command, but the numbers are shifting..."

mcomella commented 3 years ago

I added some long-term concerns I've dabbled with here to the future of cold startup meeting agenda.


I previously validated noise on the P2. At MarcLeclair's suggestion, I just checked the noise on GS5/G5:

Seems like FNPRMS and perftest have roughly the same amount of noise, given the difference in test endpoint (navStart vs. pageStop) and device. Seems fine to me.

mcomella commented 3 years ago

I made a try run without conditioned profiles: it is 1120ms, which is faster than the conditioned-profile runs we've been seeing (1322ms https://github.com/mozilla-mobile/perf-frontend-issues/issues/141#issuecomment-692246058), and the difference is consistent with what I see locally.

However, that means CI is still faster for an unknown reason. I suppose our best lead is rooted vs. unrooted or personal phone vs. non-personal phone.

mcomella commented 3 years ago

Next steps:

mcomella commented 3 years ago

We also decided to disable condprof given that it wasn't running what users were experiencing:

acreskey: mcomella: I found an inherent problem with conditioned profiles and the multicommit test: the addon check will run on startup if the Services.appinfo.version doesn't match between profile and binary (this changes with gecko versions). The multicommit fenix test uses one conditioned profile for all of the commits, but if the gecko version changes midway through, the Services.appinfo.version will not match the conditioned profile, thus incurring a slower startup. There may be other problems with using the conditioned profiles, but going back to your options from yesterday:

• Run without condprof
• Run with condprof, knowing we're testing a code path that isn't common
• Fix the bug in automated condprof conditioning

So far I don't think the last is solvable without adding more complexity around this.

acreskey: Although I like conditioned profiles in general, because of these issues I'm personally thinking that it might be best to simply not use them in this use case.

mcomella: I agree that it makes sense to disable them for now – they seem useful, but I'd rather have a simpler test whose limitations we understand (and which better matches user experience) than one we don't really understand. Let's add more layers when we're sure they're improving the outcome 🙂 (which is why FNPRMS was so minimal - we had no time to add layers 😁)
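A hedged sketch of the version check at the root of this: a Gecko profile records the app version it last ran with in compatibility.ini, and a mismatch with the running binary re-triggers startup work like the addon check described above. (Field names per standard Gecko profiles; parsing simplified.)

```python
import configparser

def profile_last_version(profile_dir: str) -> str:
    """Return the app version this profile last ran with (assumed layout)."""
    # compatibility.ini's [Compatibility] section stores LastVersion; if it
    # doesn't match Services.appinfo.version, the addon scan reruns.
    ini = configparser.ConfigParser()
    ini.read(f"{profile_dir}/compatibility.ini")
    return ini["Compatibility"]["LastVersion"]
```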

mcomella commented 3 years ago

I took new startup profiles on Nightly 200916 18:07 because I wasn't sure if the old profiles https://github.com/mozilla-mobile/perf-frontend-issues/issues/141#issuecomment-691331138 used condprof or not:

Before taking the profiles, I:


We can use these profiles to:

mcomella commented 3 years ago

We got runs on a non-personal, rooted P2. Conditioned profiles:

PERFHERDER_DATA: {"suites": [{"name": "VIEW", "type": "pageload", "value": 1498.9, "unit": "ms", "extraOptions": [], "lowerIsBetter": true, "alertThreshold": 2.0, "shouldAlert": false, "subtests": [{"name": "browserScripts.pageinfo.processLaunchToNavStart", "replicates": [1495, 1503, 1502, 1627, 1487, 1435, 1474, 1500, 1478, 1488], "lowerIsBetter": true, "value": 1498.9, "unit": "ms", "shouldAlert": false}]}], "framework": {"name": "browsertime"}, "application": {"name": "fenix"}}
PERFHERDER_DATA: {"suites": [{"name": "VIEW", "type": "pageload", "value": 1489.6, "unit": "ms", "extraOptions": [], "lowerIsBetter": true, "alertThreshold": 2.0, "shouldAlert": false, "subtests": [{"name": "browserScripts.pageinfo.processLaunchToNavStart", "replicates": [1470, 1487, 1462, 1495, 1476, 1508, 1504, 1490, 1496, 1508], "lowerIsBetter": true, "value": 1489.6, "unit": "ms", "shouldAlert": false}]}], "framework": {"name": "browsertime"}, "application": {"name": "fenix"}}
PERFHERDER_DATA: {"suites": [{"name": "VIEW", "type": "pageload", "value": 1480.1, "unit": "ms", "extraOptions": [], "lowerIsBetter": true, "alertThreshold": 2.0, "shouldAlert": false, "subtests": [{"name": "browserScripts.pageinfo.processLaunchToNavStart", "replicates": [1488, 1466, 1478, 1502, 1499, 1448, 1488, 1464, 1484, 1484], "lowerIsBetter": true, "value": 1480.1, "unit": "ms", "shouldAlert": false}]}], "framework": {"name": "browsertime"}, "application": {"name": "fenix"}}
PERFHERDER_DATA: {"suites": [{"name": "VIEW", "type": "pageload", "value": 1490.9, "unit": "ms", "extraOptions": [], "lowerIsBetter": true, "alertThreshold": 2.0, "shouldAlert": false, "subtests": [{"name": "browserScripts.pageinfo.processLaunchToNavStart", "replicates": [1515, 1513, 1505, 1483, 1502, 1458, 1403, 1523, 1522, 1485], "lowerIsBetter": true, "value": 1490.9, "unit": "ms", "shouldAlert": false}]}], "framework": {"name": "browsertime"}, "application": {"name": "fenix"}}
PERFHERDER_DATA: {"suites": [{"name": "VIEW", "type": "pageload", "value": 1491.6, "unit": "ms", "extraOptions": [], "lowerIsBetter": true, "alertThreshold": 2.0, "shouldAlert": false, "subtests": [{"name": "browserScripts.pageinfo.processLaunchToNavStart", "replicates": [1488, 1491, 1484, 1495, 1494, 1509, 1467, 1498, 1511, 1479], "lowerIsBetter": true, "value": 1491.6, "unit": "ms", "shouldAlert": false}]}], "framework": {"name": "browsertime"}, "application": {"name": "fenix"}}

Non-cond prof:

PERFHERDER_DATA: {"suites": [{"name": "VIEW", "type": "pageload", "value": 1071.8, "unit": "ms", "extraOptions": [], "lowerIsBetter": true, "alertThreshold": 2.0, "shouldAlert": false, "subtests": [{"name": "browserScripts.pageinfo.processLaunchToNavStart", "replicates": [1038, 1062, 1112, 1060, 1062, 1085, 1060, 1071, 1069, 1099], "lowerIsBetter": true, "value": 1071.8, "unit": "ms", "shouldAlert": false}]}], "framework": {"name": "browsertime"}, "application": {"name": "fenix"}}
PERFHERDER_DATA: {"suites": [{"name": "VIEW", "type": "pageload", "value": 1094.4, "unit": "ms", "extraOptions": [], "lowerIsBetter": true, "alertThreshold": 2.0, "shouldAlert": false, "subtests": [{"name": "browserScripts.pageinfo.processLaunchToNavStart", "replicates": [1082, 1106, 1083, 1113, 1110, 1107, 1089, 1098, 1085, 1071], "lowerIsBetter": true, "value": 1094.4, "unit": "ms", "shouldAlert": false}]}], "framework": {"name": "browsertime"}, "application": {"name": "fenix"}}
PERFHERDER_DATA: {"suites": [{"name": "VIEW", "type": "pageload", "value": 1075, "unit": "ms", "extraOptions": [], "lowerIsBetter": true, "alertThreshold": 2.0, "shouldAlert": false, "subtests": [{"name": "browserScripts.pageinfo.processLaunchToNavStart", "replicates": [1069, 1074, 1082, 1069, 1085, 1067, 1069, 1070, 1107, 1058], "lowerIsBetter": true, "value": 1075, "unit": "ms", "shouldAlert": false}]}], "framework": {"name": "browsertime"}, "application": {"name": "fenix"}}
PERFHERDER_DATA: {"suites": [{"name": "VIEW", "type": "pageload", "value": 1100.4, "unit": "ms", "extraOptions": [], "lowerIsBetter": true, "alertThreshold": 2.0, "shouldAlert": false, "subtests": [{"name": "browserScripts.pageinfo.processLaunchToNavStart", "replicates": [1123, 1086, 1106, 1103, 1183, 1074, 1082, 1093, 1080, 1074], "lowerIsBetter": true, "value": 1100.4, "unit": "ms", "shouldAlert": false}]}], "framework": {"name": "browsertime"}, "application": {"name": "fenix"}}
PERFHERDER_DATA: {"suites": [{"name": "VIEW", "type": "pageload", "value": 1087.7, "unit": "ms", "extraOptions": [], "lowerIsBetter": true, "alertThreshold": 2.0, "shouldAlert": false, "subtests": [{"name": "browserScripts.pageinfo.processLaunchToNavStart", "replicates": [1080, 1101, 1108, 1125, 1072, 1074, 1084, 1100, 1059, 1074], "lowerIsBetter": true, "value": 1087.7, "unit": "ms", "shouldAlert": false}]}], "framework": {"name": "browsertime"}, "application": {"name": "fenix"}}
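Averaging the five suite values in each configuration above shows the gap plainly: roughly 1490ms with conditioned profiles vs. 1086ms without, a ~400ms difference consistent with the condprof overhead discussed earlier.

```python
cond = [1498.9, 1489.6, 1480.1, 1490.9, 1491.6]
non_cond = [1071.8, 1094.4, 1075.0, 1100.4, 1087.7]
print(sum(cond) / len(cond))          # ~1490.2 ms (conditioned profiles)
print(sum(non_cond) / len(non_cond))  # ~1085.9 ms (non-conditioned)
```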

This basically matches CI (non-cond prof run). That means there is a discrepancy in my local setup.


Resummarize

To follow-up on the last summary https://github.com/mozilla-mobile/perf-frontend-issues/issues/141#issuecomment-692246058, the problem we're trying to solve is ensuring mach perftest will accurately represent performance changes. We've seen a few red flags:

  1. Conditioned profiles are slower than non-conditioned profiles and seem to run code outside of the user start path: we disabled conditioned profiles for now
  2. My local runs are slower than CI. However, someone else's runs are the same: what is different about my set-up?

Action items

mcomella commented 3 years ago

acreskeyMoz agreed that we may want to just start using the system. We will:

mcomella commented 3 years ago

File follow-ups to document more differences and find root cause of rooted/unrooted discrepancy

mcomella commented 3 years ago

Figure out how to start using perftest for regression detection (new issue?)

https://github.com/mozilla-mobile/perf-frontend-issues/issues/162

Sounds like this investigation is done! 🎉