mozilla-mobile / perf-frontend-issues

A repository to hold issues related to front-end mobile application performance.

FNPRMS is missing results from 4/21-4/23, 4/27 #102

Closed mcomella closed 4 years ago

mcomella commented 4 years ago

See https://github.com/mozilla-mobile/fenix-nightly-perftest-results/blob/master/hanoob-results.csv
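For anyone wanting to spot gaps like these programmatically, here is a minimal sketch. It assumes the results have already been parsed into a date-to-value mapping where a failed run is either absent or `None`; the actual CSV layout isn't shown in this issue, and the function name and sample values are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def missing_dates(
    results: Dict[date, Optional[float]], start: date, end: date
) -> List[date]:
    """Return every day in [start, end] that has no row or an empty (None) value."""
    total_days = (end - start).days
    return [
        start + timedelta(days=offset)
        for offset in range(total_days + 1)
        if results.get(start + timedelta(days=offset)) is None
    ]


# Hypothetical values for illustration only, not real FNPRMS measurements.
sample = {
    date(2020, 4, 20): 1.0,
    date(2020, 4, 21): None,  # row present but no result
    date(2020, 4, 22): None,
    date(2020, 4, 24): 1.0,   # 04.23 has no row at all
}

for day in missing_dates(sample, date(2020, 4, 20), date(2020, 4, 24)):
    print(day.isoformat())
```

Treating "no row" and "empty value" the same way keeps the check independent of how a failed run is recorded.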

mcomella commented 4 years ago

The HANOOB/AL logs for 04.21, 04.22, and 04.23 are empty (I'm surprised there's an 04.23 log at all, because that nightly hasn't run yet). For example: https://github.com/mozilla-mobile/fenix-nightly-perftest-results/blob/master/2020.04.21-hanoob.log

I notice in the general logs that it doesn't announce that it's downloading builds either. It immediately goes to (example):

Using fennec-nightly as a variant! /opt/fnprms/run_logs/2020.04.17-ha.log
Using fennec-nightly as a variant! /opt/fnprms/run_logs/2020.04.08-ha.log
...

Whereas previous logs have:

Downloading apk.
Done downloading apk.
Running tests
Performing Streamed Install
Success
Starting: Intent { act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] cmp=org.mozilla.fennec_aurora/.App }
Starting by using am start-activity org.mozilla.fennec_aurora/.App
...

My guess is that test.sh is no longer running, only do_times.sh.
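A rough way to flag this automatically would be to check each general log for the lines a healthy run prints. This is only a sketch: the marker strings are copied from the log excerpts above, while the function name and the idea of running it over each log are assumptions, not part of FNPRMS.

```python
from typing import List

# Lines a healthy run printed, taken from the "previous logs" excerpt above.
EXPECTED_MARKERS = (
    "Downloading apk.",
    "Done downloading apk.",
    "Running tests",
)


def missing_markers(log_text: str) -> List[str]:
    """Return the expected marker lines that do not appear in the log text."""
    return [marker for marker in EXPECTED_MARKERS if marker not in log_text]


healthy = "Downloading apk.\nDone downloading apk.\nRunning tests\nSuccess\n"
suspect = "Using fennec-nightly as a variant! /opt/fnprms/run_logs/2020.04.17-ha.log\n"

print(missing_markers(healthy))
print(missing_markers(suspect))
```

An empty list means the run at least got as far as downloading and testing; a non-empty list points at the step that was skipped.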

mcomella commented 4 years ago

We have results as of the 24th again and there's a 40ms+ regression. I've asked MarcLeclair to re-run the 21st (the first NA result) which might tell us if the builds were busted or if it was something else.

mcomella commented 4 years ago

MarcLeclair re-ran the 21st and it came in correctly (with a regression on the 21st from the 20th). My theory is that there were local changes to FNPRMS that caused this (MarcLeclair was experimenting with updating the scripts to automatically git push so that could have been related).

That being said, it's working now, and we've determined when the HANOOB regression occurred (between the 20th and the 21st), so I don't think it's necessary to re-run the results for the 22nd and 23rd.

I'll do a quick dive to see if I can find the cause of the regression. If not, I'll file another issue to do that investigation. Otherwise, there is nothing else to do for this issue.

mcomella commented 4 years ago

Using the Fenix builds as posted on Taskcluster, I think we're looking at git log 94f19b7feea...ce0bad5ffb7 (20th earliest - 21st latest) for the regression.
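For reference, listing that range from a script could look like the sketch below. It assumes a local Fenix checkout; the helper names are hypothetical. Note that in `git log`, `A...B` (three dots) lists commits reachable from either side but not both, while `A..B` (two dots) lists commits reachable from B but not from A. The two give the same output when the older commit is an ancestor of the newer one.

```python
import subprocess
from typing import List


def git_log_cmd(older: str, newer: str, symmetric: bool = True) -> List[str]:
    """Build a `git log --oneline` command for the given commit range."""
    separator = "..." if symmetric else ".."
    return ["git", "log", "--oneline", f"{older}{separator}{newer}"]


def list_commits(repo_dir: str, older: str, newer: str, symmetric: bool = True) -> List[str]:
    """Run the command inside repo_dir and return one line per commit."""
    completed = subprocess.run(
        git_log_cmd(older, newer, symmetric),
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout.splitlines()


# The range quoted in the comment above.
print(" ".join(git_log_cmd("94f19b7feea", "ce0bad5ffb7")))
```

Calling `list_commits("path/to/fenix", "94f19b7feea", "ce0bad5ffb7")` would then return the candidate commits, with the path being a placeholder for wherever the repository is checked out.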

mcomella commented 4 years ago

Looking at the commits inside fenix, I didn't see anything that looks like it'd affect start-up performance.

mcomella commented 4 years ago

I still need to investigate the android-components commits at that time. However, it's possible there is something else amiss in the testing environment. For example, maybe the G5 that's running the tests has more background operations on it now than it did before such that if we re-ran the test for the 20th, we'd see the same regression.

bdekoz commented 4 years ago

FWIW, not seeing this blip in other device data, see: https://github.com/bdekoz/FNPRMS-results/blob/master/pixel_4_xl-fennec-nightly-hanoob-results.csv

mcomella commented 4 years ago

> FWIW, not seeing this blip in other device data, see: https://github.com/bdekoz/FNPRMS-results/blob/master/pixel_4_xl-fennec-nightly-hanoob-results.csv

Thanks for letting us know!


We discovered a potential root cause: when SSH'ing into the device today, MarcLeclair found the G5 had no network connectivity. Some solutions are:

We'll discuss more in sync today.


Now that this bug is growing in scope, I filed a new bug for investigating the regression and left it in triage (https://github.com/mozilla-mobile/fenix/issues/10304). I'll keep this issue for addressing the problems with FNPRMS itself.

mcomella commented 4 years ago

Stand-up: we decided to write a script to make re-running easy. We asked jeanygong to check if bdekoz has cycles to work on something like this. As such, assigning to jeanygong and unassigning from myself.

mcomella commented 4 years ago

I decided it'd be clearer to file a new issue for the follow-up solution of writing a script to re-run FNPRMS instead: https://github.com/mozilla-mobile/perf-frontend-issues/issues/110. This issue is specifically about identifying the missing results, which we've already re-run. As such, closing.