Closed mcomella closed 4 years ago
Looking at the HANOOB/AL logs for 04.21, 04.22, and 04.23, they're empty. (I'm surprised there's an 04.23 log already, because the nightly for that day hasn't run yet.) For example: https://github.com/mozilla-mobile/fenix-nightly-perftest-results/blob/master/2020.04.21-hanoob.log
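A rough sketch of how empty result logs like these could be spotted automatically rather than by eye. The function name, the `*.log` pattern, and treating whitespace-only files as empty are assumptions for illustration, not part of FNPRMS:

```python
from pathlib import Path


def find_empty_logs(log_dir, pattern="*.log"):
    """Return the sorted names of log files that are empty or whitespace-only.

    Hypothetical helper: the directory layout and file pattern are assumptions.
    """
    empty = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if not path.is_file():
            continue
        # A zero-byte file is empty; otherwise check for whitespace-only content.
        if path.stat().st_size == 0 or not path.read_text(errors="replace").strip():
            empty.append(path.name)
    return empty
```

Pointing it at a checkout of the results repo would list the days that need a re-run.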
I notice in the general logs that it doesn't announce that it's downloading builds either. It immediately goes to (example):
Using fennec-nightly as a variant! /opt/fnprms/run_logs/2020.04.17-ha.log
Using fennec-nightly as a variant! /opt/fnprms/run_logs/2020.04.08-ha.log
...
Whereas previous logs have:
Downloading apk.
Done downloading apk.
Running tests
Performing Streamed Install
Success
Starting: Intent { act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] cmp=org.mozilla.fennec_aurora/.App }
Starting by using am start-activity org.mozilla.fennec_aurora/.App
...
I'm guessing test.sh is no longer running or something, just do_times.sh.
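To make that difference between the two kinds of run log checkable, here is a minimal sketch that looks for the stage lines quoted from the healthy log above. The marker list is taken only from that excerpt; whether every healthy FNPRMS run prints exactly these lines is an assumption, and the function name is hypothetical:

```python
# Stage lines seen in the healthy log excerpt; assumed to appear in every good run.
EXPECTED_MARKERS = ("Downloading apk.", "Done downloading apk.", "Running tests")


def missing_markers(log_text, markers=EXPECTED_MARKERS):
    """Return the expected stage lines that do not appear in the log text."""
    lines = {line.strip() for line in log_text.splitlines()}
    return [marker for marker in markers if marker not in lines]
```

A log that jumps straight to the "Using fennec-nightly as a variant!" lines would report all three markers as missing, which would suggest the download and test stage never ran.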
We have results as of the 24th again and there's a 40ms+ regression. I've asked MarcLeclair to re-run the 21st (the first NA result) which might tell us if the builds were busted or if it was something else.
MarcLeclair re-ran the 21st and it came in correctly (with a regression on the 21st from the 20th). My theory is that there were local changes to FNPRMS that caused this (MarcLeclair was experimenting with updating the scripts to automatically git push so that could have been related).
That being said, it's working now and we've determined when the HANOOB regression occurred (20th to 21st), so I don't think it's necessary to re-run the results for the 22nd and 23rd.
I'll do a quick dive to see if I can find the cause of the regression. If not, I'll file another issue to do that investigation. Otherwise, there is nothing else to do for this issue.
Using the Fenix builds as posted on Taskcluster, I think we're looking at git log 94f19b7feea...ce0bad5ffb7 (20th earliest - 21st latest) for the regression.
Looking at the commits inside fenix, I didn't see anything that looks like it'd affect start-up performance.
I still need to investigate the android-components commits at that time. However, it's possible there is something else amiss in the testing environment. For example, maybe the G5 that's running the tests has more background operations on it now than it did before such that if we re-ran the test for the 20th, we'd see the same regression.
FWIW, not seeing this blip in other device data, see: https://github.com/bdekoz/FNPRMS-results/blob/master/pixel_4_xl-fennec-nightly-hanoob-results.csv
Thanks for letting us know!
We discovered a potential root cause: when SSH'ing into the device today, MarcLeclair found the G5 had no network connectivity. Some solutions are:
We'll discuss more in sync today.
Now that this bug is growing in scope, I filed a new bug for investigating the regression and left it in triage – https://github.com/mozilla-mobile/fenix/issues/10304 – and will keep this issue for addressing the problems with FNPRMS.
Stand-up: we decided to write a script to make re-running easy. We asked jeanygong to check if bdekoz has cycles to work on something like this. As such, assigning to jeanygong and unassigning from myself.
I decided it'd be clearer to file a new issue for the follow-up solution of writing a script to re-run FNPRMS instead: https://github.com/mozilla-mobile/perf-frontend-issues/issues/110. This issue is targeted specifically at identifying the missing results, which we've already re-run. As such, closing.
See https://github.com/mozilla-mobile/fenix-nightly-perftest-results/blob/master/hanoob-results.csv