mozilla-mobile / perf-frontend-issues

A repository to hold issues related to front-end mobile application performance.

Research how other apps measure startup performance #97

Closed mcomella closed 4 years ago

mcomella commented 4 years ago

As a follow-up to our future-of-cold-startup discussions...

Instead of reinventing the wheel, let's find out how other apps measure cold startup (and, briefly, how they measure non-startup performance).

We should double-check the licensing, but Marc mentioned that Chrome uses something called catapult: https://chromium.googlesource.com/catapult/

mcomella commented 4 years ago

I completed my research. In general, it was not easy to find 1) which metrics apps record to measure startup and 2) which harnesses they use to run the application; because that information wasn't straightforward to find, I'm assuming the useful information that is publicly available is limited. Furthermore, I realized that our requirements are more intensive than most other developers' because we not only want to catch regressions, we also want to measure against other benchmarks (e.g. Fennec).

Here are a few solutions I found and who uses them:

I think our best bet is moving forward with browsertime. However, I also found great resources that demonstrate how to set up these tests (e.g. removing noise and outliers), especially resources targeted at Android. Those resources are:

In particular, we may want to look at:

Facebook managed to get their perf tests to minimal noise at 50 trials, which is better than we're doing (though we are measuring startup, which is a very complicated use case). Their tests run hourly and they have automatic bisection on regressions.
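To make the noise-reduction idea concrete, here is a minimal sketch of aggregating repeated trials with simple outlier rejection before comparing against a baseline. The function names, the Tukey-fence outlier rule, and the 5% regression threshold are all illustrative assumptions, not something taken from Facebook's or Google's tooling:

```python
# Hypothetical sketch: aggregate repeated startup trials and reject outliers
# before comparing against a baseline. Thresholds are illustrative.
import statistics

def reject_outliers(samples_ms):
    """Drop trials outside 1.5 * IQR of the middle 50% (Tukey's fences)."""
    ordered = sorted(samples_ms)
    q1 = ordered[len(ordered) // 4]
    q3 = ordered[(3 * len(ordered)) // 4]
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [s for s in samples_ms if lo <= s <= hi]

def summarize(samples_ms):
    kept = reject_outliers(samples_ms)
    return {
        "median_ms": statistics.median(kept),
        "stdev_ms": statistics.stdev(kept) if len(kept) > 1 else 0.0,
        "trials_kept": len(kept),
        "trials_total": len(samples_ms),
    }

def is_regression(current, baseline, threshold=0.05):
    # Flag a regression if the median moved more than 5% vs. the baseline;
    # an hourly job could run this and kick off bisection on a hit.
    return current["median_ms"] > baseline["median_ms"] * (1 + threshold)
```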

mcomella commented 4 years ago

I additionally took a look at the Chromium source to understand how they measure performance. I found some interesting docs:

I also found the source code of their mobile startup benchmark and the back-end implementation for Android. I didn't learn much (there are many layers of abstraction), but it appears that, like Facebook, they also wait out device throttling before measuring (source). I could dig in further to see what else they do, but I don't think it's worth the time: I think the high-level MobileLab post from Facebook and the Google IO Jetpack Benchmark talk from Google probably cover what we would learn (at least to the 80-20 rule).
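As a rough illustration of what "waiting out throttling" between trials could look like in a harness, here is a sketch that polls the device's battery temperature over adb and blocks until it drops below a ceiling. The 35 °C threshold and the poll interval are assumptions; this is not how Chromium or Facebook actually implement it:

```python
# Hypothetical sketch: wait for the device to cool down before a trial so
# thermal throttling doesn't skew results. `dumpsys battery` reports the
# temperature in tenths of a degree Celsius.
import re
import subprocess
import time

def battery_temp_c():
    out = subprocess.run(
        ["adb", "shell", "dumpsys", "battery"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(re.search(r"temperature:\s*(\d+)", out).group(1)) / 10.0

def wait_until_cool(max_temp_c=35.0, poll_seconds=30):
    while battery_temp_c() > max_temp_c:
        time.sleep(poll_seconds)

wait_until_cool()  # call between trials before starting the next measurement
```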

Conclusions

We should continue with our current approach to startup measurement (a browsertime test harness with some extraction from FNPRMS), but we should leverage the lessons from Facebook's MobileLab blog post and the Jetpack Benchmark IO talk to identify steps we should take to reduce noise in our pipeline.
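For reference, the core of a single cold-start trial in such a harness can be very small. A minimal sketch using `adb shell am start -W`; the Fenix package and activity names are assumptions for illustration, and this is not browsertime's or FNPRMS's actual implementation:

```python
# Hypothetical sketch of one cold-start trial: force-stop the app so the
# next launch is cold, launch it with `am start -W`, and parse TotalTime.
import re
import subprocess

PACKAGE = "org.mozilla.fenix"          # assumed target package
ACTIVITY = f"{PACKAGE}/.HomeActivity"  # assumed launch activity

def cold_start_ms():
    subprocess.run(["adb", "shell", "am", "force-stop", PACKAGE], check=True)
    out = subprocess.run(
        ["adb", "shell", "am", "start", "-W", "-n", ACTIVITY],
        capture_output=True, text=True, check=True,
    ).stdout
    # `am start -W` prints e.g. "TotalTime: 487" (milliseconds).
    return int(re.search(r"TotalTime:\s*(\d+)", out).group(1))

samples = [cold_start_ms() for _ in range(50)]  # 50 trials, per Facebook's bar
```

A real harness would wrap trials like this in the cooldown wait and the outlier-rejecting aggregation sketched in the earlier comments.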