rpappalax opened this issue 4 years ago
I have started the document (still WIP) to document each project. @rpappalax feel free to jump in and start leaving comments, or let me know if I should change the direction of the doc...
Hi Isabel, thanks for putting so much good info together. I think this is more thorough than I imagined, in that it documents our usage of the various repos. That is helpful as onboarding docs as well. What I envisioned around 'standardizing' is that we identify any best practices we're doing in one place and see how we might apply them elsewhere. Perhaps saying "apply best practices across all projects" might have been a better way to word the objective. That said, I think what you've created may actually be the best starting point. We can now look closely at what's been done in each practice and more clearly identify which things we'd like to standardize on, as well as create action items on how to get there. Let me know if you'd like to set aside 45 min and brainstorm together, or take another pass at it on your own. I'm happy to help.
I see your point, yeah. I wanted to document all of that as an intro. Once we know what we do and how we do it in each project, we can think about what we can standardize, what to change in each project, and the options to get there. Let me continue with that tomorrow/Friday and set some time next week for a review/brainstorm, if that sounds good. Thanks!
A couple of things we should probably add once you start to make your action items list (these came out of the meeting this morning):
Thanks Richard, I will add that to the document. Sorry, I thought it was more Android vs iOS, so I was not considering discrepancies between Android projects yet. But I will for sure.
@isabelrios I think perhaps one initial step toward standards might be creating a general doc around UI testing.
@rpappalax Yeah, I will do that. The first part of the document I started already has some info about that; it would be easy to collect that data into a new document.
Some additional notes around Firebase / desired test types, from a Slack chat w/ the Firebase community:
A bunch of thoughts/comments:

- Physical devices as a whole have historically exhibited a higher level of test infrastructure failures compared to virtual devices. (That said, FTL has had specific short-term issues with AVD updates that have momentarily flipped that equation.)
- Android OEMs customize Android in all kinds of weird ways that can cause test variability/instability, whereas AVDs should provide a more consistent environment.
- While still fairly rare, physical devices are also prone to overheating, battery failures, and USB/WiFi quirks that do not affect AVDs.
- Generally, newer physical device models have greater capacity than older models, due to device attrition.
- Just curious, did you start using flaky test reruns with AVDs? Flank had a bug where it would only display status information for the initial test executions in a matrix. If any executions failed, they would be rerun N times, but Flank would not display any status for the reruns, making it appear that the test matrix was "hung" for many minutes.
- Experiencing 45m test latency for a 3m AVD test seems very extreme to me, unless it was during an outage. What AVD model+API was that on? Please do report matrix IDs for such cases so that week's on-call engineer can investigate.
- Physical devices are generally faster than AVDs, which don't have GPUs. So once you get a device allocated, usually the tests will execute faster (but cost more overall due to the higher per-minute cost).
- "Startup time" should not be an issue; both physical devices and AVDs are fully booted before they are allocated.
- What is "too much load" can vary dramatically from day to day, or even minute to minute, for physical devices. I could tell you which physical device has the most spare capacity right now, but that might be completely wrong 15 minutes later.
My reply: It sounds like for running test batches, especially in a burst, we might be better off with AVDs. AVDs may be a bit slower (in theory), but it sounds like the reliability factors of physical devices (i.e. finite resources, aging devices, queueing) can all contribute to slowness at unexpected times. For us, consistency can't be sacrificed, so it sounds like we'd probably be better off using AVDs. We do, however, have some non-blocking (cron-triggered) tests that could run on physical devices, and it sounds like that might be the best use case there. A sketch of what that split could look like in a Flank config follows.
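For reference, a minimal sketch of a Flank config reflecting the above, not our actual setup: the APK paths, device models, API level, shard count, and attempt count are placeholders (current model names can be listed with `gcloud firebase test android models list`).

```yaml
# flank.yml -- hypothetical example, assumed paths and models
gcloud:
  app: ./app/build/outputs/apk/debug/app-debug.apk
  test: ./app/build/outputs/apk/androidTest/debug/app-debug-androidTest.apk
  # Rerun failed tests up to 2 extra times. Note the Flank
  # status-display bug mentioned above, which made reruns
  # look like a hung matrix.
  num-flaky-test-attempts: 2
  # Blocking PR runs: virtual devices, for consistency.
  device:
    - model: Pixel2     # a VIRTUAL model in FTL
      version: 28
      locale: en
      orientation: portrait
  # For the non-blocking cron-triggered runs, a physical device
  # entry could be swapped in instead, e.g.:
  #   - model: walleye  # physical Pixel 2
  #     version: 28

flank:
  max-test-shards: 4
```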
@rpappalax I would like to take a look at the doc I started with you; part of that info is covered there. It's still WIP, but I'm adding info about all those topics. There is a lot of info, so I want to try to organize it the way you'd like. For example: overall tests, or split by platform (iOS, Android), or also by project? Should I add all projects, or only Fenix and Firefox-iOS?
@isabelrios I believe you accomplished this. Do you want to close it out? I do think we should do a follow-up on the action items you've created and create separate issues for those in Q3. What do you think?
Yes @rpappalax, this can be closed. My idea would be for the doc to be a living doc where we add/modify things as they happen.
For the moment, I would open a few issues/tasks for the Action Points that came up from the standardization work in each repo (like adding the test report to R-B), for each platform: iOS / Android. Once I add those, I will close this.