web-platform-tests / interop-mobile-testing

Mobile Testing Investigation in Interop 2023

Agenda 2023-03-28 #5

zcorpan commented 1 year ago

Meeting notes

Present: Jonathan Bedard, James Graham, Weizhong Xia, James Scott, Daniel Smith, Simon Pieters, Panos Astithas

Agenda: (link)

Current setup for each vendor

jgraham: Want to have a requirements document for where we want to end up…

Jonathan: Our mobile testing setup is reliant on simulators. We don't do any kind of on-device testing; that is extremely challenging and expensive, and in some cases not possible. The simulators run on a Mac. Not familiar with how testing works for Chromium or Gecko, but from discussion with Sam it seems similar to content_shell in Chromium. The difference for us is we hook up a socket connection, not a file connection. Functionally the same. Command-line app.

jgraham: From the point of view of requirements: macOS + simulator.

Jonathan: Yes. We have Macs running in a room somewhere. The challenge with simulators is that Macs have become faster, but not as fast as iOS devices (?). We were able to boot 12 simulators; now we're down to 6. Diminishing returns as time goes on.

jgraham: Do we have to run the latest macOS for this?

Jonathan: Depends. I'm not too familiar with the specific rules. More or less, you must be on a sufficiently new macOS to run brand-new iOS simulators. What "sufficiently new" means changes over time. One major version behind usually works.

jgraham: That's good, it means we don't need to worry about things immediately falling over with each macOS release.

Jonathan: We have a good idea of what runs on what and why. We can tell you what to avoid.

jgraham: Google?

Weizhong: I wasn't able to find out why the document isn't accessible yet. (Showing a document, reading from it.) Testing Chrome Android and WebView; still working on Chrome iOS. wpt result uploading runs every 3 hours, to wpt.fyi.

jgraham: Is this different from your internal CI?

Weizhong: Yes. This is for wpt.fyi. We don't check against test expectations.

jgraham: Do you have an example of an upload? I don't see it in the wpt.fyi runs.

Weizhong: We're not actively monitoring it; it might be broken. It was working before. Will attach a link on GitHub.

Panos: To access Chrome Android results you need to uncheck "aligned" and "only master branch". Direct link to ChromeAndroid runs in wpt.fyi with only ChromeAndroid checked (see the API sketch below).

jgraham: So it's running master, not with local Chrome patches.

Weizhong: Right. The build should not depend on anything else. Downloading wpt from upstream.

jgraham: So you have your own CI system that's hooked up and provides results, but the CI setup is not accessible.

Weizhong: Yes.

jgraham: So this may be a bug in how you're uploading results; they should be tagged as master.

jgraham: OK, for Mozilla. Unlike Google, we don't have a specific run for generating wpt.fyi results. We're only running against our own vendored copy of wpt (with local patches that haven't been upstreamed yet), so it doesn't directly correspond to an upstream sha, and we can't upload results for this reason. Runs on all branches. Similar to Google's setup: for Android, tests run on Linux on the Android emulator. Something something unreliable. Unlike our other tests, these run inside Docker. They use a virtual machine and can't run on a normal cheap cloud instance. Machine type is Linux KVM VCP (?). It allows nested virtualization, which is needed for the emulator. That's relevant to why we can't just port our setup and run it for upstream. Similarly to Apple's situation, for mobile we're not using the full Firefox. We have the "GeckoView test app", a wrapper around the GeckoView component. It can navigate and so on, and has WebDriver; pretty much everything you'd expect from a browser except the browser UI. You can't just download a random Firefox build and test that.

Weizhong: For WebView, we are using a content shell. For Chrome Android, we're using the real app.

jgraham: Don't know if we can in principle run the full Firefox Android in the test setup. Maybe UI elements interfere with the testing, with no way to disable them. But we're pretty close to being able to run the full browser. If people are worried about the test app being different, we can try to get the full app to work. Don't know the details of ??? But GeckoViewTestApp should be good enough.

Weizhong: What browser are you using for mobile testing? What browser is in scope?

jgraham: My view is that it's an open question. Might be best effort. It seems harder than on desktop to test the "actual" browser that ships to end users. Sam said for Safari we can't use release Safari; if we want experimental results it has to be a content_shell equivalent.

Jonathan: Yeah, use STP… The issue is STP doesn't exist for iOS. For the simulator, under the current rules that are in place, it's possible to open Safari with ToT frameworks, but it isn't reliable, and basically there's no desire to make it reliable. There is no way to test tip-of-tree Safari with tip-of-tree WebKit in the simulator. The issue is STP doesn't exist for mobile Safari.

Weizhong: Asking each vendor: do we want WebView results for mobile?

Panos: Agree. Is it clear to say that assuming you put up the effort to make STP available, ???

Jonathan: Yeah, it would be a weird Frankenstein build. Not clear what it would get over a WebKit + content shell build. Any time there are wpt failures, they are in WebKit, not Safari.

Panos: Same question for Mozilla. I remember there were some differences.

jgraham: I don't know that there are no cases where it'd be different… Testing is with the internal app; we're confident it's close enough to what is shipped to users. No plans to change this. I suggest we scope it to whatever is the closest equivalent to the mobile browser for each vendor. Other configurations, like other browsers running on iOS, can be out of scope for now.

Panos, Jonathan: :+1:

Weizhong: Performance aspect? You're running wpt on try; how long does that take?

jgraham: No idea, but I can find out. We do something similar to what Google and Apple do in terms of CI setup. Rather than a small number of big-core images on GCP where a single job runs one set of tests (maybe 2 in parallel), we use a large number of VMs. Looking at a random Android build: 30 wpt jobs. I don't know how long they take on average; one takes 30ish minutes. Scheduled into 30-minute chunks, maybe.

Jonathan: Curious how it compares to desktop testing. Is it the same as what you get on desktop? Is mobile slower or faster?

jgraham: Don't have detailed numbers, but desktop opt builds are a bit faster. We're also running slow configs on desktop. Difference between big machines with multiple instances vs. lots of machines, dunno. This is what we have.

Panos: Apple is running WebKit on real macOS hardware; GeckoView runs on VMs. Are the machines Google is using nested VMs or cheap VMs?

Weizhong: Google also requires KVM.

Panos: Each vendor contributing test results seems the only plausible answer, vs. us running all of the tests/browsers in a shared cloud instance.
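As a concrete illustration of the filtering Panos describes, here is a minimal sketch of pulling Chrome Android runs from the public wpt.fyi runs API rather than through the UI checkboxes. The `/api/runs` endpoint is wpt.fyi's public API, but the product identifier `chrome_android` and the response field names used below are assumptions; check the wpt.fyi API documentation before relying on them.

```python
import json
import urllib.request

# Sketch: list recent Chrome Android runs via the public wpt.fyi runs
# API, bypassing the UI's default "aligned"/"master only" filters.
# The product name "chrome_android" and the field names below are
# assumptions; verify them against the wpt.fyi API docs.
URL = "https://wpt.fyi/api/runs?product=chrome_android&max-count=5"

with urllib.request.urlopen(URL) as resp:
    runs = json.load(resp)

for run in runs:
    # Each run record includes the browser version, the wpt revision
    # tested, and a list of labels; jgraham's point above is that
    # these uploads should carry the "master" label.
    print(run.get("browser_version"), run.get("revision"), run.get("labels"))
```

If the runs come back without the `master` label, that would match jgraham's guess that the uploads are mistagged rather than missing.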

Requirements for shared infrastructure

jgraham: We have support for testing on Android: support for installing the emulator and installing the browser, but it isn't used, so it may be broken. We can't run it in CI; we don't have appropriate instances (see the preflight sketch below). In terms of how we would go about this, there are two cases. The Apple case: it should be straightforward to use cloud instances. They're some versions behind; I don't know the status for updating. It's engineering work to make it work.

Jonathan: Affirmative.

jgraham: For Android, things are harder. We need to get access to the hardware required to run the testing in the cloud, or take a different approach like Panos suggested, where each vendor runs the tests themselves and provides results. I think there are advantages to the former: it's much more transparent, and everybody has the same access to help fix problems. But historically we haven't found a way to make that happen. Some years ago there was a plan to make these machine types available to the community Taskcluster instance, which we're using, but it didn't happen then. I can ask if this can happen now. In practice we're hoping someone (Google, MS) could provide infrastructure for that approach.

Jonathan: How frequently are we running tests per day?

jgraham: For desktop Chrome and Firefox, every commit to master. For other browsers it depends on capacity: every 3 hours, and some less frequent setups. Desktop is cheap.

Jonathan: Trying to reason about what would be required from Apple's side to do this, in terms of frequency. Need to talk to Sam.

jgraham: For Interop, daily runs would be fine.

Panos: What would it take to migrate from Taskcluster to new Google Cloud or Azure infrastructure? How much would it cost?

jgraham: I can ask about the Taskcluster community instance. It has mostly moved to GCP. You may be in a better position to estimate GCP cost. Can ask, though.

Panos: Is there a way to figure out capacity: how many VMs, what types, etc.?

James: An inventory would help us do cost estimation on our side.

jgraham: Yeah, it's possible to figure that out from the public data. You can see how many runs are going, etc., from the logs. But I can also ask around.
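The recurring constraint above, that the Android emulator needs KVM with nested virtualization and so won't run on a normal cheap cloud instance, is easy to probe for. Below is a minimal sketch, assuming a Linux guest, of the kind of preflight check a CI job could run before trying to boot the emulator. The device path and CPU flags are standard Linux; the check itself is illustrative and not taken from any vendor's actual CI.

```python
import os

def kvm_device_usable() -> bool:
    """Return True if /dev/kvm exists and is read/write accessible.
    On a cloud VM this generally only holds when the instance type or
    image has nested virtualization enabled."""
    return os.path.exists("/dev/kvm") and os.access("/dev/kvm", os.R_OK | os.W_OK)

def cpu_has_virt_flags() -> bool:
    """Check /proc/cpuinfo for the hardware virtualization flags
    (vmx on Intel, svm on AMD) that KVM requires."""
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags = line.split(":", 1)[1].split()
                return "vmx" in flags or "svm" in flags
    return False

if __name__ == "__main__":
    # A CI job could run this as a preflight step and fail fast with a
    # clear message instead of letting the Android emulator hang.
    print("KVM device usable:", kvm_device_usable())
    print("CPU virtualization flags present:", cpu_has_virt_flags())
```

On GCP, for instance, this check typically only passes on VMs created with nested virtualization enabled, which is the distinction jgraham draws between the machines Mozilla uses and ordinary cheap instances.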

Action items