tiny-pilot / tinypilot

Use your Raspberry Pi as a browser-based KVM.
https://tinypilotkvm.com
MIT License

Fix e2e interferences, and establish robust parallelism #1694

Open jotaen4tinypilot opened 7 months ago

jotaen4tinypilot commented 7 months ago

Merging over the grouped e2e tests uncovered an interference problem in our e2e test setup.

Update: we disabled concurrent tests locally to alleviate the problem for local development for the time being. The topic in itself is still valid, however.

Problem

Our Playwright setup is this:

Due to the new grouping and our usage of the beforeEach mechanism, the timing behaviour of the tests happened to change, compared to before:

When the security-dialog tests run, they alter the server state and toggle on the auth requirement. The about-dialog tests execute in parallel and can now fail randomly, because Playwright may try to open the about dialog while the auth requirement is active (caused by one of the concurrently running security tests). In that case, Playwright cannot open the about dialog at all, because it is stuck on the login page.

This problem isn’t new, but so far the timing was in our favour, so we were lucky enough to not run into this. Now, with the different timing behaviour, the e2e tests can fail locally.

Again: because we only use a single worker on CI, this problem was only manifesting itself locally, not on CI.

Solution

As we continue to add more and more e2e tests (which I think is terrific), I suggest we fix our e2e test setup, and make it robust against these kinds of issues. Specifically, I think we should establish a deliberate mechanism for parallelism:

mtlynch commented 7 months ago

Yeah, this is a tricky one. It looks like Playwright has more granular parallelism controls than we've been using, but it doesn't look like we can say "run everything in parallel, but make sure nothing else is running while this parallelism-unfriendly test is running."
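As an illustration of one of those more granular controls, here is a minimal sketch of Playwright's serial mode for a single test file. The test names and the assumption that `baseURL` is configured are hypothetical and not taken from the TinyPilot repo. Serial mode only orders the tests within this file. Other files can still run in parallel workers, which is why it doesn't solve the interference on its own.

```typescript
import { test } from "@playwright/test";

// Run the tests in this file one after another, in a single worker.
// Tests in OTHER files may still run concurrently in other workers.
test.describe.configure({ mode: "serial" });

test("first state-changing test (hypothetical)", async ({ page }) => {
  // Assumes `baseURL` is set in the Playwright config.
  await page.goto("/");
  // ... steps that toggle server state would go here ...
});

test("second state-changing test (hypothetical)", async ({ page }) => {
  // Starts only after the previous test in this file has finished.
  await page.goto("/");
});
```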

The way I approached this on PicoShare is that each Playwright session has its own, independent in-memory SQLite database. Before each test, Playwright POSTs to a dev-only route to say it wants a per-session database. The server assigns a cookie in response, and then whenever the server receives subsequent requests with that cookie, it matches the request to the corresponding in-memory SQLite database.

That system works okay, but it gets a little hacky when we have to share backend state between two browser sessions.
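The cookie-to-state lookup described above can be sketched roughly as follows. This is not PicoShare's actual code: the `SessionStateRegistry` class, the `ServerState` shape, and the token format are all assumptions made for illustration, and a plain object stands in for the in-memory SQLite database.

```typescript
// Hypothetical stand-in for per-session backend state.
// (In the approach described above, this would be an in-memory SQLite database.)
interface ServerState {
  requiresAuth: boolean;
}

class SessionStateRegistry {
  private readonly states = new Map<string, ServerState>();
  private nextId = 1;

  // Models the dev-only route: creates isolated state and returns the
  // token that the server would hand back as a cookie value.
  createSession(): string {
    const token = `session-${this.nextId++}`;
    this.states.set(token, { requiresAuth: false });
    return token;
  }

  // Models request handling: finds the state that belongs to the cookie.
  stateFor(token: string): ServerState {
    const state = this.states.get(token);
    if (state === undefined) {
      throw new Error(`unknown session token: ${token}`);
    }
    return state;
  }
}

// Two concurrent test sessions, each with its own state.
const registry = new SessionStateRegistry();
const securitySession = registry.createSession();
const aboutSession = registry.createSession();

// The security test enables auth in its own session only.
registry.stateFor(securitySession).requiresAuth = true;

console.log(registry.stateFor(aboutSession).requiresAuth); // prints false
```

Because every session gets its own state object, a test that toggles the auth requirement can't affect a test running in a different session. Sharing state between two browser sessions would mean handing the same token to both.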

jotaen4tinypilot commented 7 months ago

Another potential approach that I stumbled across is to divide the tests into different test projects, where each project can have its own configuration.

I’m not sure, however, what the best way is to run these projects independently. E.g., we could use the --project CLI flag, but then we’d have to issue multiple playwright invocations.

I haven’t looked much into that, so maybe this could be a viable approach. There might also be a bunch of other options to research.
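To make the projects idea more concrete, here is a minimal sketch of what such a split could look like. The project names and the file name patterns are hypothetical and not taken from the TinyPilot repo.

```typescript
import { defineConfig } from "@playwright/test";

export default defineConfig({
  projects: [
    {
      // Hypothetical project for tests that never change server state.
      name: "stateless",
      testMatch: /about-dialog\.spec\.ts/,
      fullyParallel: true,
    },
    {
      // Hypothetical project for tests that change server state,
      // for example by toggling the auth requirement.
      name: "stateful",
      testMatch: /security-dialog\.spec\.ts/,
      fullyParallel: false,
    },
  ],
});
```

With this split, the two invocations could be `npx playwright test --project=stateless` followed by `npx playwright test --project=stateful --workers=1`. The `--workers=1` flag matters for the second run: `fullyParallel: false` only stops tests within one file from running in parallel, while separate files can still be distributed across workers.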