POCs for end-to-end testing frameworks

jonkafton commented 8 months ago

Description/Context

We're looking at introducing an E2E testing framework, along with tooling for local test development and CI to run the tests.

The goal is to establish which framework best suits our purposes by running POCs on popular options and assessing their viability.

Rationale

The React app currently contains unit tests. These are great for quickly offering some confidence during CI, though they do not describe the product feature set and often suffer from being overly simple or testing too small a unit, i.e. they are unlikely to catch unintentional changes in behavior. A passing unit test suite does not guarantee that the application works to the degree a comprehensive E2E/acceptance test suite might.

Because unit tests live alongside their components, they limit non-functional code change (refactoring): changing a component also requires work on its unit test. We are more interested in whether the application functions and less so in how it functions (black box testing). We should not export otherwise private functions for the unit tests, as it then becomes unclear which are public to the app; the testable unit entry points are the exports.

The other “end” of the E2E test depends on how easily server-side components can be deployed in isolation in CI. Where this is not practical (e.g. cloud-native services), the tests may depend on running components in a pre-live environment. Tests will also often depend on specific data being present to produce the expected UI, and should clean up any data added during the test run (unless it is covered by some retention policy). One option is for the tests to spin up an ephemeral database bootstrapped with any necessary data, which both produces the initial state and avoids the extra overhead of cleaning up test data.

We should also consider which tests run and when. For example, we may want to run a subset of read-only tests to sanity check new deployments to production environments.
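To make "which tests run and when" concrete, here is a minimal sketch of selecting a read-only subset per target environment. The `SpecInfo` descriptor, the environment names and the tag names (`@readonly`, `@writes`) are all hypothetical; in Playwright the comparable mechanism would be tags in test titles filtered on the command line, e.g. `npx playwright test --grep @readonly`.

```typescript
// Hypothetical descriptor for a spec; a real runner would derive this
// from test titles or metadata.
interface SpecInfo {
  title: string
  tags: string[]
}

type TargetEnv = "local" | "ci" | "rc" | "production"

// Assumption: hosted environments (RC, production) hold real data, so only
// specs tagged "@readonly" (no writes, nothing to clean up) may run there.
function selectSpecs(specs: SpecInfo[], env: TargetEnv): SpecInfo[] {
  if (env === "rc" || env === "production") {
    return specs.filter((spec) => spec.tags.includes("@readonly"))
  }
  // Locally and in CI we assume an ephemeral, pre-seeded database,
  // so specs that write data are safe and need no cleanup step.
  return specs
}

const specs: SpecInfo[] = [
  { title: "homepage shows section headings", tags: ["@readonly"] },
  { title: "user can add a resource to a list", tags: ["@writes"] },
  { title: "search returns results", tags: ["@readonly"] },
]
```

Used as `selectSpecs(specs, "production")`, this keeps only the two read-only specs, while `"ci"` keeps all three.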

The terms “E2E” and “acceptance” testing are often loosely defined. Here we can use “E2E” in the sense that we are testing the browser app within the context of its supporting backend and integration points, and “acceptance” in the sense that we are functionally testing the application’s requirements-driven feature set: we are writing functional acceptance tests that run end to end. Placing backend services under test also encourages cross-functional, feature-based development.

Plan/Design

Popular frameworks include Cypress, Playwright and Testing Library.

Requirements:

E2E tests must emulate a user’s behavior by interacting with the browser’s UI plane.
E2E tests should run quickly in a headless browser in a timeframe comparable to unit tests.
Tests should be written in a BDD style that describes the product feature specification. Project management stories and issues can be referenced in the test suite.
Evaluate the framework for ease of element selection and traversal. Do we need to augment our code with special selector handles for the tests (to be avoided where possible), or can we select on what we expect to appear (e.g. querying by text content)?
E2E testing notoriously suffers from timing issues and race conditions between loading, rendering and running assertions. Tests should run predictably and yield the same results on each run. Evaluate the mechanisms the framework provides for waiting on and retrying against DOM rendering (code cleanliness, language API, convenience). Brittle tests and false negatives can lead to team apathy in responding to red test runs.
Evaluate developer experience: lightweight/minimal install, development server, hot reloading. Is there a UI "playground" for developing and running tests?
Evaluate reporting output: clarity in identifying failure causes, visual snapshots, publishable HTML report.
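Auto-retrying assertions come down to polling: run the query, and if the element isn't there yet, wait and try again until a deadline passes. The sketch below is a framework-free illustration of that idea, not how Playwright or Cypress implement it; `retryUntil` and its options are hypothetical. The optional `message` shows how a human-readable label makes a timeout easier to diagnose in a report.

```typescript
// Framework-free sketch of wait-and-retry; not Playwright's or Cypress's code.
interface RetryOptions {
  timeoutMs?: number // give up after this long
  intervalMs?: number // pause between attempts
  message?: string // human-readable label used in the failure report
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

// Repeatedly calls `probe` until it yields a value (anything but undefined).
// A probe that throws is treated as "not ready yet" rather than a failure.
async function retryUntil<T>(
  probe: () => T | undefined | Promise<T | undefined>,
  { timeoutMs = 5000, intervalMs = 50, message = "condition" }: RetryOptions = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs
  let lastError: unknown = undefined
  for (;;) {
    try {
      const value = await probe()
      if (value !== undefined) return value
    } catch (err) {
      lastError = err
    }
    if (Date.now() >= deadline) {
      const cause = lastError === undefined ? "" : ` (last error: ${String(lastError)})`
      throw new Error(`Timed out after ${timeoutMs}ms waiting for: ${message}${cause}`)
    }
    await sleep(intervalMs)
  }
}
```

The flakiness trade-off lives in the two numbers: a longer timeout tolerates slow renders at the cost of slower failures, and a shorter interval reacts faster at the cost of more queries.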

Out of scope for POC but to keep in mind:

Viability of running in CI, backgrounding the system under test, targeting live environments and data handling.
Code coverage reporting during CI runs, and orchestration.

pdpinch commented 8 months ago

What will we have when this ticket is closed? A proposal? a decision? An implementation?

jonkafton commented 8 months ago

The aim is for us to have selected a test framework and, in doing so, to have a working test suite with some initial assertions to build on. Output will include an evaluation summary to put back to the team. A subsequent ticket will cover the full implementation (baseline specs and CI).

ChristopherChudzicki commented 8 months ago

My 2c is: Either Cypress or Playwright would be fine. I like Playwright's APIs a lot better, but it gives you less control over the browser.

It seems to me that the much harder question is how to handle data so that e2e tests can be run locally and also against rc/prod. IMO, that's worth thinking about in this POC issue. One idea:

We could write tests that largely ignore specific resource names, and check things like "cards on carousel, buttons work" without caring too much about specific words on the cards

Playwright vs Cypress

Regarding Testing Library: This isn't really an e2e testing option. Testing Library is a collection of testing utility packages (assertions, DOM queries) that provide similar, high-level interfaces for testing frontend code. We currently use it in our frontend + mocked-backend unit/integration tests. See below.

Retries and Selectors: Both Cypress and Playwright can auto-retry assertions and wait for selections to appear. Both support querying the DOM via user-facing attributes (role, state, text). Playwright has these Testing Library-esque queries built in; for Cypress, similar functionality comes from @testing-library/cypress.
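For a sense of what "querying via user-facing attributes" means, here is a toy model of a role-plus-accessible-name query over a plain object tree. The `FakeNode` shape, the tiny implicit-role table and the simplified name computation are assumptions for illustration only; the real libraries follow the ARIA role mappings and the W3C accessible name computation, which cover far more cases.

```typescript
// Toy model only: simplified node shape, roles and naming (assumptions).
interface FakeNode {
  tag: string
  attrs?: Record<string, string>
  text?: string
  children?: FakeNode[]
}

// A tiny subset of implicit ARIA roles; real mappings are far richer
// (e.g. <a> only has the "link" role when it has an href).
const IMPLICIT_ROLES: Record<string, string> = {
  a: "link",
  button: "button",
  h1: "heading",
  h2: "heading",
  nav: "navigation",
}

function roleOf(node: FakeNode): string | undefined {
  return node.attrs?.role ?? IMPLICIT_ROLES[node.tag]
}

// Simplified accessible name: aria-label wins, else the node's own text plus
// its descendants' names, with whitespace collapsed.
function nameOf(node: FakeNode): string {
  const label = node.attrs?.["aria-label"]
  if (label) return label
  const parts = [node.text ?? "", ...(node.children ?? []).map(nameOf)]
  return parts.join(" ").replace(/\s+/g, " ").trim()
}

// Finds nodes the way a user perceives them: by role and (optionally) name,
// with no dependency on class names, DOM paths or data-* test hooks.
function getAllByRole(root: FakeNode, role: string, name?: string): FakeNode[] {
  const matches: FakeNode[] = []
  const visit = (node: FakeNode): void => {
    if (roleOf(node) === role && (name === undefined || nameOf(node) === name)) {
      matches.push(node)
    }
    node.children?.forEach(visit)
  }
  visit(root)
  return matches
}
```

Because the query keys on what is rendered and announced, renaming a CSS class or restructuring wrapper elements leaves it untouched; only a change the user would notice (role or visible name) breaks it.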

Debugging: Both have --debug modes that show the browser. Playwright can, I believe, generate tests from the GUI, though I don't know how good they are. Cypress might have something similar. Being a Microsoft project, Playwright also has very good integration with VS Code.

Cypress

I personally find the developer experience poor (or at least annoying) because Cypress tests are very asynchronous but you can't use async/await. Most Cypress objects have .then methods, but they aren't real promises, so you can't await them.

Parallelization: It used to be true that in order to run tests in parallel, you had to use Cypress's cloud CI service. I'm having trouble telling if this is still true.

Playwright

The developer experience seems very good to me.

Browser Support: I believe you get much less control over which browser version you use with Playwright than with Cypress. I don't think you can specify specific versions of browsers, at least not easily[^1]. Playwright ships bundled with its own browsers. In general, when you update Playwright, you change which browsers you test with (to newer versions). References:

This seems like an odd design to me, and I don't really understand the choice.

[^1]: I believe Playwright uses standard Chromium / Chrome, but patches Firefox and WebKit. So I think you can make it work with any specific version of Chromium above some number, but only with the bundled Firefox/WebKit.
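As a sketch of how this looks in configuration (assuming branded Google Chrome and Microsoft Edge are installed on the machine or image), Playwright's `channel` option selects a locally installed Chrome or Edge instead of the bundled Chromium, so that project tests whatever version the system has installed. Firefox and WebKit projects still use Playwright's own builds. To target one specific Chromium binary, `launchOptions.executablePath` is the other lever. Project names here are arbitrary.

```typescript
// playwright.config.ts (sketch): project names are arbitrary.
import { defineConfig, devices } from "@playwright/test"

export default defineConfig({
  projects: [
    // Playwright's bundled Chromium: version pinned by the Playwright release.
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    // Branded Chrome/Edge installed on the machine or image (assumed present):
    // the version tested is whatever that installation provides.
    { name: "google-chrome", use: { ...devices["Desktop Chrome"], channel: "chrome" } },
    { name: "msedge", use: { ...devices["Desktop Edge"], channel: "msedge" } },
    // Firefox and WebKit use Playwright's own patched builds.
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
    { name: "webkit", use: { ...devices["Desktop Safari"] } },
  ],
})
```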

Existing Tests, Goals

We already use Testing Library[^2] in our JSDom-powered "unit" tests. Because Testing Library's APIs delegate to React for render and are designed to interact with components similar to how a user would, I have pretty high confidence in our existing tests to check behavior and regressions.

[^2]: Specifically, we use @testing-library/react, a thin wrapper around React's own test renderer (which I believe is roughly the same as React's real renderer, but throws errors more aggressively), and two framework-agnostic libraries: @testing-library/dom for querying the DOM and @testing-library/user-event for emulating user interactions in JSDom.

This:

The React app currently contains unit tests. These are great for quickly offering some confidence during CI, though do not describe the product feature set...

is a reasonable characterization of Enzyme tests, but IMO is much less true for unit/integration tests written with Testing Library. For example, this test, if written in Playwright, would look almost identical (even down to method names, since Playwright adopted some Testing Library methodology).

Realistic Data: Additionally, although we mock our backend APIs in the frontend tests, we should be confident that the data is realistic since it is constructed to match our OpenAPI schema.

That said, there are certainly limitations to our current method:

Because our tests are run with JSDOM (a Node-based browser emulator) rather than a real browser, we can't fully test:
- anything visual
- anything layout-based, like infinite scroll or dragging
- components like CKEditor, that use the contenteditable APIs which aren't supported in JSDom.
We can't test on RC/Prod with real data, nor can we fully test things like the auth flow (create account, verify, login, etc).

The other virtue of e2e tests I see is that, being independent of the application, they should be very resistant to "artificial failures" from refactoring.

jonkafton commented 8 months ago

Existing Tests, Goals

We already use Testing Library in our JSDom-powered "unit" tests. Because Testing library's APIs delegate to React for render and are designed to interact with components similar to how a user would, I have pretty high confidence in our existing tests to check behavior and regressions.

Taking a step back, then, it's worth considering the cost-benefit of the proposed pure E2E acceptance testing compared with our existing unit tests, given that these already go quite some way towards emulating a user's viewpoint and masking implementation.

The ideal for testing a user facing application is that user centric feature requirements are asserted by the tests and that the test report describes the product spec, tying the project requirements gathering through to delivery.

The cost of automating a human tester is typically higher than that of testing against source code, so it's a reasonable concession to run unit tests where they give a good degree of confidence for a much smaller development cost, run quickly, and give early and accurate feedback. Historically (Selenium, WebDriverIO), E2E tests have been prone to false negatives (brittle element selection, race conditions around load/render/assert), time-consuming to write and debug, and slow to run due to limitations in communicating with the browser (browser drivers, JSON Wire Protocol). These issues have largely been addressed by the newer generation of test runners. Playwright uses the Chrome DevTools Protocol for Chromium browsers (Chrome, MS Edge). Cypress instruments browsers at the application layer to run tests in the same process as the code being tested.

Our unit tests are a hybrid: they are framed in terms of the user and, for a large part, emulate the browser environment and user interactions, so they tick some boxes towards acceptance testing. @testing-library/react intentionally does not expose React component instances, props or state, so it meets many of the criteria for not being bound to implementation detail. They do, however, lack the key benefits of E2E tests that @ChristopherChudzicki mentions above:

They can be pointed at any hosted environment for post-deployment verification. While white box testing can give confidence in the code, it cannot verify that things have deployed correctly. With sufficient test confidence, a goal can be to release to production without human intervention.
They run against real data and verify that backend services are running correctly. Although in our context the tests' focus is the UI (APIs should be covered by their own integration tests), we are also able to assert that the full system is functioning.
They are able to test flows outside of the front-end application itself: authentication, admin flows, etc.
Refactors and non-functional code changes are guaranteed not to need test rework.

The question, then, if we are to realize these benefits: do E2E tests supersede the current approach of unit testing, or when should we write one or the other? I would say yes (assuming the unit test suite is repurposed towards testing any pure functions and heavier logic), with the conditions that:

The development cost is on par:
- E2E tests will look very similar to our existing tests, though we would need to migrate them or at least re-implement a test baseline.
- Onward development time should be on par, potentially improved with E2E test library tooling, in particular the development servers and workbench interfaces (Playwright UI Mode, Cypress App).
The run cost is comparable:
- The gap between unit and E2E testing for run duration has reduced considerably for the reasons above. The E2E suite should run in seconds or a few minutes (5 max?). For reference, our existing unit tests run in 40s on CI (47 suites, 283 tests).
The E2E tests are stable:
- The tests must run reliably and predictably producing the same results each run.
We don't need significant code change for the tests:
- Adding hooks for the tests (e.g. data-* attributes) should be used for special cases only. If the tests are dependent on code we start to lose change resilience.
We are able to output code coverage reports and aim towards full coverage.

mbertrand commented 8 months ago

One thing I'm curious about is how well the various E2E frameworks integrate with Github CI/CD. Seems like both Cypress and Playwright have plugins for doing so.

jonkafton commented 8 months ago

One thing I'm curious about is how well the various E2E frameworks integrate with Github CI/CD. Seems like both Cypress and Playwright have plugins for doing so.

Yes, both provide Docker images with browsers and system dependencies pre-installed (Cypress, Playwright), plus Cypress has a custom action referenced in your link above. I don't foresee any issues pointing the tests at hosted environments to validate deployments. I'll write up an issue to cover pre-deployment testing, where we'll want to run the application locally within CI. There are additional challenges there, such as bootstrapping test data and orchestrating the containers (e.g. to finish with code coverage output).

jonkafton commented 7 months ago

We can wrap up our framework selection in favor of Playwright, firstly as we have a team preference for it and it is already in use in other OL projects. Additionally, it quickly stands out as part of the newer generation of E2E testing solutions, primarily by overcoming a key pain point of Cypress: all normal JavaScript commands, assignments and control flow logic must be mediated by Cypress to be visible to it.

By mapping commands to an internal queue, Cypress cleverly bridges JavaScript control flow to element selection with seamless wait and retry. There’s convenience in an asynchronous command sequence being produced internally from simple object-chaining syntax: the developer is relieved of any promise or callback handling. The penalty, though, is that the execution sequence is not idiomatic to the language and as a result can be unintuitive and unpredictable. To provide these capabilities, Cypress ships an architecture that runs tests directly in the browser. This approach was certainly a welcome improvement on the earlier Selenium WebDriver based solutions, though it is something of a deal breaker compared with Playwright’s approach of natural JavaScript (Playwright also supports Python, C# and Java) and native automation through DevTools protocols.
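To illustrate the ordering issue, here is a deliberately tiny model of a queued command chain next to plain async/await. `CommandChain`, `step` and `awaitedSteps` are hypothetical; Cypress's real command queue does far more (retries, subjects, timeouts). The point is only that plain code written after an enqueued chain runs before the chain does, whereas awaited steps run in the order they are written.

```typescript
// Toy model of a Cypress-style command queue (hypothetical, heavily simplified).
type Command = () => Promise<void>

class CommandChain {
  private queue: Command[] = []
  private readonly log: string[]

  constructor(log: string[]) {
    this.log = log
  }

  // Each "command" only records intent; nothing happens yet.
  visit(url: string): this {
    this.queue.push(async () => {
      this.log.push(`visit ${url}`)
    })
    return this
  }

  find(selector: string): this {
    this.queue.push(async () => {
      this.log.push(`find ${selector}`)
    })
    return this
  }

  // The runner drains the queue later, after the test body has returned.
  async run(): Promise<void> {
    for (const command of this.queue) await command()
    this.queue = []
  }
}

// Stand-in for a real async browser step such as `await page.goto(url)`.
async function step(log: string[], entry: string): Promise<void> {
  await Promise.resolve()
  log.push(entry)
}

// Async/await style: statements execute in the order they are written.
async function awaitedSteps(log: string[]): Promise<void> {
  await step(log, "visit /")
  await step(log, "find h1")
  log.push("plain JS after steps")
}
```

With the chain, a line such as `log.push(...)` placed after `chain.visit("/").find("h1")` is recorded first; with `awaitedSteps` the log reads top to bottom.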

Testing Library is not a candidate, as it’s not a full-fledged testing framework and does not include a test runner. Instead, it provides integrations for various test runners and client frameworks, providing methods for querying elements and making assertions. We are using it for unit testing React components against an emulated DOM. It provides a Cypress plugin that extends Cypress commands with selection methods that follow its guiding principles of isolating tests from implementation detail. Its author has written a good article aligned with our key aim of E2E testing: that the system functions, as opposed to how it functions. This involves testing according to how users and assistive technologies perceive the page, rather than relying on selector paths or code hooks for tests to find elements.

Some observations:

Run time benchmark: locally running a single test that loads the homepage and checks that the page headings are in place. With such a small sample size, let's not read too much into these.
- Cypress: 13s
- Playwright: 9.5s
UI workbench. Both provide visual testing interfaces to run and debug tests and interact with browser engines: Playwright UI Mode and Cypress App. Cypress App is built on Electron, and I found it to be a little clunky and slower to load than Playwright UI Mode. Both have a very convenient locator tool to select a UI element and get the selector code, though Cypress only returns the class name, while Playwright gives the code to locate an element by role. The runs and debug screens in the Cypress workbench are only available with paid Cypress Cloud.
Element selection. Out of the box, Cypress does not include an API to select elements as the user would see them (https://docs.cypress.io/api/table-of-contents#Queries), so we need to use DOM paths/IDs/class names. Testing Library is needed to extend commands to, e.g., select by role (https://testing-library.com/docs/cypress-testing-library/intro). Cypress recommends using silent data-* attributes to reliably select elements, though this requires augmenting our code (https://docs.cypress.io/guides/references/best-practices#Selecting-Elements). Playwright prioritizes role-based locators that emulate how users and assistive technologies perceive the page. It is, however, also able to select elements by their text content, though such locators will sometimes lack specificity without additional selectors.
Assertions. Cypress conflates element selectors and assertions, such that an element selector includes an implicit assertion that fails a test if the element cannot be found after wait and retry. This may be a good thing depending on perspective, though I find implicit behavior problematic where it deviates from what idiomatic use of the language would lead one to expect. With Cypress we are also unable to provide assertion messages for easy reading in the test reports, as we can, e.g., in Playwright:
```
await expect(
  page.getByRole("link", { name: "MIT Open" }),
  "Header link is visible",
).toBeVisible()
```
The equivalent in Cypress (with Testing Library) is succinct, but it is not clear from the code that this is also an assertion:

```
cy.findByRole("link", { name: "MIT Open" })
```

Target browsers and versions. Cypress supports Chrome, Edge and Firefox (from versions 80, 80 and 86 respectively). The browser must be installed on the system, so generally an image would need to be built for each browser version being targeted. Playwright can run on Chromium (Chrome, Edge), WebKit (Safari) and Firefox. The browser version is tied to the Playwright version, so multiple versions cannot easily be tested in the same environment/container.

Smaller:

Testing native mouse hover is not possible with Cypress, https://docs.cypress.io/api/commands/hover
The Node.js environment is not available in Cypress for accessing environment variables (there is no process). Aside from passing through CYPRESS_* variables on the environment, variables need to be set in the Cypress config.
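Because Playwright tests and its config file run in Node, the target environment can be read straight from `process.env` and passed to the config's `use.baseURL`. A minimal sketch of such a resolver follows; the `E2E_BASE_URL` variable name and the localhost default are assumptions, not existing project settings. Taking the environment as a parameter keeps the function easy to test.

```typescript
// Hypothetical resolver for the environment the E2E suite targets.
// Assumptions: the variable name E2E_BASE_URL and the localhost default.
type Env = Record<string, string | undefined>

const DEFAULT_BASE_URL = "http://localhost:8080" // assumed local dev server

function resolveBaseUrl(env: Env, fallback: string = DEFAULT_BASE_URL): string {
  const raw = env.E2E_BASE_URL?.trim()
  if (!raw) return fallback
  // new URL() throws on malformed input, so a typo fails fast at startup
  // rather than surfacing later as confusing navigation errors.
  const url = new URL(raw)
  // Drop a trailing slash so `${base}/path` doesn't produce a double slash.
  return url.toString().replace(/\/$/, "")
}
```

In `playwright.config.ts` this might be wired up as `use: { baseURL: resolveBaseUrl(process.env) }`, so the same suite runs locally by default and against RC or production when the variable is set.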

jonkafton commented 7 months ago

Branches with setup and a basic initial homepage test:
Cypress: https://github.com/mitodl/mit-open/compare/jk/401-evaluate-e2e-cypress
Playwright: https://github.com/mitodl/mit-open/compare/jk/401-evaluate-e2e-playwright

jonkafton commented 7 months ago

It seems to me that the much harder question is how to handle data so that e2e tests can be run locally and also against rc/prod. IMO, that's worth thinking about in this POC issue.

I've written up an issue here @ChristopherChudzicki that covers this, https://github.com/mitodl/mit-open/issues/418.
