grafana / xk6-browser

k6 extension that adds support for browser automation and end-to-end web testing via the Chrome Devtools Protocol
https://grafana.com/docs/k6/latest/javascript-api/k6-experimental/browser/
GNU Affero General Public License v3.0
337 stars 42 forks source link

Non-iterational goroutines should not abort an iteration #688

Open inancgumus opened 1 year ago

inancgumus commented 1 year ago

Brief summary

Goroutines other than the one that drives an iteration should not panic. Otherwise, k6 won't catch the panic and fail.

We should take control of our non-iteration goroutines: Not panic and cause an iteration to be prematurely aborted.

More details about how to craft a solution:

xk6-browser version

v0.7.0

OS

Staging Cloud

Docker version and image (if applicable)

grafana/k6-cloud-scripts-to-archive:2.40.6-xk6browser-v0.7.0-1

Steps to reproduce the problem

Pawel ran the following script, and we got sentry alerts because of the panics.

This script panics. ```javascript import { check, sleep } from 'k6'; import { Counter } from 'k6/metrics'; import { chromium } from 'k6/x/browser'; const checkpoint1 = new Counter('checkpoint1'); const checkpoint2 = new Counter('checkpoint2'); const checkpoint3 = new Counter('checkpoint3'); const checkpoint4 = new Counter('checkpoint4'); const checkpoint5 = new Counter('checkpoint5'); const checkpoint6 = new Counter('checkpoint6'); const checkpoint7 = new Counter('checkpoint7'); export let options = { vus: 100, duration: '10h', } export function setup(){ console.log("Running setup") } export default function () { const browser = chromium.launch({ headless: true }); const context = browser.newContext(); const page = context.newPage(); checkpoint1.add(1); // Goto front page, find login link and click it page.goto('https://test.k6.io/', { waitUntil: 'networkidle' }); checkpoint2.add(1); Promise.all([ page.waitForNavigation(), page.locator('a[href="/my_messages.php"]').click(), ]).then(() => { checkpoint3.add(1) // Enter login credentials and login page.locator('input[name="login"]').type('admin'); page.locator('input[name="password"]').type('123'); // We expect the form submission to trigger a navigation, so to prevent a // race condition, setup a waiter concurrently while waiting for the click // to resolve. checkpoint4.add(1) return Promise.all([ page.waitForNavigation(), page.locator('input[type="submit"]').click(), ]); }).then(() => { checkpoint5.add(1) check(page, { 'header': page.locator('h2').textContent() == 'Welcome, admin!', }); }).finally(() => { checkpoint6.add(1) page.close(); browser.close(); checkpoint7.add(1) }); sleep(1); } export function teardown() { console.log("Running teardown") } ```

The script has an incorrect usage of promises, and the fix is easy:

page.goto('https://test.k6.io/', { waitUntil: 'networkidle' });

// this should be page.goto(...).then(...)

However, the problem is deeper than it seems.

See more details in the test run: 1620122 (proxy).

Actual behaviour

Panic causes an iteration to be prematurely aborted.

### Tasks
- [ ] Don't panic in `Response`
- [ ] Don't panic in `JSHandle`
- [ ] Don't panic in `ElementHandle`
- [ ] Don't panic in `Frame`
- [ ] Don't panic in `Page`
- [ ] Don't panic in `Worker`
- [ ] Don't panic in `BrowserContext`
- [ ] Don't panic in `Browser`
- [ ] Don't panic in `BrowserType`
- [ ] TBA: Look at the remaining panics and split them up again into smaller tasks.
- [ ] Move `k6ext.Panic` into the mapping layer.
- [ ] Move `k6ext.Promise` into the mapping layer and incorporate `panicIfInternalError` in it.
ankur22 commented 1 year ago

Types of panic

  1. ✅ go panic. -- all documented
  2. k6ext.Panic.
  3. k6common.Throw. -- all documented

Panic vs Error

Error

Should denote to the user that an error occurred but there's a way around it, i.e. they can fix it themselves. This shouldn't stop the whole test but just the current iteration.

Panic

Should denote an abort of the whole test run (all iterations) as an unexpected internal error was encountered which we need to fix e.g. the goja runtime is missing from the content. The assumption i'm making is that if it happens in the current iteration it's probably happening in all iterations.

Areas where I think are ok to use panic (abort the whole test run)

NOTE:

Where runs on main thread Reason
https://github.com/grafana/xk6-browser/blob/63d5db735ef8770a61aa811e7a698dbadea47793/tests/test_browser.go#L349 Y used in unit/integration tests
https://github.com/grafana/xk6-browser/blob/63d5db735ef8770a61aa811e7a698dbadea47793/tests/test_browser.go#L92 Y used in unit/integration tests
https://github.com/grafana/xk6-browser/blob/16c0b8cef2b85c096a13ead97d5c57fe3533baf4/k6ext/panic.go#L34 Y k6ext.panic eventually uses a k6common.Throw, this seems fine for now as long as k6ext.Panic is used sensibly
https://github.com/grafana/xk6-browser/blob/9047580e50c5bba56ef3c6d0ce268ea7dfd8f6ea/log/logger.go#L181 Y panics when goroutine id can't be found during logging
https://github.com/grafana/xk6-browser/blob/4485c6d0caf9276b3a772e708a23f526a23f665c/keyboardlayout/layout.go#L72 Y When keyboard registration fails
https://github.com/grafana/xk6-browser/blob/16c0b8cef2b85c096a13ead97d5c57fe3533baf4/k6ext/panic.go#L39 Y No browser PID
https://github.com/grafana/xk6-browser/blob/16c0b8cef2b85c096a13ead97d5c57fe3533baf4/k6ext/panic.go#L22 Y No k6 goja runtime
https://github.com/grafana/xk6-browser/blob/f9153fd9e236536e3d5ab92adf32b49afac34d35/browser/module.go#L56 Y To prevent cloud runs
https://github.com/grafana/xk6-browser/blob/575b9751b2b08194e618a8332724b67908f4378f/browser/mapping.go#L52 Y When the goja mapping fails
https://github.com/grafana/xk6-browser/blob/9047580e50c5bba56ef3c6d0ce268ea7dfd8f6ea/common/options.go#L50 Y programmer error if we're not passing in the correct type (change to k6ext.Panic)
https://github.com/grafana/xk6-browser/blob/9047580e50c5bba56ef3c6d0ce268ea7dfd8f6ea/common/options.go#L41 Y programmer error if we're not passing in the correct type (change to k6ext.Panic)
https://github.com/grafana/xk6-browser/blob/575b9751b2b08194e618a8332724b67908f4378f/common/frame.go#L401 N programmer error if we're not passing in the correct type (change to k6ext.Panic)
https://github.com/grafana/xk6-browser/blob/9884e917ff626303efb2e7dc99a1c086c11a1f65/common/frame_session.go#L991 Y Programmer error -- trying to update subframe viewport which can only be done on the main frame (change to k6ext.Panic)
https://github.com/grafana/xk6-browser/blob/575b9751b2b08194e618a8332724b67908f4378f/common/element_handle.go#L1508 ? unexpected error that we now need to code in
https://github.com/grafana/xk6-browser/blob/575b9751b2b08194e618a8332724b67908f4378f/common/element_handle.go#L1504 ? nil error from dom, programmer needs to investigate symptom

List of panics that could be errors (only stop the current test iteration)

Where reason
https://github.com/grafana/xk6-browser/blob/e62e7c12372f210914ea8d9ba70ddb392dbdc93e/chromium/browser_type.go#L68 unimplemented method, user can remove the call
https://github.com/grafana/xk6-browser/blob/e62e7c12372f210914ea8d9ba70ddb392dbdc93e/chromium/browser_type.go#L210 unimplemented method, user can remove the call

Question

  1. Can we return an error to goja instead of panicking?
    1. Yes we can. I tested this with the Launch method, and the behaviour seems to be the same if we panic from Launch or return an error. The difference is that the API changes from Launch(opts goja.Value) api.Browser to Launch(opts goja.Value) (api.Browser, error). POC: https://github.com/grafana/xk6-browser/pull/718
    2. IMO a panic should be used to denote an abort of the whole test run (all iterations) as an unexpected internal error was encountered which we need to fix e.g. the goja runtime is missing from the content.
    3. IMO an error should denote a user error that can be fixed by the user themselves e.g. the element with name "foo" couldn't be found, and a time out occured. This shouldn't stop the whole test but just the current iteration.
  2. Panic in code is ok if running on the "main" thread, which should mean it's ok to panic from APIs, e.g. goto since it will only stop the current iteration.
    1. This is true. A panic on the "main" thread will stop the current iteration but not the whole test run.
ankur22 commented 1 year ago

We briefly discussed the strategy going forward, and we've agreed that we should pick a file and convert the errors into panic panics into errors where appropriate and ensure that a panic is only used if we need to abort the whole test process due to an unexpected internal error.

Based on the feedback from the first PR, we should be able to convert the rest of the codebase.

inancgumus commented 1 year ago

@ankur22, what I have in mind is to let the mapping layer handle the panics and the internal (core/business logic) parts of the code return errors. This is more idiomatic and maintainable in the long run (from the perspective of tests + code). But it's not something you necessarily need to tackle in this issue. I agree with the rest of your findings 👍

ankur22 commented 1 year ago

That's a good idea. We could assert on the behaviour of the error to determine whether to panic or return the error: https://dave.cheney.net/2014/12/24/inspecting-errors