grafana / xk6-browser

The browser module adds support for browser automation and end-to-end web testing via the Chrome Devtools Protocol to k6.
https://grafana.com/docs/k6/latest/javascript-api/k6-browser/
GNU Affero General Public License v3.0
343 stars 41 forks source link

Deadlock: Navigation stops when request interception is enabled #1072

Closed inancgumus closed 12 months ago

inancgumus commented 1 year ago

NetworkManager deadlocks while racing with two different browser network events:

This issue only happens when we have outgoing requests that are in the k6-blocked hosts and IPs list. The outgoing requests won't let other communications with the browser go through. When we remove the network.GetResponseBody call (called by emitResponseMetrics when a request loading finishes or we're handling a redirect), the test run finishes.

We discovered how and why this problem occurs, but the underlying reason why this is happening has yet to be entirely apparent. Find a way to reproduce this behavior with a Go test, discover the underlying reason, report it, and fix the problem.

Update: Root cause discovered

The main page redirects to another page before finishing its loading. The redirected page gets blocked by the network interception (blocked hosts and IPs) analysis stage. The problem was that we needed to tell the redirected page to continue. But we couldn't because we were waiting for the main page to load. Hence the deadlock.

xk6-browser version

v1.1.0

Steps to reproduce the problem

Relevant links

### Tasks
- [x] Find the root cause of the issue
- [x] Reproduce the issue with a Go test
- [x] https://github.com/grafana/xk6-browser/pull/1084
- [x] https://github.com/grafana/xk6-browser/pull/1078
- [x] https://github.com/grafana/k6/pull/3425
- [x] https://github.com/grafana/xk6-browser/pull/1086
- [x] Test the fix on remote
- [x] Make a k6 release on remote
- [x] Add an improvement issue (https://github.com/grafana/xk6-browser/issues/1093)
- [ ] https://github.com/grafana/k6/pull/3447
inancgumus commented 1 year ago

We receive and process the following and then send a getResponseBody:

<- {"method":"Network.loadingFinished","params":{"requestId":"51317.37","timestamp":24102.926893,"encodedDataLength":6759,"shouldReportCorbBlocking":false},"sessionId":"2D176DA231707745A2E7710694B2A890"}
-> {"id":147,"method":"Network.getResponseBody","params":{"requestId":"51317.37"}}

getResponseBody is intercepted by the network request blocker and waits for us to tell it to continue, but we never do because we're blocked in the same NetworkManager goroutine, calling getResponseBody in loadingFinished. The following is never processed:

<- {"method":"Network.requestWillBeSent","params": ...}
<- {"method":"Fetch.requestPaused","params":{"requestId":"interception-job-20.0" ...}
<- {"method":"Network.loadingFailed","params":{"requestId":"51317.38", ...}

We need to unblock the request in NetworkManager.onRequestPaused. When we run each network event in a different goroutine, the issue resolves.


So, the issue is getting the response body while the request is intercepted:

This deadlocks because both are on the same goroutine.

We can fix the issue by running onLoadingFinished in another goroutine:

case *network.EventLoadingFinished:
    go m.onLoadingFinished(ev)

This results in the following:

Network.loadingFinished: RequestID:"82697.70"
-> {"id":177,"sessionId":"12A27C27647CF8CA2F6D3178FB1415D0","method":"Network.getResponseBody","params":{"requestId":"82697.70"}}

Fetch.requestPaused: RequestID:"interception-job-33.0", NetworkID:"82697.71"
-> {"id":178,"sessionId":"12A27C27647CF8CA2F6D3178FB1415D0","method":"Fetch.continueRequest","params":{"requestId":"interception-job-33.0"}}
<- {"id":178,"result":{},"sessionId":"12A27C27647CF8CA2F6D3178FB1415D0"}

<- {"id":177,"result":{"body":"..."} ✅
inancgumus commented 1 year ago

We've solved this issue in #1078. You can read about the root cause in Update: Root cause discovered heading in this issue's description.

However, the solution can be suboptimal since it uses Go routines for each response metric emission. Although these goroutines should be short-lived and won't cause any performance issues down the line, stack traces can be confusing. Especially when something goes wrong, we want to analyze the logs (since goroutines dump their state on panic).

We can find a better solution that carefully tracks network events. Then we can postpone fetching the body and do that when the correct time comes. I'll create another issue to find a better solution.

cc: @grafana/k6-browser