grafana / xk6-browser

k6 extension that adds support for browser automation and end-to-end web testing via the Chrome Devtools Protocol
https://grafana.com/docs/k6/latest/javascript-api/k6-experimental/browser/
GNU Affero General Public License v3.0
337 stars 42 forks source link

`TestConnectionClosureAbnormal` is flaky #39

Open imiric opened 2 years ago

imiric commented 2 years ago

We've seen intermittent failures of TestConnectionClosureAbnormal/closure_abnormal mostly in CI and ocasionally in local runs as well.

The current test expects the error code 1006 (abnormal closure), but sometimes--typically under high load or in VMs running on overprovisioned hardware, as is usually the case for GH Actions--it will fail with code 1001 (going away):

=== RUN   TestConnectionClosureAbnormal/closure_abnormal
    connection_test.go:67:
                Error Trace:    connection_test.go:67 Error:          Error message not equal:
                                expected: "websocket: close 1006 (abnormal closure): unexpected EOF"
                                actual  : "websocket: close 1001 (going away)"
                Test:           TestConnectionClosureAbnormal/closure_abnormal

We tried simply handling both cases in #33, but they're separate scenarios and the test should only fail with 1006 as that's what the /closure-abnormal endpoint should cause.

inancgumus commented 2 years ago

I tried to add a test for the 1001 (going away) status. Let's keep it here only for reference.

func TestConnectionGoingAway(t *testing.T) {
    const wantErr = "websocket: close 1001" // going away

    done := make(chan struct{})
    server := testutils.NewWSTestServer(t,
        "/going-away",
        http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
            conn, err := (&websocket.Upgrader{}).Upgrade(w, req, w.Header())
            require.NoError(t, err)

            conn.WriteControl(
                websocket.CloseMessage,
                websocket.FormatCloseMessage(websocket.CloseGoingAway, ""),
                time.Now().Add(time.Second),
            )

            <-done
            defer func() {
                done <- struct{}{}
            }()

            var any interface{}
            err = conn.ReadJSON(any)
            require.ErrorContains(t, err, wantErr)
        }))
    defer server.ServerHTTP.Close()

    url, _ := url.Parse(server.ServerHTTP.URL)
    wsURL := fmt.Sprintf("ws://%s/going-away", url.Host)
    ctx := context.Background()
    conn, err := NewConnection(
        ctx,
        wsURL,
        NewLogger(ctx, NullLogger(), false, nil),
    )
    require.NoError(t, err)

    // close the connection before the client has a chance to read
    // from it. this will result in a "going away" error.
    conn.Close()
    done <- struct{}{}

    action := target.SetDiscoverTargets(true)
    err = action.Do(cdp.WithExecutor(ctx, conn))
    require.ErrorContains(t, err, wantErr)

    <-done
}
inancgumus commented 2 years ago

Related stuff:

inancgumus commented 2 years ago

This is what I saw after running the test for half an hour on my Mac:

--- FAIL: TestConnectionClosureAbnormal (0.10s)
    --- FAIL: TestConnectionClosureAbnormal/closure_abnormal (0.07s)
        connection_test.go:67:
                Error Trace:    connection_test.go:67
                Error:          Error message not equal:
                                expected: "websocket: close 1006 (abnormal closure): unexpected EOF"
                                actual  : "websocket: close 1001 (going away)"
                Test:           TestConnectionClosureAbnormal/closure_abnormal
--- FAIL: TestConnectionClosureAbnormal (1.04s)
    --- FAIL: TestConnectionClosureAbnormal/closure_abnormal (1.01s)
        connection_test.go:67:
                Error Trace:    connection_test.go:67
                Error:          Error message not equal:
                                expected: "websocket: close 1006 (abnormal closure): unexpected EOF"
                                actual  : "websocket: close 1001 (going away)"
                Test:           TestConnectionClosureAbnormal/closure_abnormal
--- FAIL: TestConnectionClosureAbnormal (2.00s)
    --- FAIL: TestConnectionClosureAbnormal/closure_abnormal (2.00s)
        connection_test.go:67:
                Error Trace:    connection_test.go:67
                Error:          Error message not equal:
                                expected: "websocket: close 1006 (abnormal closure): unexpected EOF"
                                actual  : "websocket: close 1001 (going away)"
                Test:           TestConnectionClosureAbnormal/closure_abnormal
panic: test timed out after 10m0s

It happens while sending to or receiving from CDP. It's probably an I/O timeout issue. So we might want to give it a timeout somewhere in the code (probably in the connection.go):

goroutine 451 [select, 9 minutes]:
github.com/grafana/xk6-browser/common.(*Connection).sendLoop(0xc000ce41c0)
    /Users/inanc/xk6-browser/common/connection.go:351 +0x1d9
created by github.com/grafana/xk6-browser/common.NewConnection
    /Users/inanc/xk6-browser/common/connection.go:148 +0x4d5

goroutine 74 [IO wait, 10 minutes]:
internal/poll.runtime_pollWait(0x6e00bf0, 0x72)
    /usr/local/Cellar/go/1.17.1/libexec/src/runtime/netpoll.go:229 +0x89
internal/poll.(*pollDesc).wait(0xc000138498, 0xc0001fe000, 0x0)
    /usr/local/Cellar/go/1.17.1/libexec/src/internal/poll/fd_poll_runtime.go:84 +0xbd
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/Cellar/go/1.17.1/libexec/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000138480, {0xc0001fe000, 0x1000, 0x1000})
    /usr/local/Cellar/go/1.17.1/libexec/src/internal/poll/fd_unix.go:167 +0x419
net.(*netFD).Read(0xc000138480, {0xc0001fe000, 0x1000, 0x1000})
    /usr/local/Cellar/go/1.17.1/libexec/src/net/fd_posix.go:56 +0x51
net.(*conn).Read(0xc00011e000, {0xc0001fe000, 0x1000, 0x1000})
    /usr/local/Cellar/go/1.17.1/libexec/src/net/net.go:183 +0xb1
bufio.(*Reader).fill(0xc0001981e0)
    /usr/local/Cellar/go/1.17.1/libexec/src/bufio/bufio.go:101 +0x294
bufio.(*Reader).Peek(0xc0001981e0, 0x2)
    /usr/local/Cellar/go/1.17.1/libexec/src/bufio/bufio.go:139 +0xcc
github.com/gorilla/websocket.(*Conn).read(0xc000268420, 0x63a2c20)
    /Users/inanc/go/pkg/mod/github.com/gorilla/websocket@v1.4.2/conn.go:370 +0x5b
github.com/gorilla/websocket.(*Conn).advanceFrame(0xc000268420)
    /Users/inanc/go/pkg/mod/github.com/gorilla/websocket@v1.4.2/conn.go:798 +0x125
github.com/gorilla/websocket.(*Conn).NextReader(0xc000268420)
    /Users/inanc/go/pkg/mod/github.com/gorilla/websocket@v1.4.2/conn.go:980 +0x165
github.com/gorilla/websocket.(*Conn).ReadMessage(0xc000491ec8)
    /Users/inanc/go/pkg/mod/github.com/gorilla/websocket@v1.4.2/conn.go:1064 +0x31
github.com/grafana/xk6-browser/testutils.CDPReadMsg(0xc00019c001)
    /Users/inanc/xk6-browser/testutils/cdp.go:151 +0x45
github.com/grafana/xk6-browser/testutils.getWebsocketHandlerCDP.func1.1()
    /Users/inanc/xk6-browser/testutils/cdp.go:79 +0xa5
created by github.com/grafana/xk6-browser/testutils.getWebsocketHandlerCDP.func1
    /Users/inanc/xk6-browser/testutils/cdp.go:71 +0x2a5
inancgumus commented 2 years ago

I've removed the low prio tag from this because it's such an issue that happens almost every time I commit, and I need to rerun the CI jobs over and over 🤷

elimatripathy commented 1 year ago

I am also facing this issue in my local and it is happening almost 99% of the time even with one VU .

ERRO[0043] communicating with browser: websocket: close 1006 (abnormal closure): unexpected EOF category=cdp elapsed="0 ms" goroutine=5988 ERRO[0043] process with PID 28665 unexpectedly ended: signal: killed category=browser elapsed="1 ms" goroutine=5985

Below is my test script `/* init code - Load local files, import modules, declare lifecycle functions

/* init context: define custom function

inancgumus commented 1 year ago

Hey, @elimatripathy, could you try this again with the k6 version 0.43.1 and let us know if this error still happens?

Could you also let me know about your environment, like CPUs, memory size, OS, etc.?

elimatripathy commented 1 year ago

thank you @inancgumus . Yes this issue is reproducible with the latest k6 version 0.43.1 as well and I tried many things to make sure that this is not happening due to my system limitations like running tests in both local and docker containers plus also running performance tests on the facebook.com site. Below are my findings

Script for facebook login:

import { check, sleep } from 'k6';
import { chromium } from 'k6/x/browser';

export const options = {
  thresholds: {
    checks: ["rate==1.0"]
  },
  ext: {
        loadimpact: {
          distribution: {
            private: {
              loadZone: 'amazon:ca:montreal',
              percent: 50,
            },
          },
          projectID:$enterTheProjectID,
        },
      },
  scenarios: {
    login: {
   executor: 'per-vu-iterations',
   vus: 35, 
   iterations: 1,
   maxDuration: '1m'
    }
  },
}

export default async function() {
  const browser = chromium.launch({
    headless: __ENV.XK6_HEADLESS ? false : true,
  });
  const context = browser.newContext();
  const page = context.newPage();

  try {
    // Goto front page, find login link and click it
    await page.goto('https://www.facebook.com/', { waitUntil: 'networkidle' });
   // page.waitForSelector('img[class="background15"]');
    // Enter login credentials and login
     page.locator('input[name="email"]').type('$');
     page.locator('input[name="pass"]').type('$');
    // We expect the form submission to trigger a navigation, so to prevent a
    // race condition, setup a waiter concurrently while waiting for the click
    // to resolve.
    await Promise.all([
       page.waitForNavigation(),
       page.locator('button[type="submit"]').click(),
    ]);
    sleep(1);
    check(page, {
      'header': find(page, '#facebook').isVisible()
    });
  } finally {
    page.screenshot({ path: 'screenshot.png' });
    page.close();
    browser.close();
  }
}

// Custom method to find web element
export function find(page,locator,waitTime) {
  waitTime = waitTime || 10000;
  let flag=false;
  let iter=0;
  let element;
  while(flag==false){
    try {
      element = page.$(locator);
      if(element.isVisible()){
        flag=true;
            }
    } catch (error) {} 
    sleep(0.01);
    iter++;
    if(iter>waitTime){
      break;
    }
  }
  return element;
}

Running this script either in my local or docker containers with more than 35 virtual users caused the exception as below

image

This issue with xk6-browser is definitely stopping to test performance for any application fully . This should be prioritized or please let me know if anything I can do to fix it.

Below are my system configuration

image
inancgumus commented 1 year ago

Hi @elimatripathy, thanks for the detailed information. We'll consider this in our next planning.

inancgumus commented 3 months ago

This has happened again while running tests locally (v1.5.1):

--- FAIL: TestConnectionClosureAbnormal (0.00s)
    --- FAIL: TestConnectionClosureAbnormal/closure_abnormal (0.01s)
        connection_test.go:55: 
                Error Trace:    /Users/inanc/grafana/k6browser/main/common/connection_test.go:55
                Error:          Error "read tcp 127.0.0.1:53920->127.0.0.1:53910: read: connection reset by peer" does not contain "websocket: close 1006 (abnormal closure): unexpected EOF"
                Test:           TestConnectionClosureAbnormal/closure_abnormal