microsoft / playwright

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
https://playwright.dev
Apache License 2.0

Playwright tests work locally but most of the tests fail when running "npx playwright test" in Azure DevOps Pipelines with errors "page.goto: net::ERR_CONNECTION_RESET" or "page.goto: NS_ERROR_NET_RESET" #16749

Closed Wilhop closed 2 years ago

Wilhop commented 2 years ago

Discussed in https://github.com/microsoft/playwright/discussions/16744

Originally posted by **Wilhop** August 23, 2022

Context:

- Playwright Version: 1.25.0
- Operating System: on CI: windows-latest
- Node.js version: v16.13.1
- Browser: Firefox and Chrome
- Extra: Run via Azure DevOps Pipelines in the cloud. The application login starts on the app domain, then goes to the Azure login domain, then returns to the original domain, if it matters.

**What is the goal?**

The goal is to run our Playwright test suite in an Azure DevOps Pipeline with a trigger.

**The problem:**

The example test suite works locally with "npx playwright test", running all tests in separate files. In Azure Pipelines, however, fewer than 50% of the tests pass: some of them actually work, but most fail. The command line usually shows the login failing with "_page.goto: net::ERR_CONNECTION_RESET_" (see the errors below in the Azure Pipelines log) or, with Firefox, "_page.goto: NS_ERROR_NET_RESET_". I also sometimes get "_page.goto: NS_ERROR_UNKNOWN_HOST_". I can see from the logs that there are many "page.goto failed" lines.

**What I have tried already:**

- I've tried switching from ubuntu to windows in the .yml.
- I've tried running with only 1 worker and with fullyParallel: false.
- I also limited the browsers to Chrome only or Firefox only.
- I have tried to find a solution online, but I haven't encountered this specific problem.

**azure-pipelines.yml**

```
trigger:
- tests/playwright

pool:
  vmImage: 'windows-latest'

steps:
- script: echo Running azure-pipelines.yml...
  displayName: 'Run a one-line script'

- task: NodeTool@0
  inputs:
    versionSpec: '16.x'
  displayName: 'nodetool 16.x'

- task: Npm@1
  inputs:
    command: 'ci'

- task: CmdLine@2
  inputs:
    script: 'npx playwright install --with-deps'

- task: CmdLine@2
  inputs:
    script: 'set CI=true && echo %CI% && npx playwright test'
```

**Azure Pipelines Job log with the errors**

```
    Retry #1 ---------------------------------------------------------------------------------------

    page.goto: net::ERR_CONNECTION_RESET at https://mywebsite.com
    =========================== logs ===========================
    navigating to "https://mywebsite.com", waiting until "load"
    ============================================================

       6 |     console.log(`beforeEach initiated, running ${testInfo.title}`);
       7 |     const lg = new login(page);
    >  8 |     await lg.loginToAppWithAllLicenses();
         |              ^
       9 | });
      10 |

        at loginToAppWithAllLicenses (D:\a\1\s\pages\login.ts:13:25)

  7) [chromium] › example.spec.ts:17:5 › example test suite › Check that News and Messages is present

    Test timeout of 60000ms exceeded while running "beforeEach" hook.

      3 | import { login } from '../pages/login';
      4 |
    > 5 | test.beforeEach(async ({ page }, testInfo) => {
        |      ^
      6 |     console.log(`beforeEach initiated, running ${testInfo.title}`);
      7 |     const lg = new login(page);
      8 |     await lg.loginToAppWithAllLicenses();

    page.click: Target closed
```

**An example test that fails**

```
import { test, expect, Page } from '@playwright/test';
import { Utils } from '../pages/utils';
import { login } from '../pages/login';

test.beforeEach(async ({ page }, testInfo) => {
    console.log(`beforeEach initiated, running ${testInfo.title}`);
    const lg = new login(page);
    await lg.loginToAppWithAllLicenses();
});

test.describe('example test suite', () => {
    test("Check that News and Messages is present", async ({ page }) => {
        await page.goto('https://mywebsite.com');

        // Check that News and Messages are visible to assert that page has loaded
        await expect(page.locator('ls-home-news >> text=News'))
            .toHaveText('News');
        await expect(page.locator('ls-home-messages >> text=Messages'))
            .toHaveText('Messages');
    });
});
```

**The login that is performed in beforeEach**

```
import { chromium, Page } from '@playwright/test';

export class login {
    private page: Page;

    constructor(page: Page) {
        this.page = page;
    }

    async loginToAppWithAllLicenses() {
        await this.page.goto('https://mywebsite.com');
        // Click div[role="button"]:has-text("Email")
        await Promise.all([
            this.page.waitForNavigation(),
            this.page.locator('div[role="button"]:has-text("Email")').click(),
        ]);
        // Click [placeholder="Email Address"]
        await this.page.click('[placeholder="Email Address"]');
        await this.page.locator('[placeholder="Email Address"]').fill('email here..');
        // Click [placeholder="Password"]
        await this.page.click('[placeholder="Password"]');
        await this.page.locator('[placeholder="Password"]').fill('password here..');
        // Click button:has-text("Sign in")
        await this.page.click('button:has-text("Sign in")');
        // Select company
        await this.page.click('.b-number-cell');
        await this.page.waitForLoadState('networkidle');
    }
}
```

**playwright.config.ts**

```
import type { PlaywrightTestConfig } from '@playwright/test';
import { devices } from '@playwright/test';

const config: PlaywrightTestConfig = {
  testDir: './tests',
  timeout: 60 * 1000,
  expect: {
    timeout: 5000
  },
  fullyParallel: true,
  forbidOnly: !!process.env.CI,
  retries: 1,
  workers: process.env.CI ? 1 : undefined,
  reporter: 'html',
  use: {
    actionTimeout: 0,
    screenshot: 'only-on-failure',
    trace: 'off',
  },
  projects: [
    {
      name: 'chromium',
      use: {
        ...devices['Desktop Chrome'],
      },
    },
  ],
  outputDir: 'test-results/',
};

export default config;
```

**Running a failed test with DEBUG pw:api shows that page.goto fails**

```
Running 1 test using 1 worker
  pw:api => browserType.launch started +0ms
  pw:api <= browserType.launch succeeded +1s
  pw:api => browser.newContext started +3ms
  pw:api <= browser.newContext succeeded +76ms
  pw:api => browserContext.newPage started +6ms
  pw:api   navigated to "about:blank" +924ms
  pw:api   navigated to "about:blank" +253ms
  pw:api <= browserContext.newPage succeeded +6ms
beforeEach initiated, running Open Customer register, save a customer and delete from database
  pw:api => page.goto started +8ms
  pw:api navigating to "https://myapp.com", waiting until "load" +2ms
  pw:api <= page.goto failed +744ms
  pw:api => page.screenshot started +5ms
  pw:api taking page screenshot +3ms
  pw:api <= page.screenshot succeeded +457ms
  pw:api => browserContext.close started +7ms
  pw:api <= browserContext.close succeeded +14ms
```

Here is an example screenshot that shows a totally blank page on a failing test: [image]
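While the root cause is being investigated, one possible stopgap is to retry only the navigation when it fails with one of the network errors quoted above, instead of relying on the test-level `retries: 1`. The following is a minimal sketch, not a fix for the underlying problem. The names `isTransientNetworkError` and `retryOnNetworkReset` are hypothetical helpers, not part of Playwright, and the attempt count and delay are assumed values.

```typescript
// Error fragments taken from the failures reported in this issue.
const TRANSIENT_ERRORS = [
  'ERR_CONNECTION_RESET',
  'NS_ERROR_NET_RESET',
  'NS_ERROR_UNKNOWN_HOST',
];

// Returns true when the error message contains one of the fragments above.
function isTransientNetworkError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return TRANSIENT_ERRORS.some((fragment) => message.includes(fragment));
}

// Hypothetical helper (not a Playwright API): runs `action` up to `attempts`
// times, retrying only on the transient network errors listed above and
// waiting `delayMs` between tries. Any other error is rethrown immediately.
async function retryOnNetworkReset<T>(
  action: () => Promise<T>,
  attempts = 3,
  delayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      if (!isTransientNetworkError(error)) throw error;
      lastError = error;
      if (attempt < attempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

In the login page object, the first navigation could then be wrapped as `await retryOnNetworkReset(() => this.page.goto('https://mywebsite.com'));`. If the retries consistently succeed on the second attempt, that would point toward an intermittent network or server-side cause rather than a problem in the test code.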

yury-s commented 2 years ago

`page.goto: net::ERR_CONNECTION_RESET` indicates network issues. Is your web app available on the internet, so that you can actually connect to it from Azure, or do you have to be in the same internal network? What URLs are failing?

Wilhop commented 2 years ago

The address is a baseUrl staging environment in the format https://myapplicaton.online.fi/environment/ (obviously I had to mask the real address).

- The address is publicly available.
- I would say the Azure virtual agent can connect to it, because < 50% of the tests do pass and perform the login action correctly.
- I have never seen this behavior when running the tests locally.
- So to recap: the failing URL is the main login page of the application, https://myapplicaton.online.fi/environment/

Also: I'll try to create a reproducible test repo, but haven't managed yet.

From the pw:api log:

  pw:api => page.goto started +1ms
  pw:api navigating to "https://myapplicaton.online.fi/environment/", waiting until "load" +0ms
  pw:api <= page.goto failed +420ms

Standard log showing failure:

page.goto: net::ERR_CONNECTION_RESET at https://myapplicaton.online.fi/environment/
=========================== logs ===========================
navigating to "https://myapplicaton.online.fi/environment/", waiting until "load"
============================================================
   6 |     console.log(`beforeEach initiated, running ${testInfo.title}`);
   7 |     const lg = new login(page);
>  8 |     await lg.loginToAppWithAllLicenses();
     |              ^

In the HTML report, the failure shows a blank page: [image]
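When a pipeline produces many of these `DEBUG=pw:api` logs, it can help to list which API calls failed in each run. The sketch below assumes one log entry per line in the `pw:api => method started` / `pw:api <= method failed` shape shown in this thread. `parseApiLine` and `failedCalls` are hypothetical names, and the trailing `+Nms` value is ignored because it is the time since the previous log line, not the duration of the call.

```typescript
interface ApiEvent {
  method: string; // e.g. "page.goto"
  outcome: 'started' | 'succeeded' | 'failed';
}

// Matches lines such as "pw:api <= page.goto failed +420ms".
const API_LINE = /pw:api\s+(?:=>|<=)\s+(\S+)\s+(started|succeeded|failed)\b/;

// Parses a single log line; returns undefined for lines that are not
// API start/end markers (e.g. "navigating to ..." or console output).
function parseApiLine(line: string): ApiEvent | undefined {
  const match = API_LINE.exec(line);
  if (!match) return undefined;
  return { method: match[1], outcome: match[2] as ApiEvent['outcome'] };
}

// Returns the names of all API calls that are reported as failed in the log.
function failedCalls(log: string): string[] {
  return log
    .split(/\r?\n/)
    .map((line) => parseApiLine(line))
    .filter((event): event is ApiEvent => event !== undefined && event.outcome === 'failed')
    .map((event) => event.method);
}
```

Applied to the excerpt above, `failedCalls` would return a list containing only `page.goto`, which matches the observation that the navigation itself is what fails.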

Wilhop commented 2 years ago

Any tips on what to try or workarounds are welcome :) This is a showstopper problem for us, as the tests need to run in Azure Pipelines in order to bring us some value. I tried to simplify the problem so that there is no beforeEach and the test uses a stored login state, so that it doesn't have to log in each time. The same problem persists with page.goto: net::ERR_CONNECTION_RESET

Here is an example pw:api log with a simple test with 1 worker:

test.describe('Azure Pipelines test suite', () => {
    test.use({ storageState: 'state.json' });
    test("that all cards are present", async ({ page, context }) => {
        await page.goto('https://myapp.application.fi/company/12345/100/as12d3/app/home');
        // ... (rest of the test not included in the original post)
    });
});

Running 3 tests using 1 worker
  pw:api => browserType.launch started +0ms
  pw:api <= browserType.launch succeeded +418ms
  pw:api => browser.newContext started +4ms
  pw:api navigating to "https://myapp.application.fi/", waiting until "load" +165ms
  pw:api   "commit" event fired +21ms
  pw:api   navigated to "https://myapp.application.fi/" +0ms
  pw:api   "load" event fired +2ms
  pw:api   "domcontentloaded" event fired +1ms
  pw:api <= browser.newContext succeeded +25ms
  pw:api => browserContext.newPage started +6ms
  pw:api <= browserContext.newPage succeeded +108ms
  pw:api => page.goto started +3ms
  pw:api navigating to "https://myapp.application.fi/company/12345/100/as12d3/app/home", waiting until "load" +1ms
  pw:api <= page.goto failed +3s
  pw:api => page.screenshot started +3ms
  pw:api taking page screenshot +1ms
  pw:api   "commit" event fired +74ms
  pw:api   navigated to "chrome-error://chromewebdata/" +0ms
  pw:api   "load" event fired +2ms
  pw:api   "domcontentloaded" event fired +0ms
  pw:api <= page.screenshot succeeded +98ms

Wilhop commented 2 years ago

One additional finding: the same test, "plainLogin.spec.ts", succeeds the first time it is run in the pipeline. When the same test is run a second time, it fails with ERR_CONNECTION_RESET.

Here is the part of azure-pipelines.yml where the first test run succeeds and the second one fails:

      # Succeeds
    - task: CmdLine@2
      displayName: 'Run the tests 1 and set environment variable CI=true'
      inputs:
        script: 'set CI=true && set DEBUG=pw:api && npx playwright test ./login/plainLogin.spec.ts'

    - publish: $(System.DefaultWorkingDirectory)/state.json
      displayName: 'Publish cached login data as state.json artifact'
      artifact: state-json
      condition: succeededOrFailed()

      # Fails
    - task: CmdLine@2
      displayName: 'Run the tests again and set environment variable CI=true'
      inputs:
        script: 'set CI=true && set DEBUG=pw:api && npx playwright test ./login/plainLogin.spec.ts'

Wilhop commented 2 years ago

I tested the same workflows against the same URL with Cypress, and I didn't face any network issues there. Is there something in the way Playwright communicates that we could investigate?

Wilhop commented 2 years ago

We managed to get rid of the error messages by moving the test environment to Azure. I'm unsure of the real reason for the errors, because the same address worked with other test automation frameworks. E.g., the same tests with Cypress worked just fine.

I'm guessing the errors come from a combination of Playwright + Server side firewall configuration. We're still checking if we can find out from the server's side what is the root cause of this.

marclilja commented 1 year ago

I have had the exact same issues in Azure Pipelines with both Playwright and Cypress for a long time when running tests against our test environment (production works fine). Both are hosted on Azure Container Apps. We eventually saw the same issues locally when accessing the site through VPN. In front of the servers we use Azure Front Door with caching enabled and a WAF. We concluded that the WAF was not the problem and eventually got our tests working by bypassing the cache. We do not yet understand why the cache is a problem, though, since it does work with the cache in the production environment.
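The comment above does not say how the cache was bypassed. One common approach, shown here purely as an assumption, is to append a unique query parameter to the URL so that a cache keyed on the full URL treats every request as a miss. This only helps if the CDN includes the query string in its cache key, which depends on its configuration. The helper `withCacheBuster` and the parameter name `nocache` are hypothetical.

```typescript
// Hypothetical helper: appends a unique "nocache" query parameter to a URL,
// keeping any existing query string and #fragment intact.
function withCacheBuster(url: string, token: string = Date.now().toString()): string {
  const hashIndex = url.indexOf('#');
  const base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? '' : url.slice(hashIndex);
  // Use "&" when the URL already has a query string, "?" otherwise.
  const separator = base.includes('?') ? '&' : '?';
  return `${base}${separator}nocache=${encodeURIComponent(token)}${fragment}`;
}
```

In a test this could be used as `await page.goto(withCacheBuster('https://mywebsite.com/'));`. If the connection resets disappear with the extra parameter, that would support the theory that the caching layer, rather than Playwright itself, is involved.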