microsoft / playwright

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
https://playwright.dev
Apache License 2.0

[BUG] page.on('request') is not capturing favicon.ico URI #7493

Open alishba0133 opened 3 years ago

alishba0133 commented 3 years ago

Context:

Playwright Version: playwright-1.12.1
Operating System: Ubuntu 18.04
Python Version: 3.8
Browser: chromium v888113

Code Snippet

import os
import asyncio
from playwright.async_api import async_playwright

async def request(request):
    print('request %s' % request.url)

async def coroutine():
    async with async_playwright() as playwright:
        # Launch browser
        binary = playwright.chromium
        browser = await binary.launch(headless=True)
        page = await browser.new_page()
        page.on('request', request)
        await page.goto("http://nedec.co.kr/favicon.ico")
        await browser.close()

asyncio.run(coroutine())

Describe the bug

The page.on('request') handler is never called for this URL. We loaded the same URL in Puppeteer, and it captured the request successfully.

mxschmitt commented 3 years ago

We currently filter out favicon requests intentionally. What is your use case? Do you want to intercept or process it somehow?

alishba0133 commented 3 years ago

We currently filter out favicon requests intentionally. What is your use case? Do you want to intercept or process it somehow?

In my project I want to save the favicon file and perform brand detection using it.

dgozman commented 3 years ago

This seems like a valid use case, but it does not fit the testing world, where favicons bring flakiness.

sohaib17 commented 3 years ago

@dgozman, is there any workaround to capture the favicon.ico request? I was able to capture it through CDP, but it makes the logic quite complex.

Ignoring the favicon.ico request while rendering an arbitrary URL might be OK. But if someone navigates to favicon.ico itself, for example https://www.google.com/favicon.ico, then page.goto() returns None, which is unexpected because the page is rendered successfully.

User code then needs custom handling for cases where the page is rendered but the response is None.

Environment:
- playwright-1.15.0
- CentOS 8.4
- Python 3.8
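
The custom handling mentioned above could look roughly like the following. This is only a minimal sketch: `navigation_status` and `goto_and_report` are hypothetical helper names, and it assumes that a None result from page.goto() on a favicon URL is the filtered case described in this comment, not a failed navigation.

```python
from typing import Optional, Protocol


class HasStatus(Protocol):
    """Minimal shape of the response object this sketch relies on."""

    status: int


def navigation_status(response: Optional[HasStatus]) -> Optional[int]:
    """Return the HTTP status, or None when goto() produced no response object."""
    if response is None:
        # Assumption: this is the filtered favicon case; the page may still have rendered.
        return None
    return response.status


async def goto_and_report(page, url: str) -> Optional[int]:
    """Navigate with a Playwright Page and tolerate a missing response."""
    response = await page.goto(url)
    return navigation_status(response)
```

A caller would treat a None result as "rendered, but no response was reported" and avoid touching attributes such as status on it.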

dgozman commented 3 years ago

@sohaib17 There is no workaround right now. If this request turns out to be popular, we'll make it work.

mihailik commented 2 years ago

Looks very strange and arbitrary.

mihailik commented 2 years ago

I wanted regression tests that catch a missing modernised favicon, but it seems that even if the app omits it and falls back to favicon.ico, that is still nearly impossible to detect.

Fly-Playgroud commented 2 years ago

@sohaib17 There is no workaround right now. If this request turns out to be popular, we'll make it work.

Hello, I have the same requirement and would like to be able to trigger a favicon.ico request in headless mode. I hope you will be able to implement this feature.

mrDarcyMurphy commented 2 years ago

This one wasted a lot of time today because the favicon request shows up when debugging tests, but not when running the tests.

vn7n24fzkq commented 1 year ago

I hope the favicon will not be filtered, or that the filter can at least be made optional. I was expecting page.on('response') to capture every response I see in the network list of the browser DevTools.

adulau commented 1 year ago

Any progress on this one? We use Playwright for many security projects, and collecting the favicon is critical for us. For example, we use it to fingerprint vulnerable systems. We were wondering why Playwright filters the favicon at all, given that a standard browser does request it.

Rafiot commented 1 year ago

Just trying to get more attention on this one before I start implementing a manual workaround to get the relevant favicon.

Is there any chance of seeing this feature implemented in the near future? Or at least a way to force fetching from the current context?
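
One possible shape for such a manual workaround is sketched below. It is not something confirmed in this thread: `favicon_candidates` and `fetch_favicons` are hypothetical names, and the sketch assumes a Playwright release that provides `page.request`, which is probably newer than the versions reported earlier in this issue. The icons are downloaded through the page's API request context, not by the browser's own favicon pipeline, so this does not show what the browser itself requested.

```python
from typing import Dict, Iterable, List
from urllib.parse import urljoin


def favicon_candidates(page_url: str, icon_hrefs: Iterable[str]) -> List[str]:
    """Resolve declared icon hrefs against the page URL, with /favicon.ico as a fallback."""
    candidates: List[str] = []
    for href in list(icon_hrefs) + ["/favicon.ico"]:
        absolute = urljoin(page_url, href)
        if absolute not in candidates:
            candidates.append(absolute)
    return candidates


async def fetch_favicons(page) -> Dict[str, bytes]:
    """Download each candidate icon via the page's request context (hypothetical helper)."""
    hrefs = await page.eval_on_selector_all(
        'link[rel~="icon"]',
        "links => links.map(link => link.getAttribute('href')).filter(Boolean)",
    )
    icons: Dict[str, bytes] = {}
    for url in favicon_candidates(page.url, hrefs):
        # Skip data: URIs and other schemes that cannot be fetched over HTTP.
        if not url.startswith(("http://", "https://")):
            continue
        response = await page.request.get(url)
        if response.ok:
            icons[url] = await response.body()
    return icons
```

Calling `await fetch_favicons(page)` after `page.goto(...)` would return a mapping from icon URL to raw bytes. Network errors raised by the request are not handled in this sketch.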

byt3bl33d3r commented 1 year ago

Bump, just ran into this issue as well.

piercefreeman commented 1 year ago

If you're already using Chromium, this is pretty easy to do over CDP. You'll just need to use the new headless mode or a headful spawn, since the old headless wouldn't render favicons as part of their pipeline.

A monkeypatch also seems possible here by ignoring the favicon flag, but this will probably break more often.

import { chromium, Page, ChromiumBrowser } from 'playwright';

// Initialize everything
async function initialize() {
  const browser: ChromiumBrowser = await chromium.launch({
    headless: true, // Run in headless mode
    args: ['--headless=new'] // Enable the new headless mode
  });
  const page: Page = await browser.newPage();
  const client = await page.context().newCDPSession(page);

  await client.send('Network.enable');

  // Store the favicon data here
  const faviconData: { [url: string]: any } = {};

  // Listen for requests for favicon
  client.on('Network.requestWillBeSent', async (params) => {
    const { request } = params;
    if (request.url.endsWith('favicon.ico') || request.url.includes('/favicon')) {
      console.log(`Favicon request detected: ${request.url}`);
    }
  });

  // Listen for favicon responses
  client.on('Network.responseReceived', async (params) => {
    const { response, requestId } = params; // Extract requestId here
    if (response.url.endsWith('favicon.ico') || response.url.includes('/favicon')) {
      console.log(`Favicon response received: ${response.url}`);

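      // Fetch the response body via CDP using the matching requestId.
      // The body may not be available yet when this event fires, so failures are logged instead of thrown.
      try {
        const { body, base64Encoded } = await client.send('Network.getResponseBody', { requestId });

        // Store or process the favicon data
        faviconData[response.url] = base64Encoded ? Buffer.from(body, 'base64') : body;
      } catch (error) {
        console.warn(`Could not read favicon body for ${response.url}: ${error}`);
      }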
    }
  });

  // Navigate to the page
  await page.goto('https://google.com');

  // Wait 5 seconds for the page to load
  await new Promise((resolve) => setTimeout(resolve, 5000));

  // Print the favicon data
  console.log(faviconData);

  // Close the browser
  await browser.close();
}

// Run the initialization
initialize().catch((error) => {
  console.error(`An error occurred: ${error}`);
});
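
Since the original report uses the Python API, here is a rough Python port of the same CDP idea. It is a sketch under assumptions, not a tested drop-in: `looks_like_favicon`, `decode_cdp_body`, and `collect_favicons` are hypothetical names, the `--headless=new` argument is copied from the TypeScript above, and the body is read on `Network.loadingFinished` instead of `Network.responseReceived` on the assumption that the body is more reliably available by then.

```python
import base64
from typing import Dict, Union


def looks_like_favicon(url: str) -> bool:
    """Same loose heuristic as the TypeScript version above."""
    return url.endswith("favicon.ico") or "/favicon" in url


def decode_cdp_body(body: str, base64_encoded: bool) -> Union[bytes, str]:
    """CDP returns binary bodies as base64 text; decode them to bytes."""
    return base64.b64decode(body) if base64_encoded else body


async def collect_favicons(url: str, settle_ms: int = 5000) -> Dict[str, Union[bytes, str]]:
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.async_api import async_playwright

    favicons: Dict[str, Union[bytes, str]] = {}
    pending: Dict[str, str] = {}  # CDP requestId -> favicon URL awaiting its body

    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch(headless=True, args=["--headless=new"])
        page = await browser.new_page()
        client = await page.context.new_cdp_session(page)
        await client.send("Network.enable")

        def on_response(params):
            # Remember favicon responses; the body is fetched once loading finishes.
            response_url = params["response"]["url"]
            if looks_like_favicon(response_url):
                pending[params["requestId"]] = response_url

        async def on_finished(params):
            favicon_url = pending.pop(params["requestId"], None)
            if favicon_url is None:
                return
            result = await client.send(
                "Network.getResponseBody", {"requestId": params["requestId"]}
            )
            favicons[favicon_url] = decode_cdp_body(result["body"], result["base64Encoded"])

        client.on("Network.responseReceived", on_response)
        client.on("Network.loadingFinished", on_finished)

        await page.goto(url)
        # Fixed wait, as in the TypeScript version, to give the favicon time to load.
        await page.wait_for_timeout(settle_ms)
        await browser.close()

    return favicons
```

It could be run with `asyncio.run(collect_favicons("https://google.com"))`; the result maps each favicon URL to its decoded body.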

mihailik commented 1 year ago

@piercefreeman You rock, thanks a lot!!!