hardkoded / puppeteer-sharp

Headless Chrome .NET API
https://www.puppeteersharp.com
MIT License

SetRequestInterceptionAsync(false) throwing TargetClosedException (Network.setCacheDisabled) #2693

Closed: StrangeWill closed this issue 1 week ago

StrangeWill commented 1 month ago

We have a part of our application where we run interception, then disable it because it causes issues with CSS.enable via CDP.

// Disable interception, with it enabled CSS.enable will hard freeze
page.Request -= HandleRequest;
page.Response -= HandleResponse;
await page.SetRequestInterceptionAsync(false);

On some very specific sites, we're seeing an exception thrown during await page.SetRequestInterceptionAsync(false);. The exception is as follows:

PuppeteerSharp.TargetClosedException : Protocol error (Network.setCacheDisabled): Session closed. Most likely the IFrame has been closed.Close reason: Target.detachedFromTarget (Target.detachedFromTarget)
  Stack Trace:
   at PuppeteerSharp.CDPSession.SendAsync(String method, Object args, Boolean waitForCallback) in /home/runner/work/puppeteer-sharp/puppeteer-sharp/lib/PuppeteerSharp/CDPSession.cs:line 79
   at PuppeteerSharp.NetworkManager.ApplyProtocolCacheDisabledAsync(ICDPSession client) in /home/runner/work/puppeteer-sharp/puppeteer-sharp/lib/PuppeteerSharp/NetworkManager.cs:line 629
   at PuppeteerSharp.NetworkManager.ApplyProtocolRequestInterceptionAsync(ICDPSession client) in /home/runner/work/puppeteer-sharp/puppeteer-sharp/lib/PuppeteerSharp/NetworkManager.cs:line 616
   at PuppeteerSharp.NetworkManager.SetRequestInterceptionAsync(Boolean value) in /home/runner/work/puppeteer-sharp/puppeteer-sharp/lib/PuppeteerSharp/NetworkManager.cs:line 143

I can do something like this:

try
{
    await page.SetRequestInterceptionAsync(false);
}
catch (TargetClosedException exception) when (exception.Message == "Protocol error (Network.setCacheDisabled): Session closed. Most likely the IFrame has been closed.Close reason: Target.detachedFromTarget (Target.detachedFromTarget)")
{
    // Old faithful -- try/catch/swallow
}

As far as we can tell, this doesn't cause any issues. I don't know what specifically causes this forced-close message, and I haven't been able to build a test site that reproduces the issue, but I can hand over the site that does if a dev wants to look at it.

Edit: Let me go putz with this in node and see if I just need to take this upstream.
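
A less brittle variant of the try/catch above might filter on the exception type plus a short, stable fragment of the message instead of the full string. The sketch below is plain C# with no PuppeteerSharp dependency. `TolerantCalls` and `TryRunAsync` are hypothetical names, and whether swallowing the exception is actually safe for a given application is an assumption, not something the library guarantees.

```csharp
using System;
using System.Threading.Tasks;

public static class TolerantCalls
{
    // Hypothetical helper (not part of PuppeteerSharp).
    // Runs `action` and returns true if it completed normally.
    // Returns false if it threw a TException that `shouldSwallow` accepted.
    // Any other exception, or a TException the predicate rejects, propagates.
    public static async Task<bool> TryRunAsync<TException>(
        Func<Task> action,
        Func<TException, bool> shouldSwallow)
        where TException : Exception
    {
        try
        {
            await action();
            return true;
        }
        catch (TException ex) when (shouldSwallow(ex))
        {
            // Deliberately ignored: the caller decided this failure is benign.
            return false;
        }
    }
}
```

In the scenario above it could be called as `await TolerantCalls.TryRunAsync<TargetClosedException>(() => page.SetRequestInterceptionAsync(false), ex => ex.Message.Contains("Network.setCacheDisabled"))`, which keeps working if the rest of the message text changes between releases.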

StrangeWill commented 1 month ago

using PuppeteerSharp;

var fetcher = new BrowserFetcher();
await fetcher.DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(
           new LaunchOptions
           {
               Headless = true
           });

await using var page = await browser.NewPageAsync();
await page.SetRequestInterceptionAsync(true);

async void HandleRequest(object? _, RequestEventArgs args)
{
    try
    {
        await args.Request.ContinueAsync();
    }
    catch (Exception exception)
    {
        Console.WriteLine($"Error while intercepting request: {exception.Message}");
    }
}

page.Request += HandleRequest;

try
{
    await page.GoToAsync("[Site here]");
}
catch (NavigationException ex) when (ex.InnerException is TimeoutException)
{
    // Ignore timeouts
}

page.Request -= HandleRequest;
await page.SetRequestInterceptionAsync(false);

Console.WriteLine("DONE");

The C# program above is a minimal reproduction (it does require us to configure an interceptor event). From what I can tell, I cannot reproduce this in Node with the equivalent script:

import puppeteer, { ExtensionTransport } from "puppeteer";

const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.setRequestInterception(true);

page.on("request", (request) => {
    try {
        request.continue();
    }
    catch { }
});
try {
    await page.goto("[site here]", { waitUntil: "networkidle0" });
} catch {
    // Do nothing
}

page.removeAllListeners("request");
await page.setRequestInterception(false);
console.log("Done");

kblok commented 4 weeks ago

Hey! Sorry for the late response. Do you have any website to test?

Tiggerito commented 1 week ago

I'm seeing an error with the same setup. It started in v19.0.0 and did not happen in v18.1.0.

It's also only happening on some domains. For example wbc(dot)co(dot)uk. It does not happen if I remove the interception code.

An exception is thrown within GoToAsync:

PuppeteerSharp.NavigationException: 'Navigating frame was detached'

PuppeteerException: Navigating frame was detached

kblok commented 1 week ago

Thank you for the example @Tiggerito. I'll take a look at it tomorrow!

kblok commented 1 week ago

Fix is on the way

Tiggerito commented 5 days ago

I'm now seeing more detailed error messages:

Navigating frame was detached: Connection failed to process Target.attachedToTarget. Timeout of 2000 ms exceeded.

I increased the timeout to see if that would help. Not really.

The timeout is triggered when processing FrameManager.Client_MessageReceived. I don't see that code doing anything for 'Target.attachedToTarget', so I suspect it's waiting on the _frameTreeHandled.Task to complete in FrameManager.InitializeAsync, which happens to be called by FrameManager.OnAttachedToTarget when the target is an iframe.

So it seems that FrameManager.InitializeAsync is sometimes taking a long time to complete the _frameTreeHandled task.
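
To illustrate the suspected mechanism, here is a minimal sketch of a handler that awaits a TaskCompletionSource under a timeout. It is not PuppeteerSharp's actual code: `TimeoutSketch`, `WithTimeoutAsync`, and `DemoAsync` are hypothetical names, and `frameTreeHandled` is only a stand-in for the `_frameTreeHandled` field mentioned above.

```csharp
using System;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Awaits `task`, but throws TimeoutException if it has not completed
    // within `timeout`. Faults from `task` itself are propagated unchanged.
    public static async Task WithTimeoutAsync(Task task, TimeSpan timeout, string what)
    {
        var winner = await Task.WhenAny(task, Task.Delay(timeout));
        if (winner != task)
        {
            throw new TimeoutException(
                $"Connection failed to process {what}. Timeout of {timeout.TotalMilliseconds} ms exceeded.");
        }

        await task;
    }

    // Returns true if the simulated handler timed out.
    public static async Task<bool> DemoAsync()
    {
        // Stand-in for the completion source that initialization is expected to resolve.
        var frameTreeHandled = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // Nothing calls frameTreeHandled.TrySetResult(true) in this sketch,
        // so the handler waits until the timeout elapses.
        try
        {
            await WithTimeoutAsync(
                frameTreeHandled.Task,
                TimeSpan.FromMilliseconds(200),
                "Target.attachedToTarget");
            return false;
        }
        catch (TimeoutException)
        {
            return true;
        }
    }
}
```

If this is roughly what happens, raising the timeout only helps when the completion source is resolved late; it makes no difference when it is never resolved at all.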

kblok commented 5 days ago

@Tiggerito are you getting that on wbc?

Tiggerito commented 5 days ago

wbc?

kblok commented 5 days ago

wbc?

from here

For example wbc(dot)co(dot)uk. It does not happen if I remove the interception code.

Tiggerito commented 5 days ago

Doh!

It happens for multiple sites but is intermittent, i.e. I can't replicate it on my dev machine. I did see it happen a few times after I removed the interception code, so interception may just increase the chances of it happening.

It happens about 20 times a day, when it's crawling over 2,000 pages, so about 1%. wbc has been fine lately.

I've increased the timeout to 5 seconds to see if it reduces the occurrences.