hardkoded / puppeteer-sharp

Headless Chrome .NET API
https://www.puppeteersharp.com
MIT License

SetRequestInterceptionAsync(true) causing Cannot read incomplete UTF-16 JSON text as string with missing low surrogate #2829

Open Tiggerito opened 4 days ago

Tiggerito commented 4 days ago

Description

Since 20.0.4 my code that uses SetRequestInterceptionAsync(true) can no longer load pages.

Complete minimal example reproducing the issue

Replace each (dot) in the URL with a period before running.

var options = new LaunchOptions { /*  */ };
var chromiumRevision = BrowserFetcher.DefaultRevision;
var browser = await Puppeteer.LaunchAsync(options, chromiumRevision);
var page = await browser.NewPageAsync();

await page.SetRequestInterceptionAsync(true);

page.Request += async (sender, e) => await e.Request.ContinueAsync();

await page.GoToAsync("https://trimmingshop(dot)co(dot)uk/", new NavigationOptions
{
    WaitUntil = new[] { WaitUntilNavigation.Networkidle0 }
});

Expected behavior:

Page loads

Actual behavior:

GoToAsync causes an exception.

PuppeteerSharp.NavigationException: 'Navigating frame was detached: NetworkManager failed to process Fetch.requestPaused. The JSON value could not be converted to System.String. Path: $.request.postData | LineNumber: 0 | BytePositionInLine: 85183..
   at System.Text.Json.ThrowHelper.ReThrowWithPath(ReadStack& state, Utf8JsonReader& reader, Exception ex)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, T& value, JsonSerializerOptions options, ReadStack& state)
   at PuppeteerSharp.Cdp.NetworkManager.Client_MessageReceived(Object sender, MessageEventArgs e)'
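
For readers unfamiliar with the error in the title, the following is a minimal sketch of how System.Text.Json can produce it, independent of PuppeteerSharp. It assumes the failing postData contains an escaped high surrogate (such as \uD83D) with no low surrogate after it; the thread does not show the actual payload, so this is a guess at the cause. The type names `Paused`, `Request`, and `LoneSurrogateDemo` are hypothetical stand-ins and are not PuppeteerSharp types.

```csharp
using System;
using System.Text.Json;

public static class LoneSurrogateDemo
{
    // Hypothetical payload shaped like the path in the exception ($.request.postData).
    // ASSUMPTION: postData ends with an escaped high surrogate that has no low surrogate.
    public const string Payload = "{\"request\":{\"postData\":\"abc\\uD83D\"}}";

    // Hypothetical DTOs; property names match the JSON keys exactly because
    // System.Text.Json is case-sensitive by default.
    public sealed class Request { public string postData { get; set; } }
    public sealed class Paused { public Request request { get; set; } }

    // Returns true when binding postData to System.String throws a JsonException.
    public static bool StrictReadFails()
    {
        try
        {
            JsonSerializer.Deserialize<Paused>(Payload);
            return false;
        }
        catch (JsonException)
        {
            return true;
        }
    }

    // Reads the same value as raw JSON text. GetRawText does not unescape the
    // string, so the incomplete surrogate pair is never decoded.
    public static string RawPostData()
    {
        using var doc = JsonDocument.Parse(Payload);
        return doc.RootElement
            .GetProperty("request")
            .GetProperty("postData")
            .GetRawText();
    }

    public static void Main()
    {
        Console.WriteLine($"Strict string read fails: {StrictReadFails()}");
        Console.WriteLine($"Raw JSON text: {RawPostData()}");
    }
}
```

If the assumption holds, the payload is structurally valid JSON but cannot be decoded into a .NET string, which would explain why reading the raw message can work while binding postData to a string property fails.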

Versions

PuppeteerSharp 20.0.4, on .NET Core 3.1 and .NET 9.0

Additional Information

20.0.3 works fine, although e.MessageData in page.Client.MessageReceived sometimes throws the same exception when I try to convert it into a JSON string. I'm guessing the two are related, but in 20.0.3 it's not a critical error because I can catch it. It seems that in 20.0.4 the library's own code in Client_MessageReceived also tries to read that data, and fails.

20.0.4 included a change to PostData, the same property named in the exception path:

https://github.com/hardkoded/puppeteer-sharp/pull/2810

My other issue also seems related:

https://github.com/hardkoded/puppeteer-sharp/issues/2775

kblok commented 2 days ago

It seems trimmingshop blocked my IP. I can't test this on that website :(

Tiggerito commented 2 days ago

Try https://www(dot)skinelite(dot)com/

I have a lot of examples.