hardkoded / puppeteer-sharp

Headless Chrome .NET API
https://www.puppeteersharp.com
MIT License
3.24k stars 431 forks source link

The time allocated for FrameManager to initialize is not enough (1 second) #2622

Closed AleksandrsJakovlevsVisma closed 1 month ago

AleksandrsJakovlevsVisma commented 2 months ago

Description

We are trying to use Puppeteer-Shart to convert HTML to PDF. Sometimes we could get 200 conversion requests simultaneously. At first, we tried to create a browser and a page for each request but quickly realized that this was a bad idea performance-wise. Our new strategy is to use a pool of pages which means we always use one browser and allow a maximum of 10 pages to be opened.

Still, it does not work. At some point, when a new page is created, we get this exception:

PuppeteerSharp.TargetClosedException: Protocol error (Performance.enable): Session closed. Most likely the Page has been closed.Close reason: Connection failed to process Page.frameStoppedLoading. Timeout of 1000 ms exceeded.

Coupled with stuff like this:

PuppeteerSharp.TargetClosedException: Protocol error(Target.createTarget): Target closed. (Connection failed to process {\""id\"":14,\""result\"":{},\""sessionId\"":\""BED668EF013F206DE76E55157C4B06BC\""}. Object reference not set to an instance of an object.

Reading the code I guess the following is happening:

  1. With such a high number of requests being done at the same time, the server performance is slowed down
  2. At some point, when creating a new page, FrameManager.InitializeAsync is not able to be completed within one second
  3. The connection is then closed with the "Timeout of 1000 ms exceeded" error message
  4. The rest of the requests fail since the browser they were using has a closed connection

Complete minimal example reproducing the issue

This is an approximation of what the pool looks like. Calling GetPageAsync in multiple threads will eventually lead to the above-mentioned exceptions.

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using PuppeteerSharp;

namespace Test;

public class ChromiumPagePool() : IDisposable
{
    private readonly SemaphoreSlim _destroyPageSemaphore = new(1, 1);
    private readonly ConcurrentQueue<IPage> _pages = [];
    private IBrowser? _browser;
    private SemaphoreSlim? _currentPageCount;
    private int _destroyPageTasks;
    private bool _disposing;

    public async Task InitAsync()
    {
        _browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            ExecutablePath = @"C:\Users\aleksandrs.jakovlevs\Downloads\chrome-win\chrome-win\chrome.exe",
            Browser = SupportedBrowser.Chromium,
            Args =
            [
                "--no-sandbox"
            ]
        });

        for (var i = 0; i < 3; i++)
        {
            var page = await _browser.NewPageAsync();
            _pages.Enqueue(page);
        }

        _currentPageCount = new SemaphoreSlim(7, 10);
    }

    public async Task<IPage?> GetPageAsync(CancellationToken token)
    {
        var res = _pages.TryDequeue(out var page);

        if (res && page != null)
        {
            return page;
        }

        await _currentPageCount.WaitAsync(token);

        page = await _browser.NewPageAsync();

        return page;
    }

    public void QueuePageDestruction(IPage page, CancellationToken token)
    {
        var pageToDestroy = page;
        var cancelToken = token;
        Task.Run(() => DestroyPage(pageToDestroy, cancelToken), token);
    }

    private async Task DestroyPage(IPage page, CancellationToken cancelToken)
    {
        Interlocked.Increment(ref _destroyPageTasks);

        if (_disposing)
        {
            page.Dispose();
            return;
        }

        try
        {
            page.Dispose();

            await _destroyPageSemaphore.WaitAsync(cancelToken);

            try
            {
                if (10 - _currentPageCount.CurrentCount <= 3)
                {
                    page = await _browser.NewPageAsync();
                    _pages.Enqueue(page);
                }
                else
                {
                    _currentPageCount.Release();
                }
            }
            finally
            {
                _destroyPageSemaphore.Release();
            }
        }
        finally
        {
            Interlocked.Decrement(ref _destroyPageTasks);
        }
    }

    public void Dispose()
    {
        _disposing = true;

        while (_destroyPageTasks > 0)
        {
            Thread.Sleep(50);
        }

        foreach (var page in _pages.Where(p => !p.IsClosed))
        {
            page.Dispose();
        }

        if (_browser?.IsConnected == true)
        {
            _browser.Dispose();
        }

        _destroyPageSemaphore.Dispose();
        _currentPageCount?.Dispose();

        GC.SuppressFinalize(this);
    }

    ~ChromiumPagePool()
    {
        Dispose();
    }
}

Expected behavior:

All 200 requests successfully convert HTML to PDF

Actual behavior:

At some point conversion fails

Versions

kblok commented 1 month ago

@AleksandrsJakovlevsVisma Are you willing to test some custom build to confirm the fix?

AleksandrsJakovlevsVisma commented 1 month ago

@kblok Sorry for the delay, I was on vacation.

Yes, I don't mind trying a custom build

kblok commented 1 month ago

@AleksandrsJakovlevsVisma I'm exposing the TaskHelper timeout here. Can you see if increasing the timeout there helps?

That's an static class that will affect the entire library.

AleksandrsJakovlevsVisma commented 1 month ago

@kblok I did the following:

  1. Downloaded that branch as zip and referenced the csproj in my proj
  2. I also did one additional change seen below image
  3. In my code I am setting DefaultTimeout to 30 secs
  4. During testing I am no longer seeing TargetClosedException, but now I am getting 504 gate-way timeouts from the API that is using Puppeteer
kblok commented 1 month ago

@kblok I did the following:

  1. I also did one additional change seen below

Good catch

  1. During testing I am no longer seeing TargetClosedException, but now I am getting 504 gate-way timeouts from the API that is using Puppeteer

That would be something on your side right?

AleksandrsJakovlevsVisma commented 1 month ago

@kblok

Thank you for helping out with this one.

It is the problem on our side, yes. The PDF conversion on average takes 6 seconds, but the request times out after a minute. 200 requests with a max count of simultaneously opened pages = 10 is not enough to make sure each of 200 finishes in a minute. Do you have any advice on how to speed up a simple HTML to PDF conversion? Perhaps there are some magic LaunchOptions args?

In any case, can we expect the configurable default timeout released in the near future?

kblok commented 1 month ago

@kblok

Thank you for helping out with this one.

It is the problem on our side, yes. The PDF conversion on average takes 6 seconds, but the request times out after a minute. 200 requests with a max count of simultaneously opened pages = 10 is not enough to make sure each of 200 finishes in a minute. Do you have any advice on how to speed up a simple HTML to PDF conversion? Perhaps there are some magic LaunchOptions args?

PDF generate is quite resource intensive. If you look at the code, most of the time is on the browser side.

In any case, can we expect the configurable default timeout released in the near future?

I can ship it today :)

AleksandrsJakovlevsVisma commented 1 month ago

@kblok

I can ship it today :)

Nice, we will look forward to it, whenever you'll have the time :)