hardkoded / puppeteer-sharp

Headless Chrome .NET API
https://www.puppeteersharp.com
MIT License

Use Page.DisposeAsync and Browser.DisposeAsync hang forever #1489

Closed. clement128 closed this issue 3 years ago.

clement128 commented 4 years ago


Description

I am running puppeteer-sharp in Docker, and I found that quite a lot of zombie Chrome processes are never killed. I tried using tini as the entry point (following the tips from here), with no luck. When I checked the logs, I found that DisposeAsync sometimes never completes. So I switched to using Dispose instead, and it looks good so far.

Complete minimal example reproducing the issue

I don't have a complete example right now; I might add one later. I am simply calling DisposeAsync. E.g.

var options = new LaunchOptions
{
    Headless = Configurations.Puppeteer.Headless,
    Args = new string[]
    {
        "--no-sandbox",
        "--disable-dev-shm-usage",
        "--incognito"
    }
};
Browser = await Puppeteer.LaunchAsync(options, loggerFactory);
var browserPages = await Browser.PagesAsync();
if (browserPages.Length > 0) {
    Page = browserPages[0];
    await Task.WhenAll(browserPages.Skip(1).Select(x => x.CloseAsync()));
} else {
    Page = await Browser.NewPageAsync();
}

// some other steps 

logger.LogInformation("browser is disposing");
await Page.DisposeAsync();
await Browser.DisposeAsync();
// this log sometime never print out
logger.LogInformation("browser is disposed");

Expected behavior:

The browser disposes correctly, and no zombie processes are left behind.

Actual behavior:

Some zombie Chrome processes are never killed.

Versions

Additional Information

Running puppeteer-sharp in Docker, with a Dockerfile similar to the one you provided; I just added tini as the entry point. (BTW, I think the Docker example should also add this and run with the --disable-dev-shm-usage argument.)
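A minimal sketch of that entry-point change, assuming a Debian-based image; the image tag and app DLL name (MyApp.dll) are placeholders:

```dockerfile
# Illustrative base image; substitute the one from the official example.
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1

# tini reaps orphaned child processes (such as leftover Chrome helpers)
# so they do not linger as zombies when the .NET app runs as PID 1.
RUN apt-get update && apt-get install -y tini --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . /app

# Run the app under tini so tini becomes the init process.
ENTRYPOINT ["/usr/bin/tini", "--", "dotnet", "MyApp.dll"]
```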

kblok commented 4 years ago

@clement128 could you try using ConfigureAwait(false) across the board?

clement128 commented 4 years ago

Hi @kblok, I am running on ASP.NET Core. Based on my understanding, I don't need to use ConfigureAwait(false), since there is no SynchronizationContext by default? Correct me if I am wrong.

kblok commented 4 years ago

That's true @clement128.

pmdevers commented 4 years ago

@kblok any updates on this issue?

kblok commented 4 years ago

Sorry @clement128 @pmdevers I will take a look tomorrow.

kblok commented 4 years ago

@clement128 @pmdevers could you help me break this project https://github.com/kblok/PuppeteerSharpOnAspNetCoreDemo?

That project is working for me.

pmdevers commented 4 years ago

The zombie process issue is resolved by switching to .NET Core 3.1. The issue was on .NET Core 2.2.

See the project it is used in: https://github.com/pmdevers/PuppteerSharp-Docker

clement128 commented 4 years ago

Hi @kblok, are you running it in Docker?

@pmdevers hmm, interesting. I was running on .NET Core 3.1 as well, with Orleans.

pmdevers commented 4 years ago

I upgraded the project with the issue to 3.1 and am still seeing zombie processes. It looks like the project is locking the Chrome processes.

pmdevers commented 4 years ago

https://github.com/puppeteer/puppeteer/issues/1825

I have found that the following arguments resolve the zombie processes:

Args = new[] { "--headless", "--no-sandbox", "--disable-gpu", "--single-process", "--no-zygote" }

I also added this: https://github.com/Yelp/dumb-init

WhatFreshHellIsThis commented 4 years ago

Just a note to say, before I attempt the workarounds people have listed, that this is definitely still an issue "out of the box" with PuppeteerSharp 2.0.4 running in a .NET Core Web API server on Docker / Alpine Linux.

I have a report printing setup, and each time I run a report, 3 new zombie Chrome processes are added and never go away; they build up over and over, even with proper disposal inside a using() block. It doesn't appear to be an issue on Windows at all.
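One way to confirm the build-up is to count defunct (Z-state) Chrome processes inside the container. A minimal sketch of the filter, run here against a hypothetical fixed `ps -eo pid,stat,comm` sample so the logic is easy to verify:

```shell
# Count zombie (Z-state) chrome processes. In a live container you would
# pipe real `ps -eo pid,stat,comm` output instead of this fixed sample.
printf '1 Ss init\n42 Z chrome\n43 Z chrome\n44 S chrome\n' |
  awk '$2 ~ /^Z/ && $3 == "chrome" { n++ } END { print n + 0 }'
# prints 2
```

On a real host, `ps -eo pid,stat,comm | awk '$2 ~ /^Z/'` lists the zombies directly.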

UtopleMan commented 3 years ago

Any news on this? I'm also seeing 3-4 processes hanging on Linux.

mrtristan commented 3 years ago

I seemingly got this working by ditching Alpine and using the args mentioned above. I'm not sure how to integrate the dumb-init bit, so I'm not doing that currently.

I'm also doing a variation of this: https://github.com/hardkoded/puppeteer-sharp/issues/996#issuecomment-471521462, except I'm grabbing the process early and holding on to it, then calling Refresh before checking whether I need to kill it. I do it that way because the PID lookup throws an exception if the process has already been terminated on Linux.

Still testing things, but fingers crossed that things keep behaving from here...

mrtristan commented 3 years ago

@pmdevers I thought I was in the clear, but I'm still having the issue. Did you solve this? I looked at your PDF repo but don't see you using dumb-init in there. I was trying to figure out how to integrate it and couldn't come up with something that felt like it made sense.

pmdevers commented 3 years ago

@mrtristan no, after some investigation I did not need dumb-init. This is the code snippet:

_browserFetcher = new BrowserFetcher();
_options = new LaunchOptions
{
    Headless = true,
    ExecutablePath = "/usr/bin/google-chrome",
    Args = new[]
    {
        "--headless",
        "--no-sandbox",
        "--disable-gpu",
        "--single-process",
        "--no-zygote"
    }
};
await _browserFetcher.DownloadAsync(BrowserFetcher.DefaultRevision);
using (var browser = await Puppeteer.LaunchAsync(_options))
using (var page = await browser.NewPageAsync())
{
    await page.SetRequestInterceptionAsync(true);

    page.Request += async (sender, e) =>
    {
        var header = e.Request.Headers;
        header.Add("Authorization", _userService.GetToken());
        var payload = new Payload()
        {
            Url = e.Request.Url,
            Method = e.Request.Method,
            Headers = header,
        };
        await e.Request.ContinueAsync(payload);
    };

    await page.GoToAsync($"{url}/{query.DossierId}/{query.Version}",
        new NavigationOptions { WaitUntil = new[] { WaitUntilNavigation.Networkidle0 } });

    var data = await page.PdfDataAsync(new PdfOptions { Format = PaperFormat.A4 });
    // ... use `data`, e.g. return it as the PDF response
}

We use mcr.microsoft.com/dotnet/core/aspnet:3.1 as the base image.

This is what I used in the Dockerfile to install Google Chrome:

RUN apt-get update && apt-get install -y \
    apt-transport-https \
    curl \
    gnupg \
    --no-install-recommends \
    && curl -sSL https://dl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && echo "deb [arch=amd64] https://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list.d/google.list \
    && apt-get update && apt-get install -y \
    google-chrome-stable \
    --no-install-recommends \
    && apt-get purge --auto-remove -y curl \
    && rm -rf /var/lib/apt/lists/*

And because we use Kubernetes, we use the following resource limits:

resources:
  limits:
    memory: "1.5Gi"
    cpu: "500m"

This has been running in production for over 6 months without any issues.

mrtristan commented 3 years ago

@pmdevers interesting, thanks for the quick response. I seem very closely aligned with your approach but am still choking. Will dig in further. I'm running things on AWS ECS Fargate with mostly the same resource limits, just an even 1G of memory. I wonder if the environmental differences have some bearing on things.

One thing worth calling out: you seem to be preinstalling Chrome but are still calling await _browserFetcher.DownloadAsync(BrowserFetcher.DefaultRevision);, which is likely duplicative. I wonder if that's relevant to things working for you in some way.

pmdevers commented 3 years ago

@mrtristan the BrowserFetcher is for local development: we set ExecutablePath to _browserFetcher.GetExecutablePath(BrowserFetcher.DefaultRevision) and remove the "--single-process" argument when in debug mode. I believe the BrowserFetcher only downloads if the executable path does not exist.

mrtristan commented 3 years ago

@pmdevers gotcha.

I think I'm down to my issue being environmental. I'm able to debug and run it locally on Windows, and it also functions fine in my local (Linux) Docker container. Once I push to Amazon and try to run it up there in Fargate, it seemingly randomly works or doesn't, with the scales tipped towards usually not. Extremely frustrating.

It seems like it's largely due to running in a constrained environment. I have to figure out how to impose those limitations locally to see if I can break things, I guess.

mrtristan commented 3 years ago

@pmdevers if you're able, can you see if that still works with a fresh Docker build? I was fine until today, when a new Chrome version hit the stable channel (I appreciate your help getting to this point). I'm getting an immediate failure now. Curious if it's just me.

I pinned the installed version to 88.0.4324.182-1 (the last stable release from earlier last month) and now it's back to working.

I've also tracked the new version's issue down to the --single-process flag.

Unhandled exception. PuppeteerSharp.ProcessException: Failed to create connection
 ---> System.Net.WebSockets.WebSocketException (0x80004005): Unable to connect to the remote server
 ---> System.Net.Http.HttpRequestException: Connection refused
 ---> System.Net.Sockets.SocketException (111): Connection refused
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.WebSockets.WebSocketHandle.ConnectAsyncCore(Uri uri, CancellationToken cancellationToken, ClientWebSocketOptions options)
   at System.Net.WebSockets.WebSocketHandle.ConnectAsyncCore(Uri uri, CancellationToken cancellationToken, ClientWebSocketOptions options)
   at System.Net.WebSockets.ClientWebSocket.ConnectAsyncCore(Uri uri, CancellationToken cancellationToken)
   at PuppeteerSharp.Transport.WebSocketTransport.CreateDefaultWebSocket(Uri url, IConnectionOptions options, CancellationToken cancellationToken)
   at PuppeteerSharp.Transport.WebSocketTransport.CreateDefaultTransport(Uri url, IConnectionOptions connectionOptions, CancellationToken cancellationToken)   
   at PuppeteerSharp.Connection.Create(String url, IConnectionOptions connectionOptions, ILoggerFactory loggerFactory, CancellationToken cancellationToken)    
   at PuppeteerSharp.Launcher.LaunchAsync(LaunchOptions options, Product product)
   --- End of inner exception stack trace ---
   at PuppeteerSharp.Launcher.LaunchAsync(LaunchOptions options, Product product)
   at PuppeteerSharp.Launcher.LaunchAsync(LaunchOptions options, Product product)
   at xxx.PuppeteerServiceTemp.ExecuteWithPageAsync[T](Func`2 callback, Nullable`1 timeoutMilliseconds) in /app/PuppeteerServiceTemp.cs:line 26      
   at xxx.Program.Main(String[] args) in /app/Program.cs:line 128

kblok commented 3 years ago

Closed due to inactivity. Feel free to reopen it if needed.

DHclly commented 1 year ago

This case still exists in the new version, Puppeteer 8.