hardkoded / puppeteer-sharp

Headless Chrome .NET API
https://www.puppeteersharp.com
MIT License
3.35k stars 440 forks source link

browser.NewPageAsync() dies in IIS #736

Closed taz4270 closed 5 years ago

taz4270 commented 5 years ago

browser.NewPageAsync() dies in IIS same problem as #375

Hello again, today I found a issue that was asked before on #375, the browser.NewPageAsync is not returning anything, don't know if there is an argument missing (probably not), or there was something that wasn't right at the master code, so i tried to find a solution, i've kinda made it work but unfortunatly, i can't find how can i open the other ISS' port (something like that but this part doesn't matter, just know that it worked), and when i tried to migrate the code the other side, it blow up

There is some part of my code

Remember, Im using my addon, so any method missing Async, is my doing

private Browser GetBrowser()
{
        loggerFactory = new LoggerFactory();
        return PuppeteerEx.Launch(new LaunchOptions()
        {
            Headless = true,
            ExecutablePath = _ExecutionPath_, //It goes to the HostingEnvirounment
            LogProcess = true,
        },
        loggerFactory;
        );
        private Page GetNavigatedPage(Browser browser)
        {
            Page page = browser.NewPage();
            page.GoTo(Link, 30000);
            return page;
        }
        catch (Exception e)
        {
            Debug.WriteLine(e);
            throw e.GetBaseException();
        }

       private Page GetPage()
       { 
            var browser = GetBrowser();
            var Navigate = GetNavigatedPage(browser);
            return Navigate;
       }
}
taz4270 commented 5 years ago

There is something inside of Browser.CreatePageInContextAsync or a property missing, that doesn't allow the program to proceed

taz4270 commented 5 years ago

Something is not completly right

Because I've been searching your code inside-out and even tried to find a result, I tried to make a method to find how deep i could go to find more precise as I could:

private static async Task<Page> SepareteContext(Browser browser){
        var browserContextArray = browser.BrowserContexts();
        var browserContext = browserContextArray.LastOrDefault();
        if (browserContext is null)
            return null;
        var page = await browserContext.NewPageAsync(); //Even tried with .ConfigureAwait(false)
        return page;
}

The bug occurrs when page is awaiting for browserContext.NewPageAsync(), it never has the anwser back PS: this is inside NewPage Now

kblok commented 5 years ago

What is NewPage doing?

taz4270 commented 5 years ago

NewPage is like NewPageAsync but synchronous

kblok commented 5 years ago

@taz4270 how are you making that synchronous?

taz4270 commented 5 years ago

Sort of, it's a "weird specie of a bird", That is synchronous but it waits for Tasks to be executed

kblok commented 5 years ago

I bet you're are locking the thread preventing Puppeteer from getting messages from chromium, you should await all the way.

taz4270 commented 5 years ago

I'm not sure about that, the Task is the responsable to get the message, besides, i already did it once, and even got the anwser, but now, i migrated to my main testbench and it blew up, and I can't even access the other IIS anymore, it doesnt allow me xD I can't do full async because most of my program doesn't support it

kblok commented 5 years ago

That's why I'm asking how are you waiting for the task. If you do browser.NewPage().Result it'll just not going to work.

taz4270 commented 5 years ago

of course not. but Task.Run(()=>browser.NewPageAsync()).Result do, but not on ISS, trust me, i did it on windows forms but i cant do it on IIS xD

kblok commented 5 years ago

You should never use .Result There are many documents talking about this https://msdn.microsoft.com/en-us/magazine/jj991977.aspx

You can wrap it inside a Task.Run but you should wrap your entire functionality not just pieces. Puppeteer is async by definition there is no way to change that.

taz4270 commented 5 years ago

I will try, but how can you explain me is it working on windows forms, and all other methods work fine as a line?

kblok commented 5 years ago

@taz4270 ASP.NET and Winforms don't share the same synchronization context. This is a good reading https://msdn.microsoft.com/en-us/magazine/gg598924.aspx

taz4270 commented 5 years ago

Ok, thanks, now i have to make everything from the ground i think 👍

taz4270 commented 5 years ago

Nop i was right, its was not my fault, i did everything, in IIS, async, but still locks there. At the same point as before Page page = await browser.NewPageAsync();

kblok commented 5 years ago

@taz4270 can you share some testable code?

taz4270 commented 5 years ago

I will try tomorrow, but today i cant

taz4270 commented 5 years ago

So, as you know there is an issue in browser.NewPageAsync and here is the context

HeadlessBrowser Core, Framework 4.7.1

public class HeadlessBrowser
{
    private static readonly string HostPath = HostingEnvironment.MapPath("~/App_Data/");

    private string ExecutionPath { get; set; }

    private Page CurrentPage { get; set; }

    public string Link { get; set; }

    public HeadlessBrowser(string link)
    {
        Link = link;
    }

    public async Task Setup()
    {
        ExecutionPath = await GetExecutionPathAsync();
        CurrentPage = await GetPageAsync();
    }

    private static BrowserFetcher GetBrowserFetcher() => new BrowserFetcher(new BrowserFetcherOptions()
    {
        Path = HostPath
    });

    private static async Task<string> GetExecutionPathAsync()
    {
        BrowserFetcher browserFetcher = GetBrowserFetcher();
        string ExecutionPath = browserFetcher.GetExecutablePath(BrowserFetcher.DefaultRevision);
        if (!File.Exists(ExecutionPath))
            await browserFetcher.DownloadAsync(BrowserFetcher.DefaultRevision);
        return ExecutionPath; 
    }

    private async Task<Browser> GetBrowserAsync => await Puppeteer.LaunchAsync(new LaunchOptions()
    {
        Headless = true,
        ExecutablePath = ExecutionPath
    });

    private async Task<Page> GetPageAsync()
    {
        return await GetNavigatedPageAsync(await GetBrowserAsync());
    }

    private async Task<Page> GetNavigatedPageAsync(Browser browser)
    {
        try
        {
            Page page = await browser.NewPageAsync(); //Turns into a vegetable
            await page.GoToAsync(Link,30000);
        }
        catch (Exception e)
        {
            //It never comes to this part
            Debug.WriteLine(e);
            throw e.GetBaseException();
        }
    }

    public async Task<Stream> GetScreenshotStreamAsync() => await CurrentPage.ScreenshotStreamAsync();
    //Thats all you need to know about this class
}

Now the WebApi

Thumbnail, WebAPI

[RoutePrefix("api")]
public class ThumbnailController : BaseApiController (Descendent of ApiController)
{
    [HttpGet,Route("thumb/v1")]
    public async Task<IHttpActionResult> ThumbnailAsync(string address)
    {
        HeadlessBrowser browser = new HeadlessBrowser(address);
        await browser.Setup();
        Stream image = await browser.GetScreenshotStreamAsync();
        return Ok(image != null? "ok" : "not ok");
    }
}
taz4270 commented 5 years ago

BrowserContext

https://github.com/kblok/puppeteer-sharp/blob/master/lib/PuppeteerSharp/BrowserContext.cs Line 74 - public Task<Page> NewPageAsync() => Browser.CreatePageInContextAsync(_id); Was this supposed to not have an await and async?

kblok commented 5 years ago

@taz4270 we use Task cascading a lot. Instead of awaiting the task we just return it, until someone really needs to await it. This is a good read https://weblogs.asp.net/sfeldman/task-vs-async-task

taz4270 commented 5 years ago

That's the awnser for last question. What about the code above?

taz4270 commented 5 years ago

Even that, in this situation, it needs to return the page or it will wait forever

kblok commented 5 years ago

@taz4270 I just created a project with your code and it worked.

willsbctm commented 5 years ago

@kblok . We have a dotnet core 2.1 application running on IIS 7.5 using puppeteer-sharp 1.9. The server is windows 2008 R2.

When is called await browser.NewPageAsync(), we received a timeout even using configureawait false.

Any suggestion for us?

Thanks.

kblok commented 5 years ago

@willsbctm I'll take a look running it on IIS.

kblok commented 5 years ago

@willsbctm @taz4270 reproduced guys.

taz4270 commented 5 years ago

Wait reproduced the bug or did you successfully executed? I got confused

kblok commented 5 years ago

@taz4270 I'm getting the same issue on IIS

taz4270 commented 5 years ago

@kblok any update about the bug? Whats the issue or what do you think it is?

kblok commented 5 years ago

@taz4270 we are getting a deadlock when we use the session for the very first time https://github.com/kblok/puppeteer-sharp/blob/master/lib/PuppeteerSharp/Page.cs#L1520

The thread is being stuck there, and it doesn't let puppeteer keep listening to the WebSocket to get the answer to that command. No joy yet.

lucascebertin commented 5 years ago

Guys, I found a similar issue for puppeteer. https://github.com/GoogleChrome/puppeteer/issues/1262

On this case the workaround was adding --no-sandbox on chromium, is this a "solution"?

kblok commented 5 years ago

@lucascebertin I don't think so. I read a little bit about this issue and the problem is that ASP.NET thread management. I can't confirm it yet, but it seems that ASP.NET is assigning only one thread per request and that deadlocks the execution because it doesn't allow the connection (which is using Task.Run to listen to new messages.

I bet I'll be working on this after I release 1.10.

kblok commented 5 years ago

@taz4270 @willsbctm could you try v1.10?

lucascebertin commented 5 years ago

I tried the v1.10, with --no-sandbox works, without it looks like the deadlock is still there.

Don't know if this log can help, but that's what I got:

`trce: PuppeteerSharp.Connection[0] Send  1 Method Target.setDiscoverTargets Params { discover = True }

trce: PuppeteerSharp.Connection[0] ? Receive {"method":"Target.targetCreated","params":{"targetInfo":{"targetId":"3276034a-b7f0-462b-af30-c26a3fe05f40","type":"browser","title":"","url":"","attached":false}}}

trce: PuppeteerSharp.Connection[0] ? Receive {"method":"Target.targetCreated","params":{"targetInfo":{"targetId":"5EACC14CD88992030B76F44B4729FDDD","type":"page","title":"","url":"about:blank","attached":false,"browserContextId":"6E425A78148B53E4686790E8F708EADA"}}}

trce: PuppeteerSharp.Connection[0] ? Receive {"method":"Target.targetCreated","params":{"targetInfo":{"targetId":"b97ca89b-6136-4bad-8de2-8aa0722e3820","type":"browser","title":"","url":"","attached":true}}}

trce: PuppeteerSharp.Connection[0] ? Receive {"id":1,"result":{}}

trce: PuppeteerSharp.Connection[0] Send  2 Method Target.createTarget Params [url, about:blank]

trce: PuppeteerSharp.Connection[0] ? Receive {"method":"Target.targetCreated","params":{"targetInfo":{"targetId":"F5E5D9799EEDBC8CE189DE8BE015AD9A","type":"page","title":"","url":"","attached":false,"browserContextId":"6E425A78148B53E4686790E8F708EADA"}}}

trce: PuppeteerSharp.Connection[0] ? Receive {"id":2,"result":{"targetId":"F5E5D9799EEDBC8CE189DE8BE015AD9A"}}

trce: PuppeteerSharp.Connection[0] ? Receive {"method":"Target.targetInfoChanged","params":{"targetInfo":{"targetId":"F5E5D9799EEDBC8CE189DE8BE015AD9A","type":"page","title":"","url":"about:blank","attached":false,"browserContextId":"6E425A78148B53E4686790E8F708EADA"}}}

trce: PuppeteerSharp.Connection[0] Send  3 Method Target.attachToTarget Params { targetId = F5E5D9799EEDBC8CE189DE8BE015AD9A }

trce: PuppeteerSharp.Connection[0] ? Receive {"method":"Target.targetCrashed","params":{"targetId":"F5E5D9799EEDBC8CE189DE8BE015AD9A","status":"failed to launch","errorCode":18}}

trce: PuppeteerSharp.Connection[0] ? Receive {"method":"Target.targetInfoChanged","params":{"targetInfo":{"targetId":"F5E5D9799EEDBC8CE189DE8BE015AD9A","type":"page","title":"","url":"about:blank","attached":true,"browserContextId":"6E425A78148B53E4686790E8F708EADA"}}}

trce: PuppeteerSharp.Connection[0] ? Receive {"method":"Target.attachedToTarget","params":{"sessionId":"87634875ECAD81B62CDC1B6CDACEA827","targetInfo":{"targetId":"F5E5D9799EEDBC8CE189DE8BE015AD9A","type":"page","title":"","url":"about:blank","attached":true,"browserContextId":"6E425A78148B53E4686790E8F708EADA"},"waitingForDebugger":false}}

trce: PuppeteerSharp.Connection[0] ? Receive {"id":3,"result":{"sessionId":"87634875ECAD81B62CDC1B6CDACEA827"}}

trce: PuppeteerSharp.CDPSession[0] Send  1 Method Page.enable Params (null)

trce: PuppeteerSharp.Connection[0] Send  4 Method Target.sendMessageToTarget Params [sessionId, 87634875ECAD81B62CDC1B6CDACEA827], [message, {"id":1,"method":"Page.enable","params":null}]

trce: PuppeteerSharp.Connection[0] ? Receive {"id":4,"result":{}}`

kblok commented 5 years ago

@lucascebertin any reason not to use --no-sandbox? For what I read in Puppeteer is something that users end up setting it when they run with restricted permissions.

lucascebertin commented 5 years ago

No, I think it's ok in our scenario set --no-sandbox, we are just creating some PDFs and inside an very controlled environment. I just thought a little weird the deadlock without the arg being passed but just because I got curious to really understand the problem haha

lucascebertin commented 5 years ago

Would be a good thing consider to add this magic argument on README?

kblok commented 5 years ago

Agreed @lucascebertin. I'm also working on a better connection transport for AspNet Full Framework. As it's not recommended to use Task.Run there.

lucascebertin commented 5 years ago

That's great. By the way, thanks for your help on it :blush:

taz4270 commented 5 years ago

Unfortunatly, i'm currently too busy to do the experience, maybe i have some time over the weekend, but until then, i dont think so

kblok commented 5 years ago

We just release a new package for ASP.NET Framework https://www.nuget.org/packages/PuppeteerSharp.AspNetFramework/

Here's one example https://github.com/kblok/puppeteer-sharp/blob/master/samples/PupppeterSharpAspNetFrameworkSample/PupppeterSharpAspNetFrameworkSample/Controllers/TestController.cs

Give it a try. If you have any issues let's open another issue.

sudheerkaku commented 5 years ago

@willsbctm did you figure out how to run it on windows server 2008 R2/Net Core ? @kblok whats the fix thats added in PuppeteerSharp.AspNetFramework? this doesn't work with net core, need a fix for netcore application.

kblok commented 5 years ago

@sudheerkaku some reading here https://github.com/kblok/puppeteer-sharp/pull/776

Basically, we shouldn't use Task.Run on ASP.NET. But I can't say that this is THE solution.

sudheerkaku commented 5 years ago

@kblok I'm not using Task.run public static void Main(string[] args) { SavePDFUsingPuppeteerSharp(filePath, pdfFilePath).Wait(); } public static async Task SavePDFUsingPuppeteerSharp(string htmlFilePath, string pdfFilePath) { await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision); var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true }); var page = await browser.NewPageAsync(); PdfOptions pdfOptions = new PdfOptions(); pdfOptions.PreferCSSPageSize = true; await page.GoToAsync(htmlFilePath, WaitUntilNavigation.Load); await page.PdfAsync(pdfFilePath, pdfOptions); }

kblok commented 5 years ago

@sudheerkaku but the Puppeteer-Sharp connection does. PuppeteerSharp.AspNetFramework uses HostingEnvironment.QueueBackgroundWorkItem

sudheerkaku commented 5 years ago

ah got it, PuppeteerSharp.AspNetFramework is for .netframework unfortunately i'm using .netcore.. so should I use the code and make the changes?

kblok commented 5 years ago

Sorry @sudheerkaku I missed that part. I'm locking the conversation on this issue. If you have any issues feel free to create a new one with the steps to reproduce it.