hardkoded / puppeteer-sharp

Headless Chrome .NET API
https://www.puppeteersharp.com
MIT License

puppeteerSharp failed to display image in pdf #2650

Open kirticism opened 5 months ago

kirticism commented 5 months ago

Hi, I am encountering an issue while converting HTML to PDF using PuppeteerSharp. Specifically, the images are not displaying in the generated PDF. Despite following various solutions suggested on StackOverflow, the problem persists. Below is the method I am using to perform the HTML to PDF conversion. I would appreciate any guidance or solutions you could provide.

// Return type assumed to be Task<IFormFile>, based on the value returned from CreateFormFile.
public async Task<IFormFile> HtmlToPdf(string htmlContent, string fileName)
{
    var startTime = DateTimeOffset.UtcNow;
    _logger.LogInformation("Executing GeneratePdf: {fileName}...", fileName);

    var launchOptions = new LaunchOptions { Headless = true };
    await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultChromiumRevision);
    using (var browser = await Puppeteer.LaunchAsync(launchOptions))
    using (var page = await browser.NewPageAsync())
    {
        await page.SetContentAsync(htmlContent);

        await page.AddStyleTagAsync(new AddTagOptions { Content = "body { font-size: 20px; margin: 50px; }" });
        // Only waits for an <img> element to exist in the DOM, not for the image data to load.
        await page.WaitForSelectorAsync("img");
        var pdfBytes = await page.PdfDataAsync(new PdfOptions { Format = PaperFormat.A4 });

        // Write the PDF to a temporary file, wrap it as a form file, then clean up.
        var filePath = Path.Combine(Path.GetTempPath(), fileName);
        File.WriteAllBytes(filePath, pdfBytes);

        var formFile = await CreateFormFile(filePath, fileName);

        File.Delete(filePath);

        var endTime = DateTimeOffset.UtcNow;
        var executionTime = endTime - startTime;
        _logger.LogInformation("Time taken to execute GeneratePdf: {fileName}. Time: {ExecutionTime} ms",
                               fileName, executionTime.TotalMilliseconds);

        return formFile;
    }
}

mstijak commented 5 months ago
await page.WaitForSelectorAsync("img");

This is not enough: it only waits for the img element to appear in the DOM, not for the image itself to finish loading.

Waiting for the network to go idle might do the trick:

await page.WaitForNetworkIdleAsync();
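
Another way to wait for the images themselves, rather than for the img element, is to poll a page-side predicate until every image reports that it has finished loading. The following is a minimal sketch, assuming PuppeteerSharp's IPage.WaitForFunctionAsync and WaitForFunctionOptions.Timeout; the ImageWait class, the WaitForImagesAsync helper, and the 30-second default timeout are hypothetical names and values chosen for illustration and are not part of the library or of the code posted above.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

// Hypothetical helper class; not part of PuppeteerSharp.
public static class ImageWait
{
    // Page-side predicate: true once every <img> in the document reports complete.
    // Note that img.complete is also true for images that failed to load.
    public const string AllImagesComplete =
        "() => Array.from(document.images).every(img => img.complete)";

    // Polls the predicate in the page until it returns true or the timeout elapses.
    public static async Task WaitForImagesAsync(IPage page, int timeoutMs = 30000)
    {
        await page.WaitForFunctionAsync(
            AllImagesComplete,
            new WaitForFunctionOptions { Timeout = timeoutMs });
    }
}
```

It would be called as `await ImageWait.WaitForImagesAsync(page);` after SetContentAsync and before PdfDataAsync. Because img.complete is also true for broken images, this should not hang on a bad URL, but it does not prove that every image rendered. An image whose loading the browser has deferred may keep the predicate false until the timeout expires.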

skalahonza commented 1 week ago

I am facing the same issue:

        var pdfOptions = new PdfOptions
        {
            PrintBackground = true, // otherwise background of tables won't be printed
            Format = request.Options.Format.ToPuppeteerPaperFormat(),
            Landscape = request.Options.PrintOrientation == PrintOrientation.Landscape,
            MarginOptions =
            {
                Bottom = $"{request.Options.MarginOptions.Bottom}mm",
                Left = $"{request.Options.MarginOptions.Left}mm",
                Right = $"{request.Options.MarginOptions.Right}mm",
                Top = $"{request.Options.MarginOptions.Top + (request.Options.HeaderHeight != 0 ? HeaderContentGap + request.Options.HeaderHeight : 0)}mm"
            },
            Scale = new decimal(request.PrintScale),
            HeaderTemplate = headerTemplate,
            FooterTemplate = "<div></div>", // we want empty footer
            DisplayHeaderFooter = !string.IsNullOrEmpty(headerTemplate)
        };        
        await using var stream = await contentPage.PdfStreamAsync(pdfOptions);

To my surprise, the images are visible in the HTML right before the PDF is generated: [image]

But this is what the generated PDF looks like: [image]

I tried several ways of waiting:

    private async Task WaitForPageToBeLoaded(IPage page, CancellationToken ct)
    {
        _logger.LogDebug("Waiting for all fonts to be loaded");
        ct.ThrowIfCancellationRequested();
        await page.EvaluateExpressionHandleAsync("document.fonts.ready");
        _logger.LogDebug("Page fonts loaded");

        _logger.LogDebug("Emulating Screen media type");
        ct.ThrowIfCancellationRequested();
        await page.EmulateMediaTypeAsync(MediaType.Screen);
        _logger.LogDebug("Screen media type emulated");

        await page.SetJavaScriptEnabledAsync(true);

        _logger.LogDebug("Waiting for images to be loaded");
        ct.ThrowIfCancellationRequested();
        await page.WaitForSelectorAsync("img");
        _logger.LogDebug("Images loaded");

        await page.WaitForNetworkIdleAsync();
    }

kblok commented 1 week ago

@skalahonza What happens when you print to PDF in Chrome?

skalahonza commented 1 week ago

@kblok That works well: [image]

mstijak commented 1 week ago

This can happen if you're loading images over HTTPS while the page is served over HTTP.

skalahonza commented 1 week ago

As suggested in other issues, I enabled ignoring of HTTPS errors for local development.

skalahonza commented 1 week ago

For those encountering similar issues: the problem was lazy loading. Images with lazy loading enabled that were not initially visible on the page failed to render.

<img loading="lazy" src="data:image/png;base64,iVBORw0KGgoAA...">

Removing the loading="lazy" attribute from the image element resolved the issue, and PDF generation worked properly again.
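
When the HTML comes from a template that cannot easily be changed, the attribute could also be stripped from the markup before it is passed to SetContentAsync. The following is a rough sketch using only System.Text.RegularExpressions; the LazyLoading class and the StripLazyLoading method are hypothetical names, and the regular expression is a simplification that handles only quoted values and would also match the same text if it appeared outside a tag.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; not part of PuppeteerSharp.
public static class LazyLoading
{
    // Matches a loading="lazy" or loading='lazy' attribute together with
    // the whitespace that precedes it, ignoring case.
    private static readonly Regex LazyAttribute = new Regex(
        @"\s+loading\s*=\s*(""lazy""|'lazy')",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the HTML with every quoted loading="lazy" attribute removed.
    public static string StripLazyLoading(string html)
    {
        return LazyAttribute.Replace(html, string.Empty);
    }
}
```

With this in place, the call would look like `await page.SetContentAsync(LazyLoading.StripLazyLoading(htmlContent));`. An HTML parser would be a more robust choice than a regular expression for markup that is not under your control.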