sungaila / PDFtoImage

A .NET library to render PDF files into images.
https://www.sungaila.de/PDFtoImage/
MIT License
144 stars, 14 forks

Text missing when render pdf into image #55

Closed: scegg closed this issue 4 months ago

scegg commented 6 months ago

PDFtoImage version

3.0.0

OS

Windows

OS version

No response

Architecture

x64

Framework

.NET (Core)

App framework

No response

Detailed bug report

using (var pdfStream = File.OpenRead(pdfFile))
{
    PDFtoImage.Conversion.SavePng("D:\\output.png", pdfStream);
}

Attachment: 1.pdf

When processing the attached PDF, all text is missing in the lower part of the generated image. The PNG file is too big to upload.

scegg commented 6 months ago

The lower part of the PNG:

image

sungaila commented 6 months ago

Hi @scegg, I can confirm the lower part of the PDF is not rendered correctly. I'll be looking into this issue.

sungaila commented 6 months ago

It looks like a memory issue. Your PDF is rendered at 300 DPI (the default value), which produces an image of 4,000 × 46,812 pixels.

That makes (4,000 × 46,812) px × 4 bytes/px = 748,992,000 bytes ≈ 749 megabytes.

If you lower the DPI the text renders properly again:

using (var pdfStream = File.OpenRead(pdfFile))
{
    PDFtoImage.Conversion.SavePng("D:\\output.png", pdfStream, dpi: 200);
}
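
The estimate above, and the effect of lowering the DPI, can be reproduced with a small helper. This is only a sketch: `BitmapSizeEstimate`, `EstimateBytes` and `ScaleToDpi` are hypothetical names and not part of PDFtoImage, and it assumes an uncompressed bitmap with 4 bytes per pixel whose pixel dimensions scale linearly with the DPI.

```csharp
using System;

// Hypothetical helper, not part of PDFtoImage: estimates the raw bitmap size.
public static class BitmapSizeEstimate
{
    // Uncompressed size in bytes, assuming 4 bytes per pixel by default.
    // The multiplication is done in 64-bit to avoid overflowing int.
    public static long EstimateBytes(int widthPx, int heightPx, int bytesPerPixel = 4)
        => (long)widthPx * heightPx * bytesPerPixel;

    // Scales pixel dimensions from one DPI to another
    // (assumption: pixel size grows linearly with DPI).
    public static (int Width, int Height) ScaleToDpi(int widthPx, int heightPx, int fromDpi, int toDpi)
        => ((int)Math.Round(widthPx * (double)toDpi / fromDpi),
            (int)Math.Round(heightPx * (double)toDpi / fromDpi));
}
```

For the sample PDF, `BitmapSizeEstimate.EstimateBytes(4000, 46812)` returns 748,992,000, matching the figure above. Passing the result of `ScaleToDpi(4000, 46812, 300, 200)` to `EstimateBytes` gives the corresponding estimate at 200 DPI, which is a little under half of that because the pixel count scales with the square of the DPI.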

image image

I'll check if there are options in PDFium or SkiaSharp to circumvent this memory issue.

scegg commented 6 months ago

Thanks for testing. I'll wait for further results.

No memory-related exception is thrown; in fact, no exception is thrown at all. In this situation, the code should throw an exception instead of generating a corrupted file.

sungaila commented 6 months ago

I agree with you that an exception should be thrown instead of generating corrupted images. However, this library is a wrapper around SkiaSharp and PDFium, so there is little I can do about this.

For testing I replaced SkiaSharp with System.Drawing.Common (GDI+), but the font issue is still there. This confirms that the issue must be related to PDFium.

Right now I am checking if there is any workaround or setting to fix this issue in PDFium.

sungaila commented 4 months ago

Unfortunately, I haven't found a way to work around this issue or to detect it in the first place.

sungaila commented 4 months ago

@scegg Good news: I did find a workaround after all! You can use the optional parameter UseTiling to render your sample PDF properly (text isn't missing anymore).

using (var pdfStream = File.OpenRead(pdfFile))
{
    PDFtoImage.Conversion.SavePng("D:\\output.png", pdfStream, options: new(UseTiling: true));
}

With this parameter, the PDF is rendered in several parts and combined into one complete image at the end. This circumvents the issue of PDFium failing at very high resolutions.
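
To illustrate the general idea of tiled rendering, here is a minimal sketch that splits a large image area into smaller rectangles, each of which could be rendered separately and then copied into the final image. It is not the library's actual implementation: `TilingSketch`, `SplitIntoTiles` and the `maxTileSize` parameter are hypothetical, and the tile size PDFtoImage uses internally is not stated in this thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of tiling, not PDFtoImage's internal code.
public static class TilingSketch
{
    // Splits a width x height area into tiles no larger than
    // maxTileSize x maxTileSize. Tiles on the right and bottom edges
    // are clipped so that the tiles cover the area exactly once.
    public static List<(int X, int Y, int Width, int Height)> SplitIntoTiles(
        int width, int height, int maxTileSize)
    {
        if (width <= 0 || height <= 0 || maxTileSize <= 0)
            throw new ArgumentException("All dimensions must be positive.");

        var tiles = new List<(int X, int Y, int Width, int Height)>();
        for (int y = 0; y < height; y += maxTileSize)
        {
            for (int x = 0; x < width; x += maxTileSize)
            {
                tiles.Add((x, y,
                    Math.Min(maxTileSize, width - x),
                    Math.Min(maxTileSize, height - y)));
            }
        }
        return tiles;
    }
}
```

With an assumed tile size of 4,000 px, `TilingSketch.SplitIntoTiles(4000, 46812, 4000)` yields 12 tiles for the sample PDF: eleven of 4,000 × 4,000 px and a final one of 4,000 × 2,812 px. Each tile is far smaller than the full 4,000 × 46,812 px image, so no single render call has to work at the full resolution.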