sungaila / PDFtoImage

A .NET library to render PDF files into images.
https://www.sungaila.de/PDFtoImage/
MIT License

PDFToImage.Conversion pdf file size limit? #70

Closed cdieu closed 3 months ago

cdieu commented 4 months ago

Question

Hello, we are wondering whether there is a PDF file size limit for the Conversion methods. We are trying to convert a PDF, supplied as a Base64 string, to an image using the Conversion.SaveJpeg method, but we run into trouble with the PDF that we need to use. Its file size is 500 KB, and the program gets stuck. However, converting a smaller PDF of around 50 KB works fine.

Is this something that is supposed to be handled in the method, or is it the case that we should find a work-around ourselves?

sungaila commented 4 months ago

Hi @cdieu,

There are no special memory or size limits in place for this library. Base64-encoded PDFs will work up to around 1.5 GB in size (which you don't even come close to with 500 KB).

However, an uncompressed bitmap is temporarily used for rendering, and with a very high resolution, memory can run out quickly.

Example: you render your PDF at 100x100 pixels. That's $100 \times 100 \times 8\ \text{bytes} = 80{,}000\ \text{bytes} = 0.08\ \text{MB}$.

Render at 1,000x1,000 pixels and you will need 8 MB of memory.

For 10,000x10,000 pixels you will need 800 MB. My point is that you can run out of memory when setting width, height or dpi too high.
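
The same estimate can be written as a formula, using the 8 bytes per pixel figure from the example above. The relation between dpi and pixel dimensions (pixels = inches × dpi) and the A4 page size are assumptions added here for illustration; they are not values taken from the library.

$$
\text{memory} \approx w_{\text{px}} \times h_{\text{px}} \times 8\ \text{bytes}, \qquad w_{\text{px}} = w_{\text{in}} \times \text{dpi}, \quad h_{\text{px}} = h_{\text{in}} \times \text{dpi}
$$

For an A4 page (about 8.27 in × 11.69 in) rendered at 300 dpi:

$$
2481 \times 3507 \times 8\ \text{bytes} = 69{,}606{,}936\ \text{bytes} \approx 69.6\ \text{MB}
$$

Quadrupling the dpi to 1200 multiplies the pixel count by 16, which under the same assumptions gives roughly 1.1 GB for a single page.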

If you have a test PDF (without any personal information) to share, I can take a look into it.

cdieu commented 4 months ago

Hi, thanks for a quick answer! It might be a stupid mistake on our part, but thanks for wanting to check it out. We have tried setting the width, height, and dpi lower, but still no luck. We have also tried to use your web converter and it can convert and display as we want it to.

Is there any way for us to further investigate the issue by ourselves, as in, are there any exceptions triggered if something goes wrong?

Here is a test PDF: testpdf.pdf

For reference, we only have this small part before calling the conversion:

async Task LoadFiles(InputFileChangeEventArgs f)
{
    var file = f.File;
    var buffers = new byte[file.Size];
    await file.OpenReadStream(maxAllowedSize: 2000000L).ReadAsync(buffers);
    var base64string = Convert.ToBase64String(buffers);
    ...
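
One thing worth checking in the snippet above (an assumption; the thread never confirms the cause): a single `Stream.ReadAsync` call is allowed to return fewer bytes than requested, so for larger files the buffer may be only partly filled before it is Base64-encoded. A minimal sketch of a read loop that fills the whole buffer follows. `StreamHelpers` and `ReadFullyAsync` are hypothetical names introduced for illustration and are not part of PDFtoImage or Blazor.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class StreamHelpers
{
    // Reads exactly `length` bytes from `source`. The loop is needed because
    // one ReadAsync call may deliver only part of the requested range.
    public static async Task<byte[]> ReadFullyAsync(Stream source, int length)
    {
        var buffer = new byte[length];
        var total = 0;

        while (total < length)
        {
            // Ask only for the bytes that are still missing.
            var read = await source.ReadAsync(buffer.AsMemory(total, length - total));

            if (read == 0)
            {
                // The stream ended before the expected number of bytes arrived.
                throw new EndOfStreamException(
                    $"Expected {length} bytes but the stream ended after {total}.");
            }

            total += read;
        }

        return buffer;
    }
}
```

In the handler above, this would replace the single `ReadAsync(buffers)` call, passing the opened stream and `(int)file.Size`. On .NET 7 and later, the built-in `Stream.ReadExactlyAsync` does the same job; copying into a `MemoryStream` with `CopyToAsync` avoids the manual loop entirely.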

sungaila commented 4 months ago

You don't have to convert your file into Base64, you can directly pass the stream like this:

void LoadFiles(InputFileChangeEventArgs f)
{
    var file = f.File;
    var stream = file.OpenReadStream(maxAllowedSize: 2000000L);
    PDFtoImage.Conversion.SaveJpeg("image.jpg", stream);
}

If that isn't working for you, then try to copy the stream into a MemoryStream first:

async Task LoadFiles(InputFileChangeEventArgs f)
{
    using var ms = new MemoryStream();
    await f.File.OpenReadStream(maxAllowedSize: 2000000L).CopyToAsync(ms);
    PDFtoImage.Conversion.SaveJpeg("image.jpg", ms);
}

sungaila commented 3 months ago

@cdieu Please reopen this issue if the code above doesn't fix your memory issue.