sungaila / PDFtoImage

A .NET library to render PDF files into images.
https://www.sungaila.de/PDFtoImage/
MIT License

Render predefined area instead of the entire pdf page to receive a subset of the pdf page as an image #63

Closed (ynnob closed this issue 3 months ago)

ynnob commented 5 months ago

Detailed feature request

At the moment, all methods that save a PDF page as an image render the entire page using X:0 and Y:0 as their origin. If I only want a subset of the PDF as an image file, I need to postprocess the image.

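            using (MemoryStream pdfStream = new MemoryStream(resultPdfBytes))
            using (SkiaSharp.SKBitmap bitmap = PDFtoImage.Conversion.ToImage(pdfStream))
            using (SkiaSharp.SKBitmap subset = new SkiaSharp.SKBitmap())
            {
                // The whole page is rendered first; only then is it cropped
                // to the top-left 1000x1000 pixels.
                if (bitmap.ExtractSubset(subset, new SkiaSharp.SKRectI(0, 0, 1000, 1000)))
                {
                    using var data = subset.Encode(SkiaSharp.SKEncodedImageFormat.Png, 100);
                    if (data is not null)
                    {
                        resultPdfBytes = data.ToArray();
                    }
                }
            }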

This is a time-consuming process for very large PDF pages, and it would be unnecessary if all methods accepted a Rect or x,y coordinates as their render origin plus a width and height. It would probably be better still to just pass a rectangle that defines the area the pdfium viewer should render.

Additionally, when rendering a very large PDF (for example large building plans), the rendering might fail. I have reproduced this multiple times by setting a DPI higher than 350, which in one case always fails to render. Rendering a subset should also solve this problem and allow a predefined area of a large PDF page to be rendered at a higher DPI.

I think this should be fairly easy to implement, since both methods already expose left and top parameters, which are set to 0 by default atm. (PDFtoImage.PdfiumViewer.PdfDocument.Render())

FPDFBitmap_FillRect()
RenderPDFPageToBitmap()

Greetings ynnob!

sungaila commented 5 months ago

Hi @ynnob, FPDF_RenderPageBitmap does not support clipping but FPDF_RenderPageBitmapWithMatrix looks like it does. I'm looking into it ...

sungaila commented 4 months ago

Hey @ynnob, I've released a preview version with support to render a subset of the PDF: PDFtoImage 4.0.0-preview

You can define the boundaries for the render with the Bounds parameter. The following example renders the bottom left quadrant of the PDF:

// get the size of the first PDF page
var pageSize = PDFtoImage.Conversion.GetPageSize(pdfStream, leaveOpen: true, page: 0);

// render the bottom left quadrant
PDFtoImage.Conversion.SavePng("output.png", pdfStream, page: 0, options: new(Dpi: 300, Bounds: new(0, pageSize.Height / 2f, pageSize.Width / 2f, pageSize.Height / 2f)));

Please note that the Bounds are relative to the size returned by GetPageSize and are independent of your Dpi, Width, Height and other settings. Rendering a subset of the PDF will not shrink the output image, you have to calculate the new size yourself.

You can test this new option here: https://www.sungaila.de/PDFtoImage/

Please let me know if this feature is working as expected so I can build a final release later on.

ynnob commented 4 months ago

Hey thanks for the update. Can you explain what you mean by

Rendering a subset of the PDF will not shrink the output image, you have to calculate the new size yourself.

How would you handle that? I attached an example PDF (with a rotation of 270°) and the resulting PNG if I render the following: Bounds: X: 50, Y: 250, Width: 250, Height: 250

The rendered area in the resulting PNG is stretched.

(Attachments: image "sample_rotated_270 pdf", file sample_rotated_270.pdf)

ynnob commented 4 months ago

Ohh, I see. After adjusting the aspect ratio to 1:1 the image is no longer stretched. But tbh I think this should be the expected output if I declare a bound. The original size of the PDF should not impact the image size of the declared "rectangle" area.

The Width and Height should be the basis for the resolution of the rendered area:

Width = Bounds.Width * (DPI / 72)
Height = Bounds.Height * (DPI / 72)

Something like that? Please let me know what you generally think about that.
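The proposed sizing can be sketched as plain arithmetic. The helper below is hypothetical and not part of PDFtoImage; it assumes the bounds are given in PDF points (1/72 inch), which is what the DPI/72 factor in the formula implies, and it rounds to the nearest whole pixel.

```csharp
using System;

public static class BoundsSizing
{
    // Assumption: bounds are measured in PDF points, 72 points per inch.
    private const double PointsPerInch = 72.0;

    // Hypothetical helper (not a PDFtoImage API): pixel size of a bounds
    // rectangle rendered at the given DPI, rounded to whole pixels.
    public static (int Width, int Height) PixelSize(double boundsWidth, double boundsHeight, int dpi)
    {
        double scale = dpi / PointsPerInch;
        int width = (int)Math.Round(boundsWidth * scale);
        int height = (int)Math.Round(boundsHeight * scale);
        return (width, height);
    }
}
```

For the 250 x 250 bounds from the example above at 300 DPI, this gives 1042 x 1042 pixels (250 * 300 / 72 ≈ 1041.67).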

(Attached image: "sample_rotated_270 pdf aspectratio_1_1")

sungaila commented 4 months ago

Hi @ynnob, the size of the output bitmap and the bounds are strictly separated.

If your output bitmap is 100x100 in size but your bounds are 100x80, pdfium will stretch the 80 to 100. Also please note that rotation does not affect your bounds' X,Y coords and width/height.

Edit: I just tested this and DPI works as expected. But the interaction between width, height and bounds is odd. I'll look into it.
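The stretching described in this comment can be checked with a little arithmetic before rendering. This is only a sketch with a hypothetical helper (not a PDFtoImage API): it compares the per-axis scale factors between the bounds and the output bitmap, and unequal factors mean the content will be stretched.

```csharp
using System;

public static class StretchCheck
{
    // Hypothetical helper: how much each axis of the bounds is scaled
    // to fill the output bitmap.
    public static (double X, double Y) ScaleFactors(
        double boundsWidth, double boundsHeight, int outputWidth, int outputHeight)
    {
        return (outputWidth / boundsWidth, outputHeight / boundsHeight);
    }

    // The image is stretched when the two axes are scaled differently.
    public static bool IsStretched(
        double boundsWidth, double boundsHeight, int outputWidth, int outputHeight,
        double tolerance = 1e-6)
    {
        var (x, y) = ScaleFactors(boundsWidth, boundsHeight, outputWidth, outputHeight);
        return Math.Abs(x - y) > tolerance;
    }
}
```

With 100x80 bounds and a 100x100 output bitmap, the factors are 1.0 horizontally and 1.25 vertically, so the 80 is stretched to 100 as described. Choosing a 100x80 output instead keeps both factors at 1.0.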

ynnob commented 4 months ago

Yeah, the rotation is no problem. We already have similar solutions implemented in different languages, and we use rotation-adjusted target areas to render the expected areas.

Yeah, something does indeed seem off when setting a width and height.

Thanks for this already. Looks very promising.

sungaila commented 3 months ago

@ynnob PDFtoImage 4.0.1 fixed Bounds when used in conjunction with Width, Height and Rotation.

Please give it a try when you have time.

ynnob commented 3 months ago

Hey @sungaila, I am testing again but I think something is still off. When setting the Width and Height, the DPI is ignored. No matter what the DPI is set to, the output does not change. When removing the Width/Height, the DPI takes control again.

Is this supposed to happen? If so, this should be documented, and maybe you can explain what the individual settings really change in the end result.

Other than that, this seems to render the area correctly. Nice 👍

sungaila commented 3 months ago

Hi @ynnob, this is intended behavior and is documented in the method XML comments (e.g. see IntelliSense in Visual Studio). You can either use DPI or set Width/Height, there is no combination of both like "use DPI for width but use this fixed height".

If I understood you correctly, the bounds now work correctly when using width/height/rotation. Please reopen this issue if there is still something not working. :-)