Closed (mlove4u closed 8 months ago)
Our crop option of PdfPage.render() is not meant as a PDF box with absolute coordinates (as would be returned by get_cropbox()), but merely as the amount to cut off from the default rendering.
For example, suppose you have a page of size 0, 0, 120, 120 but you want to render only the 10, 10, 110, 110 area. Then we need to calculate the difference between the two boxes, which would be 10 on each side, i.e. crop=(10, 10, 10, 10).
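The difference calculation described above can be sketched as a small helper. The function name crop_margins is hypothetical (it is not part of pypdfium2), and the sketch assumes both boxes are given as (left, bottom, right, top) in PDF units, with the target area lying inside the page box.

```python
def crop_margins(page_box, target_box):
    """Return (left, bottom, right, top) amounts to cut off page_box
    so that only target_box remains.

    Hypothetical helper for illustration; both boxes are
    (left, bottom, right, top) tuples in PDF canvas units.
    """
    p_left, p_bottom, p_right, p_top = page_box
    t_left, t_bottom, t_right, t_top = target_box
    margins = (
        t_left - p_left,      # cut from the left edge
        t_bottom - p_bottom,  # cut from the bottom edge
        p_right - t_right,    # cut from the right edge
        p_top - t_top,        # cut from the top edge
    )
    if any(m < 0 for m in margins):
        raise ValueError("Target box must lie inside the page box")
    return margins


# The example from the text: a 120x120 page, rendering only 10..110.
print(crop_margins((0, 0, 120, 120), (10, 10, 110, 110)))
```

The resulting tuple is what would be passed as the crop argument, e.g. page.render(crop=crop_margins(...)).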
I'm pretty sure the -crop[0], -crop[3] is correct: We shift the top left point out of the canvas, so the part to crop is simply drawn into void, as a way of thinking. (Note that bitmap origin is at top left, as opposed to bottom left for the PDF coordinate system.)
This approach was confirmed by the pdfium team in https://crbug.com/pdfium/2034#c11 f.
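To make the offset reasoning concrete, here is a simplified sketch of how the bitmap size and drawing offset could be derived from a crop tuple. It is an illustration under assumptions, not the library's actual code: render_geometry is a hypothetical name, the page is assumed unrotated, and crop is (left, bottom, right, top) in PDF units.

```python
import math


def render_geometry(page_w, page_h, crop=(0, 0, 0, 0), scale=1):
    """Return (bitmap_width, bitmap_height, start_x, start_y).

    Hypothetical illustration: the full page is drawn at a negative
    offset so the cropped margins fall outside the bitmap.
    """
    src_w = math.ceil(page_w * scale)
    src_h = math.ceil(page_h * scale)
    left, bottom, right, top = (math.ceil(c * scale) for c in crop)

    width = src_w - left - right
    height = src_h - bottom - top
    if width < 1 or height < 1:
        raise ValueError("Crop exceeds page dimensions")

    # Bitmap origin is top left, so the horizontal shift uses the LEFT
    # crop and the vertical shift uses the TOP crop (not the bottom one).
    return width, height, -left, -top


# Asymmetric crop on a 120x120 page: left=10, bottom=20, right=30, top=40
print(render_geometry(120, 120, crop=(10, 20, 30, 40)))
```

With an asymmetric crop, the vertical offset follows the top value, which is why -crop[3] rather than -crop[1] appears in the start position.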
Though, as an afterthought, I guess it might have been better to design the API with coordinates rather than difference, which seems sort of more generic and could help avoid an unnecessary layer of calculation.
But just keep in mind that PDFs/bitmaps use a different coordinate system, so you still couldn't use a PDF box as-is for rendering crop. FPDF_PageToDevice() / FPDF_DeviceToPage() might help with this (I wrote a wrapper at some point, but it's in the stalled dev branch, unfortunately...)
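For the simplest case, the coordinate flip can be done by hand without calling into pdfium. The sketch below is a plain-Python illustration, not a replacement for FPDF_PageToDevice(): pdf_box_to_pixels is a hypothetical name, and it assumes an unrotated page whose origin is at (0, 0).

```python
def pdf_box_to_pixels(box, page_height, scale=1.0):
    """Convert a PDF box (left, bottom, right, top; origin bottom left)
    into a bitmap rectangle (left, upper, right, lower; origin top left).

    Hypothetical helper; assumes no page rotation and a page origin
    of (0, 0).
    """
    left, bottom, right, top = box
    return (
        left * scale,
        (page_height - top) * scale,     # PDF top edge -> upper pixel row
        right * scale,
        (page_height - bottom) * scale,  # PDF bottom edge -> lower pixel row
    )


# A 100x100 area with 10 units of padding from the top left
# of a page that is 400 units high.
print(pdf_box_to_pixels((10, 290, 110, 390), page_height=400))
```

The returned tuple uses the same (left, upper, right, lower) order as PIL's Image.crop(), so it could be applied to a full-page render if altering the cropbox is not an option.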
An alternative approach could be to temporarily alter the cropbox via set_cropbox() for rendering, and change back afterwards – then you don't need any complicated conversion.
Corrected example with set_cropbox() approach:
import pypdfium2 as pdfium

def test_pypdfium2(pdf_path):
    pdf = pdfium.PdfDocument(pdf_path)
    page = pdf[0]
    print(page.get_size())
    boxes = {
        # box: left, bottom, right, top
        "trimbox": page.get_trimbox(),
        "bleedbox": page.get_bleedbox(),
        "cropbox": page.get_cropbox(),
        "mediabox": page.get_mediabox(),
        "mybox": (10, 400-110, 110, 400-10),  # pad, page_h-(area_h+pad), area_w+pad, page_h-pad
    }
    for box, crop in boxes.items():
        page.set_cropbox(*crop)
        bitmap = page.render(scale=1)
        pil_image = bitmap.to_pil()
        pil_image.save(f"out/{box}.png")
    # restore the original cropbox after rendering
    page.set_cropbox(*boxes["cropbox"])
    pdf.close()

pdf_path = "boxes_test.pdf"
test_pypdfium2(pdf_path)
[Rendered output images: trimbox, bleedbox, cropbox, mediabox, mybox]
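The save-and-restore pattern from the example can also be wrapped in a context manager, so the original cropbox is put back even if rendering raises. This is only a sketch: temporary_cropbox is a hypothetical helper, and it relies solely on the get_cropbox() and set_cropbox() methods already used in this thread. One caveat, stated as an assumption: if the page had no explicit cropbox and get_cropbox() fell back to another box, restoring will write that fallback value as an explicit cropbox.

```python
from contextlib import contextmanager


@contextmanager
def temporary_cropbox(page, box):
    """Temporarily set the page's cropbox to `box` (left, bottom, right, top).

    Hypothetical helper: `page` is any object offering get_cropbox()
    and set_cropbox(l, b, r, t), such as a pypdfium2 PdfPage.
    """
    original = page.get_cropbox()
    page.set_cropbox(*box)
    try:
        yield page
    finally:
        # Always restore, even if rendering fails inside the with-block.
        page.set_cropbox(*original)


# Possible usage (requires pypdfium2 and an opened page):
# with temporary_cropbox(page, page.get_trimbox()):
#     bitmap = page.render(scale=1)
```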
Thank you very much.
page.set_cropbox(*crop) works perfectly.
Yesterday, I did not give sufficient consideration to the PDF coordinates. Moreover, the PDF sample I made was a standard PDF (with all four bleed values being the same), so I did not notice any issues. Today, I specifically created a PDF with different bleed values and discovered that the code changes I made yesterday were actually incorrect.
Thanks for your work.
boxes_test_new.pdf
> Though, as an afterthought, I guess it might have been better to design the API with coordinates rather than difference, which seems sort of more generic and could help avoid an unnecessary layer of calculation.
> But just keep in mind that PDFs/bitmaps use a different coordinate system, so you still couldn't use a PDF box as-is for rendering crop. FPDF_PageToDevice() / FPDF_DeviceToPage() might help with this (I wrote a wrapper at some point, but it's in the stalled dev branch, unfortunately...)
I think so too (that's why I got it wrong yesterday). Most of the time, I use PyMuPDF (due to licensing reasons, sometimes I use pypdfium2). In PyMuPDF, it's simply a matter of using page.get_pixmap(clip=page.trimbox) and that's it.
Yeah... On the other hand, pdfium's API allows for either cropping with device coordinates, or with page coordinates by altering the cropbox. So actually that design offers both approaches, without the caller having to translate. I agree it makes things a bit less obvious here, but there may be other use cases where device coordinates are more straightforward, say, GUI-based cropping.
Checklist
pypdfium2 from PyPI or GitHub/pypdfium2-team.
Description
Hi. Today, I tried to use the page.render method to convert a PDF file (only the trimbox range) and encountered the following error:
..._helpers/page.py", line 416, in render raise ValueError("Crop exceeds page dimensions")
Upon checking the source code, I found the code for calculating bitmap dimensions: https://github.com/pypdfium2-team/pypdfium2/blob/8f6ecb0d51c79ee827062c9b6b9d165a13afe481/src/pypdfium2/_helpers/page.py#L413-L414
Here, crop returns (left.value, bottom.value, right.value, top.value), so the correct calculation should be (I think):
However, even after making these modifications, page.render still returns an unexpected range. After reviewing the code comments of PDFium's FPDF_RenderPageBitmap function, I discovered that: https://github.com/pypdfium2-team/pypdfium2/blob/8f6ecb0d51c79ee827062c9b6b9d165a13afe481/src/pypdfium2/_helpers/page.py#L425
-crop[3] seems incorrect; the correct one should be -crop[1]. After these three modifications, the rendering results are consistent with actual conditions. Below is my code and the test PDF file. Thanks.
Install Info
Validity