Closed jamiejcole closed 1 year ago
When trying to create my own instance of a Page, I just get the error AttributeError: 'FilteredPage' object has no attribute 'attrs'
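from pathlib import Path
import pdfplumber.page

# PAGE is assumed to be a pdfplumber Page opened earlier; keep only non-bold characters
clean_text = PAGE.filter(lambda obj: obj["object_type"] == "char" and "Bold" not in obj["fontname"])
filename = Path("./path-to-pdf.pdf")
# This call raises the AttributeError above; 3 is the page number that this image is on
x = pdfplumber.page.Page(filename, clean_text, 3)
croppedImage = x.crop(coords)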
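For reference, the predicate passed to .filter(...) in the snippet above can be exercised on its own. The sketch below uses hand-made dicts that only mimic the object_type and fontname keys of pdfplumber objects; the sample values and the helper name keep_non_bold_chars are hypothetical and not taken from SDDPDF.pdf. It suggests that the predicate keeps non-bold characters only, so bold characters and non-character objects such as lines are both dropped.

```python
# Hypothetical sample objects; only the keys the predicate reads are included.
sample_objects = [
    {"object_type": "char", "fontname": "ABCDEF+Arial-BoldMT", "text": "1"},
    {"object_type": "char", "fontname": "ABCDEF+ArialMT", "text": "a"},
    {"object_type": "line"},  # no fontname key; `and` short-circuits before it is read
]


def keep_non_bold_chars(obj):
    """Same test as the lambda in the snippet above."""
    return obj["object_type"] == "char" and "Bold" not in obj["fontname"]


kept = [obj for obj in sample_objects if keep_non_bold_chars(obj)]
print([obj["text"] for obj in kept])  # prints ['a']
```

Because the first condition rejects everything that is not a character, a page filtered this way would also lose its lines and rectangles, which may or may not be what is wanted.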
Thanks for filing this, @jamiejcole. You've identified a point I should clarify in the documentation: .to_image(...) does not and (unfortunately, with the current architecture) cannot take .filter(...)-introduced changes into account. That's because the .to_image(...) method just hands off the PDF and page number to Wand for rendering, which pdfplumber then crops if necessary (e.g., if it's a CroppedPage).
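Given that explanation, one possible workaround is to render the unfiltered page and then paint over the characters that the filter would have removed. The sketch below is only an illustration under stated assumptions, not a method endorsed in this thread: render_without_bold and is_bold are hypothetical helper names, the code relies on the page exposing .chars and .to_image(...) and on the returned image offering draw_rects(...) with fill and stroke arguments, and it assumes a white page background.

```python
WHITE = (255, 255, 255, 255)  # opaque RGBA; assumes a white page background


def is_bold(char):
    """Heuristic from the original report: the font name contains 'Bold'."""
    return "Bold" in char.get("fontname", "")


def render_without_bold(page, resolution=150):
    """Render `page`, then cover each bold character's bounding box.

    `page` is expected to be an unfiltered pdfplumber page (hypothetical helper).
    """
    bold_chars = [c for c in page.chars if is_bold(c)]
    image = page.to_image(resolution=resolution)
    if bold_chars:
        # Matching fill and stroke so the default outline is not visible.
        image.draw_rects(bold_chars, fill=WHITE, stroke=WHITE)
    return image


# Usage (assumes pdfplumber is installed; path and page index are placeholders):
# import pdfplumber
# with pdfplumber.open("./path-to-pdf.pdf") as pdf:
#     render_without_bold(pdf.pages[2]).save("page.png")
```

Masking only hides the glyphs in the rendered image; anything drawn underneath them is covered as well, so this fits best when the bold text sits on a plain background.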
I've now added a note to the README.md file: https://github.com/jsvine/pdfplumber/commit/dbaf0cce5332475f7cd259cdc13777e6b943b20e
Describe the bug
I'm trying to call .to_image() on a FilteredPage object because I don't want bold text included in the image export. However, the bold 1 remains in the image.
Code to reproduce the problem
PDF file
Relevant page: SDDPDF.pdf
Expected behavior
The clean_text FilteredPage should remove the bold 1 as seen in the screenshot of the PDF.
Actual behavior
The 1 remains within the .to_image() export.