Open bentsi opened 1 year ago
Hi @bentsi, thanks for sharing this example. The height, top, and bottom attributes are all calculated from the raw annotation's Rect (bounding box), specified by the PDF in a direct command.
In this particular PDF (as observed by opening it in a text editor), that Rect command is Rect[428.053 634.536 453.041 626.144], which corresponds to exactly what you see for x0, y0, x1, y1 in your screenshot above, suggesting that pdfplumber is collecting the correct information.
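To make the arithmetic concrete, here is a minimal sketch of how a Rect whose second y-value is smaller than its first yields a negative height and swapped top/bottom. It assumes top and bottom are measured from the top of the page and that no correction is applied; `describe_rect` and `PAGE_HEIGHT` are hypothetical names and values chosen for illustration, not pdfplumber's actual code or this page's real height.

```python
# Raw Rect values from this PDF: [x0, y0, x1, y1] in PDF coordinates
# (origin at the bottom-left of the page).
rect = (428.053, 634.536, 453.041, 626.144)

# Hypothetical page height in points (A4); the real page may differ.
PAGE_HEIGHT = 841.89


def describe_rect(rect, page_height):
    """Derive top-left-origin attributes from a raw Rect, uncorrected."""
    x0, y0, x1, y1 = rect
    return {
        "x0": x0,
        "y0": y0,
        "x1": x1,
        "y1": y1,
        "top": page_height - y1,     # distance from page top to y1
        "bottom": page_height - y0,  # distance from page top to y0
        "width": x1 - x0,
        "height": y1 - y0,
    }


attrs = describe_rect(rect, PAGE_HEIGHT)
print(attrs["height"])                 # negative, because y1 < y0
print(attrs["top"] > attrs["bottom"])  # True: top and bottom look swapped
```

Under these assumptions the negative height follows directly from the values stored in the PDF, rather than from a parsing error.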
Given that, there would seem to be two main options:
1. Do nothing, on the principle that pdfplumber should focus on PDF objects' actual (i.e., as coded) attributes, rather than what we think the author intended.
2. When pdfplumber sees an annotation that uses a bounding box that suggests a negative height, "fix" the bounding box (probably by flipping the vertical coordinates) so that it has a positive height.
My inclination is toward the first option, because trying to fix PDF creators' mistakes seems like opening a can of worms. But I'm open to suggestions otherwise.
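For anyone who wants the "fixed" behavior in their own code in the meantime, flipping the coordinates could be sketched roughly as follows. This is an illustration only: `normalize_rect` is a hypothetical helper, not part of pdfplumber, and it assumes the rectangle is a plain (x0, y0, x1, y1) tuple.

```python
def normalize_rect(rect):
    """Reorder a rectangle's corners so that x0 <= x1 and y0 <= y1."""
    x0, y0, x1, y1 = rect
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


# The Rect values found in this PDF, with the y-coordinates "upside down".
raw = (428.053, 634.536, 453.041, 626.144)
fixed = normalize_rect(raw)
print(fixed)  # (428.053, 626.144, 453.041, 634.536)
```

Applying the helper to an already well-ordered rectangle returns it unchanged, so it can be run on every annotation without special-casing the broken ones.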
Describe the bug
The height property of a hyperlink has a negative value.
Code to reproduce the problem
1) Open the PDF with pdfplumber. 2) Inspect pdf_file.pages[61].hyperlinks.
PDF file
https://www.singtel.com/content/dam/singtel/about-us/sustainability/reports/Singtel-Group-Sustainability-Report-2022.pdf
Expected behavior
height should be a positive number.
Actual behavior
height has a negative value.
Screenshots
Environment
Additional context
In addition, the "top" and "bottom" attributes are swapped, which does not comply with pdfplumber's bounding box definitions as discussed in https://github.com/jsvine/pdfplumber/issues/198