jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Not able to crop the page, ValueError: bounding box has an area of zero #266

Closed: zeina99 closed this issue 4 years ago.

zeina99 commented 4 years ago

What are you trying to do?

I am trying to crop a page to get the left half and the right half separately.

What code are you using to do it?

import pdfplumber

file = pdfplumber.open("pdfs/language-models.pdf")
page = file.pages[1]
# Intended: crop the page down to its left half
left = page.crop((0, 0, 0.5*float(page.width), 0) )

PDF file

PDF file: language-models.pdf

Expected behavior

I expected it to crop the page to only the left half. To my knowledge, the coordinates required are (x and y from top left, x and y from bottom right).

Actual behavior

I got an error saying the bounding box has an area of zero:

Traceback (most recent call last):
  File "/Users/***/Dev/DataScience internship/Week-7/week 7/extractpdf.py", line 21, in <module>
    left = page.crop((0, 0, 0.5*float(page.width), 0) )
  File "/Users/***/opt/anaconda3/envs/huggingface/lib/python3.7/site-packages/pdfplumber/page.py", line 240, in crop
    return CroppedPage(self, self.decimalize(bbox), relative=relative)
  File "/Users/***/opt/anaconda3/envs/huggingface/lib/python3.7/site-packages/pdfplumber/page.py", line 314, in __init__
    test_proposed_bbox(self.bbox, parent_page.bbox)
  File "/Users/***/opt/anaconda3/envs/huggingface/lib/python3.7/site-packages/pdfplumber/page.py", line 288, in test_proposed_bbox
    raise ValueError(f"Bounding box {bbox} has an area of zero.")
ValueError: Bounding box (Decimal('0'), Decimal('0'), Decimal('306.000'), Decimal('0')) has an area of zero.
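The error follows from simple arithmetic. pdfplumber crop boxes are ordered (x0, top, x1, bottom), so a box's area is its width (x1 - x0) times its height (bottom - top). The sketch below uses a hypothetical helper, `bbox_area`, that is not pdfplumber's own check; it only reproduces the arithmetic. The page width of 612 pt is an assumption inferred from the `306.000` in the traceback.

```python
def bbox_area(bbox):
    """Area of a box given as (x0, top, x1, bottom).

    Hypothetical helper, not pdfplumber's own check: width is x1 - x0 and
    height is bottom - top, so a box whose top equals its bottom has area 0.
    """
    x0, top, x1, bottom = bbox
    return (x1 - x0) * (bottom - top)


# The box from the traceback: x1 = 0.5 * 612 = 306 (page width assumed to be
# 612 pt), but top and bottom are both 0.
failing_bbox = (0, 0, 0.5 * 612, 0)
print(bbox_area(failing_bbox))  # 0.0 (width 306.0, height 0)
```

A width of 306 multiplied by a height of 0 is 0, which is exactly what the `ValueError` reports.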

Screenshots

The page I was trying to crop: (screenshot not preserved)

jsvine commented 4 years ago

Hi @zeina99, I think the problem is this: your crop bounding box, (0, 0, 0.5*float(page.width), 0), has a height of 0. To get the left half, you'll instead want (0, 0, 0.5*float(page.width), page.height).
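Since the original goal was both halves, here is a minimal sketch that builds the left and right crop boxes together. `half_page_bboxes` is a hypothetical helper, and the 612 x 792 pt page size (US Letter) is an assumption used only for the demo. Both boxes span the full page height (top 0, bottom = height), so neither has zero area.

```python
def half_page_bboxes(width, height):
    """Return (left, right) crop boxes in (x0, top, x1, bottom) order.

    Hypothetical helper: both boxes span the full page height and split the
    page width at its midpoint. float() also accepts Decimal values, which
    older pdfplumber versions return for page.width and page.height.
    """
    mid = 0.5 * float(width)
    left = (0, 0, mid, float(height))
    right = (mid, 0, float(width), float(height))
    return left, right


# Demo with an assumed 612 x 792 pt page.
left_bbox, right_bbox = half_page_bboxes(612, 792)
print(left_bbox)   # (0, 0, 306.0, 792.0)
print(right_bbox)  # (306.0, 0, 612.0, 792.0)
```

With a real page, `page.width` and `page.height` would replace the hard-coded numbers, and each tuple would be passed to `page.crop(...)`.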

"To my knowledge, the coordinates required are (x and y from top left, x and y from bottom right)"

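Almost, but not quite. The coordinates required are (x0, top, x1, bottom): x0 and x1 are the distances of the box's left and right edges from the left edge of the page, and top and bottom are the distances of the box's top and bottom edges from the top of the page.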
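Because `top` and `bottom` are measured down from the top of the page, a rectangle taken from a tool that uses the PDF format's native bottom-left origin has to be flipped before it is passed to `crop`. (pdfplumber itself reports both conventions for page objects: `y0`/`y1` measured from the bottom, and `top`/`bottom` measured from the top.) A minimal sketch, with a hypothetical helper name and an assumed 792 pt page height:

```python
def from_bottom_origin(x0, y0, x1, y1, page_height):
    """Convert (x0, y0, x1, y1), with y measured up from the bottom of the page,
    into (x0, top, x1, bottom), with y measured down from the top.

    Hypothetical helper: the box's upper edge (y1) becomes `top` and its lower
    edge (y0) becomes `bottom`; x values are unchanged because both conventions
    measure x from the left edge of the page.
    """
    top = page_height - y1
    bottom = page_height - y0
    return (x0, top, x1, bottom)


# Upper half of an assumed 612 x 792 pt page, written bottom-origin style.
print(from_bottom_origin(0, 396, 612, 792, page_height=792))  # (0, 0, 612, 396)
```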

zeina99 commented 4 years ago

Thanks, this clears it up!