jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

AttributeError: 'bytes' object has no attribute 'seek' #833

Closed by jiarongkoh 1 year ago

jiarongkoh commented 1 year ago

I encountered the above error when I attempted to pass the PDF file as bytes to the pdfplumber.open() function.

Code:

import requests
import base64
import pdfplumber

url = "https://www1.bca.gov.sg/docs/default-source/docs-corp-news-and-publications/circulars/industry-circular_revised-pea-code_1-march-2023.pdf"
data = base64.b64encode(requests.get(url).content)

with pdfplumber.open(data) as pdf:
    text = ""
    for page in pdf.pages:
        text += page.extract_text()

    print(text)

Error:

AttributeError                            Traceback (most recent call last)
Cell In[29], line 3
      1 import pdfplumber
----> 3 with pdfplumber.open(data) as pdf:
      4     text = ""
      5     for page in pdf.pages:

AttributeError: 'bytes' object has no attribute 'seek'

I'm using:

samkit-jain commented 1 year ago

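Hi @jiarongkoh, I appreciate your interest in the library. The issue is happening because you are base64-encoding the data and passing the resulting bytes object straight to pdfplumber.open(), which expects a file path or a file-like object that supports seek(). A plain bytes object has no seek() method, hence the AttributeError. You should wrap the raw, unencoded response content in io.BytesIO instead. The code will become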

import io
data = io.BytesIO(requests.get(url).content)

instead of

data = base64.b64encode(requests.get(url).content)
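
For anyone landing here with the same error, the fix above can be sketched as a small end-to-end helper. This is only an illustration, not part of pdfplumber: the names as_pdf_stream and extract_text are hypothetical, and the check for a leading %PDF marker is an assumption about how a typical raw PDF begins (it is there just to catch accidentally encoded data early).

```python
import io


def as_pdf_stream(raw: bytes) -> io.BytesIO:
    """Wrap raw PDF bytes in a seekable in-memory stream (hypothetical helper)."""
    # Assumption: a raw PDF normally starts with the "%PDF" marker.
    # Base64-encoded data will not, so this catches the mistake from the issue.
    if not raw.startswith(b"%PDF"):
        raise ValueError("data does not look like a raw PDF; was it base64-encoded?")
    return io.BytesIO(raw)


def extract_text(raw: bytes) -> str:
    """Extract text from every page of a PDF held in memory (hypothetical helper)."""
    import pdfplumber  # imported here so as_pdf_stream works without pdfplumber installed

    with pdfplumber.open(as_pdf_stream(raw)) as pdf:
        # "or ''" guards against pages for which no text is returned.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


# Usage with the URL from the question (requires network access):
#   import requests
#   text = extract_text(requests.get(url).content)
#   print(text)
```

Note that requests.get(url).content already holds the raw PDF bytes, so no base64 step is needed before wrapping them in io.BytesIO.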