Error when loading pdf using python BytesIO: object has no attribute 'page_count'

pymupdf / RAG

RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF

GNU Affero General Public License v3.0

Python 3.10.14, pymupdf4llm version 0.5

I am trying to read a PDF from an S3 bucket (file_content in the code below) and then run pymupdf4llm on it, but I get an error when I pass it in as a BytesIO object. The same call works fine when the PDF is loaded from local disk (i.e. without BytesIO).

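from io import BytesIO
import boto3
import pymupdf4llm

s3 = boto3.client("s3")  # assumed: the original snippet does not show how s3 is created
try:
    file_content = s3.get_object(Bucket=XXXXX, Key=XXXX)['Body'].read()
except Exception as e:
    print(e)
    print(f"Error getting object {XXXX} from bucket {XXXX}. Make sure they exist and your bucket is in the same region as this function.")
    raise  # re-raise with the original traceback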

md_file = pymupdf4llm.to_markdown(BytesIO(file_content)) 

# AttributeError: '_io.BytesIO' object has no attribute 'page_count'
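
The traceback suggests that to_markdown treats anything that is not a file path as an already opened document, so a BytesIO buffer fails as soon as page_count is read. A possible workaround, sketched here under the assumption that to_markdown accepts a PyMuPDF Document, is to open the bytes with pymupdf.open(stream=..., filetype="pdf") first. The helper names as_pdf_bytes and pdf_bytes_to_markdown are hypothetical, and older PyMuPDF releases expose the module as fitz rather than pymupdf.

```python
from io import BytesIO


def as_pdf_bytes(source):
    """Return raw bytes from either a bytes-like object or a BytesIO buffer."""
    if isinstance(source, BytesIO):
        return source.getvalue()
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)
    raise TypeError(f"expected bytes or BytesIO, got {type(source).__name__}")


def pdf_bytes_to_markdown(source):
    """Convert an in-memory PDF to Markdown (hypothetical helper)."""
    # Imported here so as_pdf_bytes stays usable without PyMuPDF installed.
    import pymupdf
    import pymupdf4llm

    # Open the PDF from memory; the result is a Document, which has page_count.
    with pymupdf.open(stream=as_pdf_bytes(source), filetype="pdf") as doc:
        # Assumption: to_markdown accepts an opened Document, not only a path.
        return pymupdf4llm.to_markdown(doc)
```

With this in place, the failing line would become md_file = pdf_bytes_to_markdown(file_content), passing the bytes read from S3 directly.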

Error when loading pdf using python BytesIO: object has no attribute 'page_count' #38