pymupdf / RAG

RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF
https://pymupdf.readthedocs.io/en/latest/pymupdf4llm
GNU Affero General Public License v3.0

Instead of file path in string would also like to pass fitz object #44

Closed: narsandu closed this 5 months ago

narsandu commented 5 months ago

Instead of passing a file path as a string, I would also like to pass a fitz object to the 'to_markdown' method. Can you update the input parameters to accept either a file path or a fitz object?

JorjMcKie commented 5 months ago

You can already do that today! All of the following work exactly the same way.

From a file path:

import pymupdf4llm
data = pymupdf4llm.to_markdown("input.pdf")  # pass the file path as a string

From a Document:

import pymupdf4llm
import pymupdf

doc = pymupdf.open("input.pdf")
data = pymupdf4llm.to_markdown(doc)  # pass the already-open Document

Or from a bytes, bytearray, or io.BytesIO object:

import pymupdf4llm
import pymupdf
import pathlib

pdfdata = pathlib.Path("input.pdf").read_bytes()  # read the PDF into memory as bytes
doc = pymupdf.open("pdf", pdfdata)  # open a memory-based PDF as a Document
data = pymupdf4llm.to_markdown(doc)
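
If you are writing your own wrapper that should accept the same range of inputs, the sketch below shows one way to normalize them before opening. It is a minimal, stdlib-only illustration, not pymupdf4llm's actual code: normalize_source is a hypothetical name, and it only turns a path, bytes, bytearray or io.BytesIO into either a filename or raw bytes. An already-open Document would simply be passed through unchanged, and the result would then go to pymupdf.open(filename) or pymupdf.open(stream=data, filetype="pdf").

```python
import io
import pathlib
from typing import Tuple, Union

# Hypothetical alias for the input types discussed in this thread
Source = Union[str, pathlib.Path, bytes, bytearray, io.BytesIO]


def normalize_source(source: Source) -> Tuple[str, Union[str, bytes]]:
    """Return ("path", filename) or ("stream", raw_bytes).

    Hypothetical helper, not part of pymupdf or pymupdf4llm.
    """
    # File paths: convert pathlib.Path to a plain string
    if isinstance(source, (str, pathlib.Path)):
        return ("path", str(source))
    # In-memory data: copy bytearray into immutable bytes
    if isinstance(source, (bytes, bytearray)):
        return ("stream", bytes(source))
    # File-like buffer: take its whole contents, regardless of position
    if isinstance(source, io.BytesIO):
        return ("stream", source.getvalue())
    raise TypeError(f"unsupported source type: {type(source).__name__}")
```

For example, normalize_source("input.pdf") returns ("path", "input.pdf"), while normalize_source(io.BytesIO(data)) returns ("stream", data).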

narsandu commented 5 months ago

thanks