pymupdf / RAG

RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF
https://pymupdf.readthedocs.io/en/latest/pymupdf4llm
GNU Affero General Public License v3.0

Optionally embed images as base64 string #132

Closed jason-technology closed 2 months ago

jason-technology commented 2 months ago

I would like to request an option to include images in the Markdown output as base64-embedded images in their original locations.

JorjMcKie commented 2 months ago

Evaluating this. Markdown experts recommend against this because rendering such a file performs terribly.

jason-technology commented 2 months ago

I have a valid need for this in building a multimodal RAG system. I can post-process, but it would be simple to implement natively.

Even if I later extract the images to a static blob location for non-multimodal use, I still need the entire document to stay intact as one file during intermediate steps.

I love the final results of pymupdf4llm, but I’m also having some trouble with images not getting extracted properly, so I’m having to use other tools.
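
For readers who need the post-processing route mentioned above, here is a minimal sketch of what it could look like. It assumes the Markdown already references image files on disk using plain `![alt](path)` syntax; the function name `embed_images_as_base64` and the `base_dir` parameter are hypothetical and are not part of pymupdf4llm.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches plain Markdown images: ![alt](target).
# Targets containing spaces or a trailing title are not handled in this sketch.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def embed_images_as_base64(markdown: str, base_dir: str = ".") -> str:
    """Replace local image references with base64 data URIs at the same position."""

    def to_data_uri(match: re.Match) -> str:
        alt, target = match.group(1), match.group(2)
        # Leave remote URLs and already-embedded images untouched.
        if target.startswith(("http://", "https://", "data:")):
            return match.group(0)
        path = Path(base_dir) / target
        # Leave references to missing files untouched instead of failing.
        if not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"![{alt}](data:{mime};base64,{payload})"

    return IMAGE_PATTERN.sub(to_data_uri, markdown)
```

Because each reference is substituted where it appears, the images keep their original locations and the document remains a single file. The cost is size: base64 makes each image roughly a third larger than its binary form.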

JorjMcKie commented 2 months ago

Fixed in version 0.0.15.
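
A short usage sketch for anyone landing on this issue later. It assumes the fix is exposed as the `embed_images` keyword of `pymupdf4llm.to_markdown`, as described in the pymupdf4llm API documentation; please check the documentation for your installed version. The wrapper name `pdf_to_markdown_with_embedded_images` is hypothetical.

```python
def pdf_to_markdown_with_embedded_images(pdf_path: str) -> str:
    """Convert a PDF to Markdown with images inlined as base64 strings."""
    # Imported lazily so this sketch loads even where the package is absent.
    # Assumption: pymupdf4llm >= 0.0.15, the version named in this issue.
    import pymupdf4llm

    # Assumption: embed_images=True inlines images instead of writing files.
    return pymupdf4llm.to_markdown(pdf_path, embed_images=True)
```

The result is one self-contained Markdown string, which matches the "entire document intact as one file" requirement from the request.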