Optionally embed images as base64 string

jason-technology commented 2 months ago

I would like to request the option to include images in the markdown as base64 embedded images in their original locations.

JorjMcKie commented 2 months ago

Evaluating this. Markdown experts recommend against it because rendering a file with embedded images performs terribly.

jason-technology commented 2 months ago

I have a valid need for this in building a multimodal RAG system. I can post-process the output, but it would be simple to implement natively.
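For reference, the post-processing workaround could look roughly like the sketch below. It assumes the Markdown was written with ordinary `![alt](path)` references to image files on disk; the function name `inline_images_as_base64` and the `base_dir` parameter are made up for illustration and are not part of pymupdf4llm.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches Markdown image syntax: ![alt](path)
IMAGE_REF = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<path>[^)\s]+)\)")


def inline_images_as_base64(markdown: str, base_dir: Path) -> str:
    """Replace local image references with base64 data URIs (hypothetical helper)."""

    def to_data_uri(match: re.Match) -> str:
        target = match.group("path")
        # Leave remote and already-embedded images untouched.
        if target.startswith(("http://", "https://", "data:")):
            return match.group(0)
        image_file = base_dir / target
        # Keep the original reference if the file cannot be found.
        if not image_file.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(image_file.name)[0] or "application/octet-stream"
        payload = base64.b64encode(image_file.read_bytes()).decode("ascii")
        return f"![{match.group('alt')}](data:{mime};base64,{payload})"

    return IMAGE_REF.sub(to_data_uri, markdown)
```

Each image stays at its original position in the text, so the document remains a single self-contained file. The cost is that base64 makes every image about a third larger than its binary form.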

Even if I later extract the images to a static blob location for non-multimodal use, I still need the entire document intact as one file during intermediate steps.

I love the final results of pymupdf4llm, but I'm also having some trouble with images not getting extracted properly, so I'm having to use other tools.


JorjMcKie commented 2 months ago

Fixed in version 0.0.15.

pymupdf / RAG

Optionally embed images as base64 string #132