pymupdf / RAG

RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF
https://jorjmckie.readthedocs.io/en/latest/
GNU Affero General Public License v3.0
142 stars 31 forks source link

Images in table #21

Open chillyoung4679 opened 1 month ago

chillyoung4679 commented 1 month ago

Hello,

My PDF file contains long tables, and the tables include images. I tried

md_text = pymupdf4llm.to_markdown("input.pdf", write_images=True)

and the result was that the images were extracted but placed below the table.

| Col1  | Col2  | Image |
|---|---|---|
| Text | Text |  |
| Text | Text |  |
| Text | Text |  |

![image1](images1.png)
![image2](images2.png)
![image3](images3.png)

However, I want the images to be inside the table. like:

| Col1  | Col2  | Image |
|---|---|---|
| Text | Text | ![image1](images1.png) |
| Text | Text | ![image2](images2.png) |
| Text | Text | ![image3](images3.png) |

How can I achieve this?

Best

JorjMcKie commented 1 month ago

No, this is currently not supported. But let me check if there is any chance to tweak the table finder.