pymupdf / RAG

RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF
https://pymupdf.readthedocs.io/en/latest/pymupdf4llm
GNU Affero General Public License v3.0

Superscript texts are not handled properly within tables #160

Closed: argocan closed this issue 1 month ago

argocan commented 1 month ago

When superscript text (e.g. a footnote reference) appears inside a table, it is not handled correctly, or at least it is handled differently than superscript text outside a table. The attached PDF, document_with_notes.pdf, is converted to Markdown as follows:

```
This is a note reference[1] inside normal text.

Table cell Table cell
This is a note reference inside a table2 Table cell

1 Reference outside table

2 Reference inside table

-----
```

I used version 0.0.16 because 0.0.17 does not seem to handle tables correctly. This is the result of the conversion with 0.0.17:

```
This is a note reference[1] inside normal text.

Table cell

This is a note reference inside a table[2]

1 Reference outside table

2 Reference inside table

Table cell Table cell
This is a note reference inside a table2 Table cell

-----
```
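To illustrate the difference between the two outputs above, here is a minimal sketch of how a superscript span could be bracketed the same way inside a table cell as outside one. It assumes span dictionaries shaped like those in PyMuPDF's `page.get_text("dict")` output, where bit 0 of a span's `flags` marks superscript text. The helper `render_spans` and the sample spans are hypothetical and are not taken from the pymupdf4llm code.

```python
# Bit 0 of a span's "flags" marks superscript text in PyMuPDF's
# page.get_text("dict") output.
SUPERSCRIPT_FLAG = 1


def render_spans(spans):
    """Join span texts, wrapping superscript spans in square brackets.

    Hypothetical helper: it only mimics the "[1]" style seen in the
    output for normal text, it is not the library's implementation.
    """
    parts = []
    for span in spans:
        text = span["text"]
        if span["flags"] & SUPERSCRIPT_FLAG and text.strip():
            text = f"[{text.strip()}]"
        parts.append(text)
    return "".join(parts)


# Made-up spans, shaped like those of the table cell in the report.
cell_spans = [
    {"text": "This is a note reference inside a table", "flags": 0},
    {"text": "2", "flags": 1},
]
print(render_spans(cell_spans))
# prints: This is a note reference inside a table[2]
```

In a real document the spans of a single cell could presumably be collected by restricting `page.get_text("dict", clip=...)` to the cell's rectangle, but that step is an assumption and is left out here.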

JorjMcKie commented 1 month ago

This is not a bug but simply an unsupported feature.

argocan commented 1 month ago

Ok... it would be helpful indeed :) thanks for the response

JorjMcKie commented 1 month ago

Sorry about that ... 😒