Suggestion: possibility to have a callback on table

pymupdf / RAG

RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF

GNU Affero General Public License v3.0

Table identification as such is a PyMuPDF Page method. Complex table layouts that HTML would express via colspan etc. cannot be represented in that way. Instead, you must interpret cells whose text and bbox have the value None accordingly. We are not planning to change that.

We are using table.to_markdown() in this repo to output tables in text format. This inevitably restricts what can be expressed at all. The only conceivable alternative may be to output tables instead (or additionally) as pandas DataFrames, as an option only. DataFrames will reflect the None values in cells, from which complex table structures can be derived.
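To illustrate what "interpreting None cells" could look like in practice, here is a minimal sketch. It works on plain rows of cell text (strings, or None where no cell exists) and assumes that a None following a cell in the same row means that cell spans one more column. That reading is an assumption made for this example, not documented PyMuPDF behavior, and `infer_colspans` is a hypothetical helper, not part of the library.

```python
from typing import List, Optional, Tuple

Row = List[Optional[str]]


def infer_colspans(rows: List[Row]) -> List[List[Tuple[str, int]]]:
    """Collapse None cells into the preceding cell of the same row.

    Returns, per row, a list of (text, colspan) pairs.
    Assumption: a None cell continues the cell to its left.
    A None at the start of a row is kept as an empty cell.
    """
    result = []
    for row in rows:
        spans: List[Tuple[str, int]] = []
        for cell in row:
            if cell is None and spans:
                # Widen the previous cell by one column.
                text, width = spans[-1]
                spans[-1] = (text, width + 1)
            else:
                spans.append(("" if cell is None else cell, 1))
        result.append(spans)
    return result


# Hand-written sample rows (not taken from a real PDF).
sample = [
    ["Region", "Sales", None],
    ["", "Q1", "Q2"],
    ["North", "10", "12"],
]

for row in infer_colspans(sample):
    print(row)
```

With PyMuPDF, rows of this shape would typically come from something like `page.find_tables().tables[0].extract()`. Vertical merges (rowspan) would need a similar pass over columns and are left out here.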

Suggestion: possibility to have a callback on table #33