papipsycho closed this 5 months ago
Table identification as such is a PyMuPDF `Page` method.
Complex table situations that would be expressed via colspan etc. in HTML cannot be represented that way. Instead, you must interpret cells whose text and bbox have the value `None` accordingly. We are not planning to change that.
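As a rough illustration of interpreting `None` cells, the sketch below works on plain row lists of the kind `Table.extract()` returns. The helper name `merge_none_cells` is hypothetical, and the rule that a `None` cell continues the cell to its left (a column span) is an assumption made for this example; in a real table a `None` may just as well continue the cell above.

```python
from typing import List, Optional, Tuple

Row = List[Optional[str]]


def merge_none_cells(rows: List[Row]) -> List[List[Tuple[str, int]]]:
    """Collapse each row into (text, colspan) pairs.

    Assumption: a None cell belongs to the nearest cell on its left.
    A leading None has no left neighbour and becomes an empty cell.
    """
    merged = []
    for row in rows:
        out: List[Tuple[str, int]] = []
        for cell in row:
            if cell is None and out:
                text, span = out[-1]
                out[-1] = (text, span + 1)  # widen the previous cell
            else:
                out.append((cell or "", 1))
        merged.append(out)
    return merged


# Hypothetical sample data; with PyMuPDF the rows would come from
# something like page.find_tables().tables[0].extract()
rows = [
    ["Quarterly results", None, None],
    ["Q1", "Q2", "Q3"],
    ["10", "12", "9"],
]
spans = merge_none_cells(rows)
print(spans[0])  # [('Quarterly results', 3)]
```

Distinguishing column spans from row spans would likely need the cell bboxes as well, which this sketch does not attempt.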
We are using `table.to_markdown()` in this repo to output tables in text format. This inevitably restricts what can be expressed at all.
The only conceivable alternative may be outputting tables as pandas DataFrames instead (or additionally), as an option only. DataFrames will reflect the `None` values in cells, from which complex table structures can be derived.
Hello,
Sometimes tables are really complex to handle, especially with colspan or rowspan, so I suggest adding an event or callback that lets users change the way the table is written.
@JorjMcKie let me know if you are interested in this; I can help with integrating this feature.
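A minimal sketch of what such a hook might look like, in plain Python. Nothing here is an existing API of this repo or of PyMuPDF: `write_table`, the `table_writer` parameter, and both writer functions are hypothetical names, and the rows are assumed to be lists of cell strings with `None` marking spanned cells.

```python
from html import escape
from typing import Callable, List, Optional

Row = List[Optional[str]]
TableWriter = Callable[[List[Row]], str]


def markdown_writer(rows: List[Row]) -> str:
    """Default writer: plain Markdown table; None cells become empty."""
    header, *body = rows
    lines = ["|" + "|".join(c or "" for c in header) + "|"]
    lines.append("|" + "|".join("---" for _ in header) + "|")
    for row in body:
        lines.append("|" + "|".join(c or "" for c in row) + "|")
    return "\n".join(lines)


def html_colspan_writer(rows: List[Row]) -> str:
    """Custom writer. Assumption: None continues the cell on its left."""
    out = ["<table>"]
    for row in rows:
        cells: List[list] = []  # each entry is [text, colspan]
        for c in row:
            if c is None and cells:
                cells[-1][1] += 1
            else:
                cells.append([c or "", 1])
        tds = "".join(
            f'<td colspan="{span}">{escape(text)}</td>' if span > 1
            else f"<td>{escape(text)}</td>"
            for text, span in cells
        )
        out.append(f"<tr>{tds}</tr>")
    out.append("</table>")
    return "\n".join(out)


def write_table(rows: List[Row], table_writer: Optional[TableWriter] = None) -> str:
    """Render a table, delegating to the caller's writer when one is given."""
    return (table_writer or markdown_writer)(rows)


rows = [["Totals", None], ["a", "b"]]
print(write_table(rows))                                    # Markdown default
print(write_table(rows, table_writer=html_colspan_writer))  # HTML with colspan
```

Passing the writer as an optional argument keeps the Markdown output as the default, so existing callers would be unaffected under this design.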