pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0
4.49k stars 443 forks

Get image inside table's cell #3586

Closed: vinniec2 closed this issue 2 weeks ago

vinniec2 commented 2 weeks ago

Hi, I have just started trying PyMuPDF. I want to extract the contents of some tables, and that seems to work, but one column sometimes contains images, and in that case the tab.extract() method of the Table class returns empty strings for those cells. The only thing I can think of is to go through the page's list of images and check whether any image's bbox lies inside a cell's bbox. Is there a simpler solution? Thanks :)
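
For reference, the bbox check described above might look roughly like the sketch below. It is only an illustration of the idea, not a built-in PyMuPDF feature: `bbox_contains` and `images_in_cells` are hypothetical helper names, and the 1-point tolerance is an assumed value to absorb rounding differences between image and cell borders. It assumes `page` is a PyMuPDF page, that `page.find_tables().tables` yields tables whose `rows` carry per-cell bboxes in `cells` (with `None` for missing cells), and that `page.get_image_info(xrefs=True)` returns dicts with `"bbox"` and `"xref"` keys.

```python
def bbox_contains(outer, inner, tol=1.0):
    """Return True if `inner` lies within `outer`, allowing `tol` points of slack.

    Both arguments are (x0, y0, x1, y1) tuples in page coordinates.
    """
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    return (
        ix0 >= ox0 - tol
        and iy0 >= oy0 - tol
        and ix1 <= ox1 + tol
        and iy1 <= oy1 + tol
    )


def images_in_cells(page, tol=1.0):
    """Map (table_index, row_index, col_index) to the xrefs of images in that cell.

    Hypothetical helper: `page` is assumed to be a PyMuPDF page object.
    """
    # One dict per image shown on the page, including its bbox and xref.
    images = page.get_image_info(xrefs=True)
    hits = {}
    for t, table in enumerate(page.find_tables().tables):
        for r, row in enumerate(table.rows):
            for c, cell in enumerate(row.cells):
                if cell is None:  # no cell detected at this position
                    continue
                for img in images:
                    if bbox_contains(cell, img["bbox"], tol):
                        hits.setdefault((t, r, c), []).append(img["xref"])
    return hits
```

With the returned xrefs, the image data could then be fetched via something like `doc.extract_image(xref)` and matched to the empty strings that tab.extract() returns at the same row and column positions. Note that the row index here counts whatever rows the table finder reports, so it may need an offset if the header is handled separately.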