python-openxml / python-docx

Create and modify Word documents with Python
MIT License
4.62k stars 1.13k forks

Tables extraction from DOCX #1120

Open GPrakruth opened 2 years ago

GPrakruth commented 2 years ago

Hi,

PDF files are converted to DOCX, and tables are then extracted from the DOCX. The tables contain hidden columns and hidden text. Is there a way to ignore the hidden columns and text during conversion? Can the table structure be preserved when converting PDF to DOCX while ignoring the hidden content and columns?
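
For reference, a minimal sketch of how the extraction side of this might look with python-docx. It does not touch the PDF-to-DOCX conversion, which python-docx does not perform. It assumes that "hidden text" means runs whose font has the hidden flag set explicitly (`run.font.hidden is True`), and that a "hidden column" is a column left empty in every row once hidden runs are removed. Both are assumptions about the converted file; hidden formatting inherited from a style is not resolved here. The helper names `visible_text`, `table_to_rows`, and `extract_tables` are hypothetical, not part of the library.

```python
def visible_text(cell):
    """Return the cell text, skipping runs explicitly marked as hidden."""
    lines = []
    for paragraph in cell.paragraphs:
        # Font.hidden is tri-state: True, False, or None (inherit from style).
        # Assumption: only an explicit True counts as hidden.
        parts = [run.text for run in paragraph.runs if run.font.hidden is not True]
        lines.append("".join(parts))
    return "\n".join(lines).strip()


def table_to_rows(table, drop_empty_columns=True):
    """Convert a table to a list of rows, each a list of visible cell strings."""
    rows = [[visible_text(cell) for cell in row.cells] for row in table.rows]
    if not drop_empty_columns or not rows:
        return rows

    # Pad ragged rows so every row can be indexed by column position.
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]

    # Assumption: a column with no visible text in any row was a hidden column.
    keep = [i for i in range(width) if any(row[i] for row in rows)]
    return [[row[i] for i in keep] for row in rows]


def extract_tables(path):
    """Load a .docx file and return every top-level table as rows of strings."""
    # Imported here so the helpers above can be used without python-docx.
    from docx import Document

    return [table_to_rows(table) for table in Document(path).tables]
```

Calling `extract_tables("converted.docx")` (the file name is a placeholder) would return one list of rows per table. The row and column layout of the remaining cells is kept as python-docx reports it; merged cells are not handled specially in this sketch.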

Ayanami07 commented 3 months ago

Have you solved this problem?