xavctn / img2table

img2table is a table identification and extraction Python Library for PDF and images, based on OpenCV image processing
MIT License
577 stars, 76 forks

Merging cell values #172

Open eetap opened 9 months ago

eetap commented 9 months ago

Hi, thanks for the previous update, it helped a lot with table parsing. Unfortunately, I don't know enough about contours and cv2 to investigate this myself :(

I encountered an error where the values in each cell are incorrectly merged into a single value (see merged_val.pdf). For example, in the three tables with order numbers "23000Y***", the first two rows are recognized correctly and their values are not merged. But for orders 23000Y7138959 and 23000Y7138972, the values in the cells are merged into one, which is incorrect (see exmpl_1_tab.docx).

I suspect this may be related to a check on the number of lines when merging cell values: the first order, "23000Y7138944", is followed by two lines, while the others are followed by only one line each.
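To make the suspected failure mode concrete, here is a minimal hypothetical sketch. It is not img2table's actual code, and `TextLine`, `group_lines_into_rows` and `gap_factor` are made-up names. It assumes a rule that groups OCR text lines into rows by vertical gaps: a new row starts only when the gap between consecutive lines exceeds a fraction of the median line height. Under a rule like this, rows separated only by normal line spacing look the same as wrapped lines inside one row, so they collapse into a single cell. The coordinates below are invented for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TextLine:
    text: str
    y_top: float
    y_bottom: float


def group_lines_into_rows(lines, gap_factor=0.5):
    """Hypothetical heuristic: start a new row when the vertical gap between
    consecutive text lines exceeds gap_factor * median line height."""
    if not lines:
        return []
    lines = sorted(lines, key=lambda line: line.y_top)
    med_height = median(line.y_bottom - line.y_top for line in lines)
    rows = [[lines[0]]]
    for prev, cur in zip(lines, lines[1:]):
        gap = cur.y_top - prev.y_bottom
        if gap > gap_factor * med_height:
            rows.append([cur])  # large gap: treat as a new row
        else:
            rows[-1].append(cur)  # small gap: treat as a wrapped line
    return [" ".join(line.text for line in row) for row in rows]


# Invented coordinates: rows separated only by normal line spacing (gap 2)
tight = [
    TextLine("23000Y7138959", 0, 10),
    TextLine("detail 1", 12, 22),
    TextLine("23000Y7138972", 24, 34),
    TextLine("detail 2", 36, 46),
]
# Invented coordinates: rows separated by a larger gap (12)
spaced = [
    TextLine("23000Y7138944", 0, 10),
    TextLine("detail a", 12, 22),
    TextLine("detail b", 24, 34),
    TextLine("23000Y7138959", 46, 56),
    TextLine("detail c", 58, 68),
]

if __name__ == "__main__":
    print(group_lines_into_rows(tight))   # both orders end up in one cell
    print(group_lines_into_rows(spaced))  # orders end up in separate rows
```

If the real logic behaves anything like this, rows that contain only one line each would be the most likely to merge, which would match the pattern described above. This is a guess; only the maintainer can confirm how img2table actually builds rows.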