microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
MIT License
2.01k stars 231 forks

Adding a words_dir (word tokens) lowers the amount of rows present in the tables_structure and skews the result #183

Open dsoft-jvo opened 6 days ago

dsoft-jvo commented 6 days ago

Discussed in https://github.com/microsoft/table-transformer/discussions/182

Originally posted by **dsoft-jvo** June 21, 2024

I use this table-transformer code to extract the tables and table structures of invoices. Without the --words_dir argument, the result is very satisfactory. From my understanding, the words_dir is needed to add the contents of the detected structures to the result, so I tried adding it.

After adding one, however, the result is strange. The detected table shrinks to a small corner of the image and the table structures all overlap each other. At first this looked like a scaling problem, but the problem persisted after I fixed the scaling.

Aside from the visual result, the 'tables_structure' output is also strange when a --words_dir is added. Without --words_dir, the number of rows and columns seems to be constant. With --words_dir, however, the number of rows and columns varies: sometimes there are more, sometimes fewer.

The tokens are formatted as described in the docs/INFERENCE.MD document. I cannot show any actual data or images, as the data is sensitive, but this is what I found during debugging:

Without --words_dir, i.e. tokens=[]:

![image](https://github.com/microsoft/table-transformer/assets/169161646/97c60cdb-78f0-46a0-be80-e71de77bf7ee)

![image](https://github.com/microsoft/table-transformer/assets/169161646/4bc866a9-86f6-4953-826d-f3ab70af237c)

With a --words_dir, i.e. tokens=[...data...]:

![image](https://github.com/microsoft/table-transformer/assets/169161646/c9b83b9b-ca18-4a07-ad8d-debf362495e6)

I suspect the problem lies in a misunderstanding on my part about what the --words_dir data is for. I have read the papers, but I feel like I am missing something about that aspect. Could someone explain the use and function of --words_dir in more detail? Are the results I am seeing expected? Why, or why not? And if not, how do I go about fixing them?
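For anyone trying to reproduce this without the original data, here is a minimal sketch of a sanity check that could rule out a coordinate mismatch between the word tokens and the image. It assumes each token is a dict with a `bbox` of `[xmin, ymin, xmax, ymax]` in image pixel coordinates and a `text` field; that layout is an assumption here and should be checked against docs/INFERENCE.MD. The helper name `tokens_outside_image` and the sample values are hypothetical and not part of the repository.

```python
def tokens_outside_image(tokens, image_width, image_height):
    """Return the tokens whose bbox is malformed or not fully inside the image.

    Assumes each token looks like {"bbox": [xmin, ymin, xmax, ymax], "text": "..."}
    with coordinates in the same pixel space as the image (an assumption).
    """
    bad = []
    for token in tokens:
        xmin, ymin, xmax, ymax = token["bbox"]
        inside = (0 <= xmin < xmax <= image_width) and (0 <= ymin < ymax <= image_height)
        if not inside:
            bad.append(token)
    return bad


# Hypothetical sample tokens for an 800x600 image.
tokens = [
    {"bbox": [10, 20, 60, 35], "text": "Invoice"},
    {"bbox": [700, 20, 900, 35], "text": "Total"},  # xmax exceeds the image width
]

suspicious = tokens_outside_image(tokens, image_width=800, image_height=600)
print([t["text"] for t in suspicious])
```

If many tokens are flagged, the words are probably expressed in a different coordinate space than the image (for example PDF points versus rendered pixels). That would not by itself confirm the cause of the behaviour described above, but it is cheap to check first.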