Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
MIT License
Adding a words_dir (word tokens) lowers the number of rows in tables_structure and skews the result #183
Originally posted by **dsoft-jvo** June 21, 2024
I use this table-transformer code to extract the tables and table structures of invoices. Without the --words_dir argument, the results are very satisfactory. My understanding is that words_dir is needed to add the text content of the detected structures to the output, so I tried adding it. After adding it, however, the results are strange: the detected table shrinks into a small corner of the image and the table structures all overlap each other. At first this looked like a scaling problem, but the problem persists after fixing the scaling.
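Because the symptom looks like a scaling problem, it may help to rule out the two most common coordinate mismatches between words extracted from a PDF and a rendered page image. PDF coordinates are in points (1/72 inch), so they must be multiplied by `dpi / 72` to match an image rendered at a given DPI. Some extractors also report boxes with a bottom-left origin (PDF user space), whereas images use a top-left origin. The helper `pdf_words_to_pixels` below is hypothetical and not part of table-transformer. It assumes each word is a dict with a `"bbox"` of `[xmin, ymin, xmax, ymax]` and a `"text"` field, and that you know the rendering DPI and, if flipping is needed, the page height in points.

```python
from typing import Dict, List, Optional

POINTS_PER_INCH = 72.0  # 1 PDF point = 1/72 inch


def pdf_words_to_pixels(
    words: List[Dict], dpi: float, page_height_pt: Optional[float] = None
) -> List[Dict]:
    """Convert word bboxes from PDF points to pixels of a page image rendered at `dpi`.

    Each word is assumed to be {"bbox": [xmin, ymin, xmax, ymax], "text": ...}.
    If `page_height_pt` is given, the bboxes are assumed to use a bottom-left
    origin (PDF user space) and are flipped to the top-left origin images use.
    """
    converted = []
    for word in words:
        xmin, ymin, xmax, ymax = word["bbox"]
        if page_height_pt is not None:
            # Flip the y-axis: the top edge in PDF space becomes the smaller image y.
            ymin, ymax = page_height_pt - ymax, page_height_pt - ymin
        new_word = dict(word)  # copy so the caller's list is left unchanged
        # Multiply before dividing to keep the arithmetic exact for whole-number inputs.
        new_word["bbox"] = [v * dpi / POINTS_PER_INCH for v in (xmin, ymin, xmax, ymax)]
        converted.append(new_word)
    return converted


if __name__ == "__main__":
    words = [{"bbox": [72.0, 144.0, 108.0, 156.0], "text": "Total"}]
    print(pdf_words_to_pixels(words, dpi=150))
```

If the boxes still crowd into one corner after scaling, the origin flip (or a different rendering DPI than assumed) is the next thing worth checking.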
Aside from the visual result, the 'tables_structure' output also looks wrong when --words_dir is added. Without --words_dir, the number of rows and columns seems to stay constant. With --words_dir, however, the number of rows and columns varies: sometimes there are more, sometimes fewer. The tokens are formatted as described in docs/INFERENCE.MD.
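A quick sanity check before running inference is to confirm that every word box is well-formed and lies inside the image it belongs to. The format from docs/INFERENCE.MD is assumed here to be a JSON list of word objects, each with a `"bbox"` as `[xmin, ymin, xmax, ymax]` in image-pixel coordinates and a `"text"` string; check this assumption against that document. `validate_words` is a hypothetical helper, not part of the repository.

```python
from typing import Dict, List


def validate_words(words: List[Dict], image_width: int, image_height: int) -> List[str]:
    """Return human-readable problems found in a word list (empty list if none).

    Assumes each word is {"bbox": [xmin, ymin, xmax, ymax], "text": str}
    with bbox values in pixel coordinates of the page image.
    """
    problems = []
    for i, word in enumerate(words):
        if "bbox" not in word or "text" not in word:
            problems.append(f"word {i}: missing 'bbox' or 'text'")
            continue
        xmin, ymin, xmax, ymax = word["bbox"]
        if xmin > xmax or ymin > ymax:
            problems.append(f"word {i}: bbox is not ordered as [xmin, ymin, xmax, ymax]")
        if xmin < 0 or ymin < 0 or xmax > image_width or ymax > image_height:
            problems.append(
                f"word {i}: bbox extends outside the {image_width}x{image_height} image"
            )
    return problems


if __name__ == "__main__":
    words = [
        {"bbox": [40, 100, 120, 118], "text": "Invoice"},
        {"bbox": [40, 900, 120, 880], "text": "Total"},  # y-values swapped
    ]
    for problem in validate_words(words, image_width=800, image_height=1000):
        print(problem)
```

An empty result does not prove the coordinates match the image (a uniformly shrunk set of boxes still passes), but any reported problem points at the word file rather than the model.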
I cannot share any actual data or images because the data is sensitive, but this is what I found while debugging:
Without --words_dir, i.e. tokens=[]:
![image](https://github.com/microsoft/table-transformer/assets/169161646/97c60cdb-78f0-46a0-be80-e71de77bf7ee)
![image](https://github.com/microsoft/table-transformer/assets/169161646/4bc866a9-86f6-4953-826d-f3ab70af237c)
With a --words_dir, i.e. tokens=[...data...]:
![image](https://github.com/microsoft/table-transformer/assets/169161646/c9b83b9b-ca18-4a07-ad8d-debf362495e6)
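To narrow down why the row count changes between the two runs, one option is to take the rows detected without tokens and count how many word boxes actually fall inside each one. Rows that contain no word centers, or words that all cluster in one region, would suggest a mismatch between the word coordinates and the image. If the postprocessing drops or merges rows based on word overlap (an assumption to verify in the repository's postprocessing code), such empty rows would also explain a changing row count. `tokens_per_row` and `center_inside` are hypothetical debugging helpers; `rows` is assumed to be a list of `[xmin, ymin, xmax, ymax]` row boxes in the same pixel coordinates as the words.

```python
from typing import Dict, List, Sequence

BBox = Sequence[float]  # [xmin, ymin, xmax, ymax] in image pixels


def center_inside(bbox: BBox, region: BBox) -> bool:
    """True if the center of `bbox` lies within `region` (edges inclusive)."""
    cx = (bbox[0] + bbox[2]) / 2
    cy = (bbox[1] + bbox[3]) / 2
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def tokens_per_row(rows: List[BBox], words: List[Dict]) -> List[int]:
    """Count how many word boxes have their center inside each row box."""
    return [sum(center_inside(w["bbox"], row) for w in words) for row in rows]


if __name__ == "__main__":
    rows = [[0, 0, 500, 20], [0, 20, 500, 40], [0, 40, 500, 60]]
    words = [
        {"bbox": [10, 5, 60, 15], "text": "Item"},
        {"bbox": [10, 45, 60, 55], "text": "Total"},
    ]
    counts = tokens_per_row(rows, words)
    print(counts)  # [1, 0, 1]
    print([i for i, n in enumerate(counts) if n == 0])  # [1]
```

Using word centers rather than full overlap keeps the check simple; a word whose center sits exactly on a shared row edge is counted in both rows.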
I suspect the problem lies in a misunderstanding on my part about what the --words_dir data is for. I have read the papers, but I feel I am missing something about that aspect.
Could someone give some further explanation about the use and function of --words_dir? Are the results I am seeing expected? Why, or why not? And if not, how do I go about fixing them?
Discussed in https://github.com/microsoft/table-transformer/discussions/182