microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
MIT License

bad result on a test image #99

Open ywangwxd opened 1 year ago

ywangwxd commented 1 year ago

I made a simple fake document image.

[attached: test image]

I can get the parsing result, but it's not good. I am not sure if I am using it in the right way. I noticed the paper mentions that previous approaches detect table cells covering only the text inside a cell, and that this work corrects that by fitting each table cell to its true boundary. But in my test result that does not seem to hold; there is even misalignment between the cell bounding boxes.

I also tested Baidu's online API (document recovery). It also missed the first table, but it parsed the second table very well, including the blank space in the cells.

[attached: test_fig_tables, test_0_fig_cells]

ywangwxd commented 1 year ago

Sorry, I made a mistake with the input image name. I can now get the parsing result, but it's not good. I am not sure if I am using it in the right way. I noticed the paper mentions that previous approaches detect table cells covering only the text inside a cell, and that this work corrects that by fitting each table cell to its true boundary. But in my test result that does not seem to hold; there is even misalignment between the cell bounding boxes.

[attached: test_fig_tables, test_0_fig_cells]

bsmock commented 1 year ago

I believe the issue in this example is the padding that the original pre-trained model expects around the table.

If you're running the inference script from the command line, try adding this flag to the command and see whether it improves the result: --crop_padding 30

If you're still unsure if you're running the code correctly, try the code on examples from PubTables-1M. If you don't have access to the full dataset, you can grab individual samples here for table structure recognition: https://www.kaggle.com/datasets/bsmock/pubtables-1m-structure

What you'll notice in the cropped tables in PubTables-1M is that they have around 30 pixels of padding around the table. The original model we released (currently the only model, but this will change in the future) expects there to be at least 20 pixels of padding around the table in the image. That's why you see the model ignoring the edges of the table in your example.

The padding issue is just because we had to make some choice for how to train the model in the original paper. We will release models that don't expect padding in the future.

I'll update the code to make 30 the default padding until we release models that don't expect 30 pixels of padding.

Cheers, Brandon
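Brandon's padding fix can also be applied in your own code before structure recognition. Below is a minimal sketch (the helper name and the example coordinates are mine, not from the repository) of expanding a detected table box by roughly 30 pixels on every side, clamped to the image borders, before cropping:

```python
# Hypothetical helper: pad a detected table bbox before cropping, since
# the pre-trained structure model expects ~30 px of context around the table.
from PIL import Image


def crop_with_padding(image, bbox, padding=30):
    """Expand `bbox` (xmin, ymin, xmax, ymax) by `padding` pixels on
    every side, clamped to the image borders, then crop."""
    xmin, ymin, xmax, ymax = bbox
    left = max(0, xmin - padding)
    top = max(0, ymin - padding)
    right = min(image.width, xmax + padding)
    bottom = min(image.height, ymax + padding)
    return image.crop((left, top, right, bottom))


# Usage sketch (coordinates are illustrative):
# page = Image.open("page.png").convert("RGB")
# table_crop = crop_with_padding(page, (100, 200, 500, 400), padding=30)
# table_crop is then fed to the structure recognition model.
```

The clamping matters for tables near the page edge, where a naive expansion would produce out-of-bounds crop coordinates.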

tuongtranngoc commented 1 year ago

I've faced the same issue. Please double-check: I saw this issue in the recognition step.

sanprit commented 1 year ago

@bsmock I am facing a similar issue on financial tables. Is there any hack, argument change, or other workaround to fix this?

dangbuiii commented 11 months ago

I've tested on some PDF pages. I used Hugging Face's TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection") to detect tables, but the results were not as expected: detected tables always missed edges and header rows, and some tables weren't detected at all. How can I fix this? Here is one of my test images. [attached: Screenshot 2023-07-21 110313]

Best Regards, Dang

tuongtranegs commented 11 months ago

@dangbuiii You can divide your image into sub-images; running detection on each sub-image returns better results.
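The sub-image idea can be sketched as below. Splitting a tall page into overlapping horizontal strips keeps each table at a scale the detector handles well; the strip height and overlap values are illustrative assumptions, not values from the repository:

```python
# Sketch: split a tall page into overlapping horizontal strips for
# detection. The overlap keeps a table that straddles a strip boundary
# intact in at least one strip.
from PIL import Image


def split_into_strips(image, strip_height=1000, overlap=100):
    """Return (y_offset, strip) pairs covering the full page height."""
    strips = []
    y = 0
    while y < image.height:
        bottom = min(image.height, y + strip_height)
        strips.append((y, image.crop((0, y, image.width, bottom))))
        if bottom == image.height:
            break
        y = bottom - overlap
    return strips


# Usage sketch: run detection on each strip, then add y_offset back to
# each detected box's y-coordinates to map it into page coordinates.
# Boxes detected twice in the overlap region need de-duplication (e.g. NMS).
```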

NielsRogge commented 7 months ago

Hi,

Please try the updated notebook at #158, which includes padding when cropping the table.