microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
MIT License

Problem with Table Structure inference when tables are very close to image corners #21

Closed RobbyJS closed 2 years ago

RobbyJS commented 2 years ago

Hello, I have trained the Table Structure algorithm for 14 epochs and managed to obtain acceptable results on your test images. However, when I use the algorithm to perform inference on some table images of my own, I observe problems like the one below. This is an image similar to the one produced by your grits.py code, where all classes are plotted together: PMC5730189_table_0_no_white_w_box_cropped

I believe the problem is related to the distance of the table itself from the image borders. If I perform inference on the same table but keep a larger distance between the table and the image borders, these are the results:

PMC5730189_table_0_w_box_cropped

The table border and all rows and columns are predicted much better. The image used for these examples is PMC5730189_table_0 from your dataset.

The same happens for many other tables. Moreover, I looked at the XML files with the class labels and bounding box data, and a large percentage of the tables used for training (more than 95%) have a distance of almost 40 pixels from the table border to the image border, on all sides (top, bottom, left & right).

So I was wondering how the algorithm could be made more robust for these cases, in which I need to predict the table structure but the table border is really close to the image border (less than 5-10 pixels). Should I change something in the training? Or something else?

Thanks in advance,

bsmock commented 2 years ago

Hi, yes, this is expected behavior given the training data and data augmentation used in the paper.

However, we designed the data with robustness to padding around the table in mind. With small changes to the training code, it should be easy to achieve whatever robustness to padding around the true table border you would like.

(As an aside, note that the reason we include padding in the training images is that if we had cropped all of the table images tightly to the table border, the opposite problem would arise: there would be no robustness to table images that have padding around the table.)

To achieve more robustness to different amounts of padding, the simplest change would be to increase the amount of cropping done during training here: https://github.com/microsoft/table-transformer/blob/3e1dd0c3cad7956c790765b491ec86817e94ce43/src/main.py#L75

You can also modify the transforms or make your own transform to achieve your desired result. The true border for every table is included in the labels so you can train a model with any amount of padding around the table you would like.
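To make the idea concrete, here is a minimal sketch of a custom crop along these lines. The function name `random_padding_crop` and its signature are my own for illustration, not the repo's actual API; the point is simply to sample a crop box that leaves only a small, random amount of padding around the known table bounding box:

```python
import random

def random_padding_crop(img_w, img_h, table_bbox, max_pad=5, rng=None):
    """Sample a crop box leaving at most `max_pad` pixels of padding
    around the table bounding box on each side (illustrative sketch,
    not the repo's actual transform)."""
    rng = rng or random.Random()
    x1, y1, x2, y2 = table_bbox
    # Sample a small, independent padding for each side, clamped to the image.
    left = max(0, x1 - rng.randint(0, max_pad))
    top = max(0, y1 - rng.randint(0, max_pad))
    right = min(img_w, x2 + rng.randint(0, max_pad))
    bottom = min(img_h, y2 + rng.randint(0, max_pad))
    return left, top, right, bottom
```

Training with crops like this exposes the model to tables sitting very close to the image edge, which is the regime where the reported failures occur.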

Hope that helps!

RobbyJS commented 2 years ago

Hello, thanks for those clarifications.

I have run some tests modifying the RandomCrop values, but the problem seems to persist. However, I have only managed to complete a couple of epochs. Should I expect an improvement when training for more epochs, or should I modify more lines of code than just these:

To achieve more robustness to different amounts of padding, the simplest change would be to increase the amount of cropping done during training here:

https://github.com/microsoft/table-transformer/blob/3e1dd0c3cad7956c790765b491ec86817e94ce43/src/main.py#L75

I have modified both the maximum crop and also the minimum crop inside the RandomCrop transform.

Thanks again,

bsmock commented 2 years ago

I'm guessing that if you made the change recently in training, the model will need more epochs to adapt. However, you might have more success using or adapting another transform we include in the code but don't use for the paper and haven't yet documented: https://github.com/microsoft/table-transformer/blob/3e1dd0c3cad7956c790765b491ec86817e94ce43/src/table_datasets.py#L86

This transform can be used to crop directly to the table bounding box via `TightAnnotationCrop([0], 0, 0, 0, 0)`, or with some random padding around the table bounding box via `TightAnnotationCrop([0], 10, 10, 10, 10)`.
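For readers unfamiliar with this kind of transform, the core idea can be sketched as follows. This is my own illustration of the underlying mechanics (crop to the table bbox plus up to a random per-side padding, then express the bbox in the cropped image's coordinates), not the repo's `TightAnnotationCrop` implementation:

```python
import random

def tight_annotation_crop(img_w, img_h, table_bbox,
                          left_pad, top_pad, right_pad, bottom_pad,
                          rng=None):
    """Crop to the table bbox plus up to the given random padding per side.
    Returns the crop box and the table bbox remapped into crop coordinates.
    (Sketch of the idea, not the repo's TightAnnotationCrop code.)"""
    rng = rng or random.Random()
    x1, y1, x2, y2 = table_bbox
    # Expand each side by a random amount up to its padding limit,
    # clamped to the image boundaries.
    cx1 = max(0, x1 - rng.randint(0, left_pad))
    cy1 = max(0, y1 - rng.randint(0, top_pad))
    cx2 = min(img_w, x2 + rng.randint(0, right_pad))
    cy2 = min(img_h, y2 + rng.randint(0, bottom_pad))
    crop = (cx1, cy1, cx2, cy2)
    # The label bbox must be shifted into the cropped image's frame.
    new_bbox = (x1 - cx1, y1 - cy1, x2 - cx1, y2 - cy1)
    return crop, new_bbox
```

With all paddings set to 0 the crop coincides exactly with the table border, which is the `TightAnnotationCrop([0], 0, 0, 0, 0)` case described above.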

Keep in mind that this goes beyond what we do in the paper, so we aren't officially supporting/documenting it at this time. However, I hope you find it helpful!

bsmock commented 2 years ago

I believe I've addressed your question, so I will close this issue.