microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.

Unable to recreate Fintabnet TATR Model Performance #149

Closed nandha1nks closed 7 months ago

nandha1nks commented 9 months ago

I attempted to recreate the TATR model on FinTabNet data, referring to the paper "Aligning benchmark datasets for table structure recognition".

I used the same code as suggested in the paper to create the FinTabNet.a6 data. I kept the number of epochs at 26, as mentioned in the paper, and used structure_config.json as the config file during training.
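For reference, the training was launched with a command along these lines (the data path is a placeholder, and the flag names reflect my reading of the repo's main.py):

```bash
# run from the repo's src/ directory
python main.py \
    --data_type structure \
    --config_file structure_config.json \
    --data_root_dir /path/to/FinTabNet.a6
```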

However, the results I got after training the model from scratch do not match those of the published model.

Can you please help me understand if I am missing something? Sharing the results for comparison.

Fintabnet TATR Results

Retrained Model Results

karthikgali commented 9 months ago

@bsmock Could you please look into this?

MohamedDonia commented 7 months ago

@nandha1nks I have the same issue here. I trained the structure model from scratch with the default configuration and got lower results than the published one. @bsmock, is there any change in the default structure configuration file?

bsmock commented 7 months ago

TLDR: you likely need to train the model for longer.

In that paper specifically we mention that we standardize one epoch to mean 720,000 training samples. This is so an epoch has the same meaning whether we are training with PubTables-1M or FinTabNet.

The version of FinTabNet we train with, called FinTabNet.c, has only 78,536 training samples. What we want is to drop the learning rate by a factor of 0.9 every epoch = 720,000 samples, not every 78,536 samples. I believe we had to create a custom version of the code to do this. But for simplicity you can come very close to doing this with the current code by setting the "lr_drop" parameter to 9 when you train. This will drop the learning rate every 9 × 78,536 = 706,824 samples. Then you want to train for 9 × 30 = 270 epochs, because an epoch according to the training code is 78,536 samples for this dataset. I recommend also setting "checkpoint_freq" to 9 so you end up with 30 model checkpoints instead of 270.
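To make that concrete, here is roughly what the relevant overrides would look like if you set them in structure_config.json, assuming they are plain config keys alongside the other training parameters (they may also be exposed as command-line arguments; the names below are just the ones mentioned above, and everything else stays at its default):

```json
{
  "epochs": 270,
  "lr_drop": 9,
  "checkpoint_freq": 9
}
```

The arithmetic: 720,000 / 78,536 ≈ 9.2, so 9 dataset epochs approximate one 720,000-sample "paper" epoch, and 270 dataset epochs approximate 30 paper epochs.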

If you want to do exactly what we did in the paper, you might have to change the training code a little bit. But we released the model anyway for reproducibility.

Hope that helps!

nandha1nks commented 7 months ago

Thanks for clarifying that, @bsmock. I will try this and reach out in case of any other issues.