Table Reconstruction - Githubissues

skwskwskwskw commented 1 year ago

I am trying the pre-trained table structure reconstruction model. The correct number of rows and columns is detected, but the splits are suboptimal. I'm not sure what the issue is or how to improve it.

Here's a sample of the header:

Here's a sample of the splits:

bsmock commented 1 year ago

Hi, the amount of padding around the table can sometimes affect the pre-trained model we released. In that case, though, usually only the edge rows and edge columns are affected, and adding more padding around the table fixes it.
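As a rough illustration of the padding idea, here is a minimal sketch that widens a detected table bounding box before cropping. The function name `pad_bbox`, the 20-pixel padding, and the example coordinates are all assumptions made for illustration; they are not taken from the repo, and the padding that works best for the pre-trained model may differ.

```python
from typing import Tuple

BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def pad_bbox(bbox: BBox, padding: int, page_width: int, page_height: int) -> BBox:
    """Expand a table bounding box by `padding` pixels on every side,
    clamped so the crop never extends past the page image."""
    x_min, y_min, x_max, y_max = bbox
    return (
        max(0, x_min - padding),
        max(0, y_min - padding),
        min(page_width, x_max + padding),
        min(page_height, y_max + padding),
    )


# Hypothetical detected table on a 220x400 page, padded by 20 px (assumed value).
crop_box = pad_bbox((50, 60, 200, 180), padding=20, page_width=220, page_height=400)
print(crop_box)
```

The resulting box can be handed to whatever image library does the cropping. Note that when the table already touches the page edge, clamping leaves no margin on that side, so a background-colored border would have to be added to the cropped image instead.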

In your case, based on what I'm seeing, you'll probably need to fine-tune the model on a small number of cases like the one here, if your cases are all visually similar to this one. The pre-trained model has seen many table layouts but hasn't seen many examples that look like this one visually.

Some options you have are:
1) Training with additional data augmentation for PubTables-1M to make it generalize better to your cases
2) Fine-tuning the pre-trained model with FinTabNet using the scripts in this repo
3) Labeling your own small dataset and fine-tuning the model

Best, Brandon

skwskwskwskw commented 1 year ago

Hi,

Is any particular padding or resizing needed before running TSR with the model?

Thanks

microsoft / table-transformer

Table Reconstruction #106