Finetuning Dataset Annotation Format

bely66 commented 1 year ago

Hi everyone, I was fine-tuning on a TSR (table structure recognition) dataset of 487 tables. The tables are different from those in the PubTabMed dataset.

At first I annotated the dataset in the usual way, with bounding boxes covering the whole table and the whole columns and rows.

This differs from the original PubTabMed annotation, where the box borders touch the text.
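To make the difference between the two conventions concrete, here is a minimal sketch of how a loose box could be shrunk so that its borders touch the text. It assumes boxes are `(xmin, ymin, xmax, ymax)` tuples and that word-level boxes are available (for example from OCR). The names `tighten_box` and `center_inside` are hypothetical helpers for illustration, not part of the table-transformer code base.

```python
from typing import Iterable, Optional, Tuple

# Assumed box layout: (xmin, ymin, xmax, ymax) in pixel coordinates.
Box = Tuple[float, float, float, float]


def center_inside(word: Box, region: Box) -> bool:
    """Return True if the center of the word box lies inside the region."""
    cx = (word[0] + word[2]) / 2
    cy = (word[1] + word[3]) / 2
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def tighten_box(region: Box, words: Iterable[Box]) -> Optional[Box]:
    """Shrink a loose region to the union of the word boxes it contains.

    Returns None when no word center falls inside the region, so the
    caller can decide whether to keep or drop the original box.
    """
    inside = [w for w in words if center_inside(w, region)]
    if not inside:
        return None
    return (
        min(w[0] for w in inside),
        min(w[1] for w in inside),
        max(w[2] for w in inside),
        max(w[3] for w in inside),
    )


# Hypothetical example: a loose row box and three word boxes,
# the last of which belongs to a different row.
row = (0.0, 0.0, 100.0, 20.0)
words = [(10.0, 5.0, 30.0, 15.0), (50.0, 4.0, 90.0, 16.0), (10.0, 40.0, 30.0, 50.0)]
tight_row = tighten_box(row, words)
```

Whichever convention is chosen, the training and evaluation annotations need to follow the same one, since the AP and AR scores are computed from box overlap.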

With these annotations, the model scored: AP50: 0.794, AP75: 0.458, AP: 0.472, AR: 0.627

I found this score very low, so I changed the annotations to match the PubTabMed dataset and ended up with: AP50: 0.705, AP75: 0.348, AP: 0.371, AR: 0.531

That is worse on every metric, by roughly 0.09 to 0.11.

Why is this happening, how can I fix it, and what should I check to make sure things are running correctly?

YingxuanW commented 2 months ago

@bely66 Hi, could you please tell me how you prepared your data for fine-tuning? I also want to fine-tune structure models but don't know how to prepare my own dataset. Looking forward to your reply.

dreamlychina commented 2 weeks ago

+1 Could you share your contact details?

microsoft / table-transformer

Finetuning Dataset Annotation Format #128