microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

LayoutReader: Dataset for Fine-tuning LayoutReader #576

Open QuangTran2706 opened 2 years ago

QuangTran2706 commented 2 years ago

Describe Model I am using: LayoutReader

Firstly, thank you for your outstanding contribution to reading order detection. I am currently trying to fine-tune the pre-trained model. Is there any recommendation on how much data is needed to fine-tune the model?

HYPJUDY commented 2 years ago

Thanks for your question! In general, the larger the dataset, the better the performance. In our initial experiments, fine-tuning also worked with a small dataset (e.g., hundreds or thousands of samples); data augmentation is also effective in that setting. If a large dataset is available, keep adding data until the performance saturates.
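
For the small-dataset case, one simple augmentation is to keep the labeled reading order fixed and re-shuffle the input word order, yielding several input permutations per annotated page. Below is a minimal Python sketch of this idea; the field names (`src`, `bbox`, `tgt_index`) are hypothetical and should be adapted to the actual LayoutReader data format.

```python
import random

def augment_sample(sample, n_copies=3, seed=0):
    """Create extra training samples by shuffling the input word order.

    `sample` is assumed to be a dict of parallel lists (hypothetical
    field names; adapt to the actual LayoutReader data format):
      - "src":  words in the original (e.g., OCR) order
      - "bbox": one [x0, y0, x1, y1] layout box per word
      - "tgt_index": source-word indices listed in the correct
        reading order (the label).
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(n_copies):
        perm = list(range(len(sample["src"])))
        rng.shuffle(perm)
        # Reorder words and boxes together with the same permutation.
        new_src = [sample["src"][i] for i in perm]
        new_bbox = [sample["bbox"][i] for i in perm]
        # Remap the label: old source indices -> new shuffled positions,
        # so the target still points at the same words.
        old_to_new = {old: new for new, old in enumerate(perm)}
        new_tgt = [old_to_new[i] for i in sample["tgt_index"]]
        augmented.append({"src": new_src, "bbox": new_bbox, "tgt_index": new_tgt})
    return augmented
```

A page labeled once then yields `n_copies` additional training pairs whose inputs differ but share the same reading-order target, which loosely mirrors the shuffled-input setting used when training on ReadingBank.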