microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

How to create my own pretrained model on Bangla language using TrOCR #1039

Open rajsabi opened 1 year ago

rajsabi commented 1 year ago

Hi @NielsRogge! I want to create my own raw pretrained model for the Bangla language (like trocr-small-stage1) and then fine-tune it on a Bangla dataset. I have gone through the official implementation of the TrOCR paper at https://github.com/microsoft/unilm/tree/master/trocr but can't find how to do this. Could you please explain how I can achieve this? Thanks!

hrithickcodes commented 1 year ago

Hi, to fine-tune TrOCR you can look into this notebook. It fine-tunes TrOCR on the IAM handwriting dataset. For Bengali, you can create your own custom dataset and follow the instructions given in the notebook.
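As a rough illustration of the "create your own custom dataset" step, here is a minimal sketch that reads image/text label pairs and splits them into training and validation sets. The tab-separated `file_name<TAB>text` layout, the sample file names, and the helper names `read_labels` and `split_pairs` are assumptions made for this example, not something taken from the notebook or the TrOCR repository. It uses only the Python standard library.

```python
import csv
import io
import random


def read_labels(handle):
    """Parse tab-separated `file_name<TAB>text` lines into (file_name, text) pairs.

    The two-column layout is an assumed format for a custom Bangla dataset.
    """
    pairs = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip lines that do not have exactly two fields or have an empty label.
        if len(row) != 2 or not row[1].strip():
            continue
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


def split_pairs(pairs, val_fraction=0.1, seed=0):
    """Shuffle a copy of the pairs and return (train, validation) lists."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical sample data; in practice this would be a file on disk,
# opened with encoding="utf-8" so the Bangla text is read correctly.
sample = "img_0001.png\tআমার সোনার বাংলা\nimg_0002.png\tবাংলা ভাষা\nbroken line\n"
pairs = read_labels(io.StringIO(sample))
train, val = split_pairs(pairs, val_fraction=0.5)
print(len(pairs), len(train), len(val))
```

Each resulting pair could then be turned into model inputs (image pixels and tokenized text) in the way the notebook does for IAM. Note that the tokenizer would also need to cover Bangla characters, which this sketch does not address.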