microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

[TrOCR] How to train TrOCR on a custom dataset in a different language #886

Open kasuba-badri-vishal opened 1 year ago

kasuba-badri-vishal commented 1 year ago

Hi, I want to change the decoder part of TrOCR so that I can train and run inference with a different vocabulary (i.e. a different language). I was following the sample implementation from here, but I was only able to change the size of the vocabulary, not the vocabulary itself. It would be really helpful to know how I can change the vocabulary for the TrOCR decoder.

Thanks

Mohammed20201991 commented 1 year ago

+1

bit-scientist commented 1 year ago

Hoping to get some feedback on this as well.
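
For anyone landing here with the same question, below is a minimal sketch of the first step the original post asks about: building a new vocabulary from the target-language labels. It is not the TrOCR authors' method and uses no TrOCR or fairseq API; it is plain Python, and it assumes a character-level vocabulary. The special-token names and their order, the helper names (`build_char_vocab`, `encode`, `decode`), and the sample labels are all hypothetical and should be matched to whatever your decoder and training setup actually expect.

```python
from collections import Counter

# Assumption: these special tokens and their order are illustrative only;
# use the ones your decoder / dictionary format actually requires.
SPECIAL_TOKENS = ["<s>", "<pad>", "</s>", "<unk>"]


def build_char_vocab(transcriptions):
    """Map every character seen in the training labels to an integer id."""
    counts = Counter(ch for text in transcriptions for ch in text)
    # Most frequent characters first; ties broken by code point so the
    # resulting ids are stable from run to run.
    chars = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {tok: i for i, tok in enumerate(SPECIAL_TOKENS + chars)}


def encode(text, vocab):
    """Turn a label into ids, wrapped in <s> ... </s>; unseen chars become <unk>."""
    unk = vocab["<unk>"]
    return [vocab["<s>"]] + [vocab.get(ch, unk) for ch in text] + [vocab["</s>"]]


def decode(ids, vocab):
    """Turn ids back into text, dropping all special tokens (including <unk>)."""
    inverse = {i: tok for tok, i in vocab.items()}
    specials = set(SPECIAL_TOKENS)
    return "".join(inverse[i] for i in ids if inverse[i] not in specials)


# Hypothetical training labels in the target language.
labels = ["привет", "мир"]
vocab = build_char_vocab(labels)
ids = encode("мир", vocab)
text = decode(ids, vocab)
```

With a mapping like this, `len(vocab)` is the size that the decoder's token embedding and output projection would need to match. Changing only that size, as described in the original post, does not change which symbol each id stands for: the same mapping also has to be used when encoding the training labels and when decoding predictions. Embeddings for symbols that the pretrained decoder never saw would presumably have to be learned during fine-tuning.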