microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License
19.14k stars 2.44k forks

finetune textdiffuser 2 in another language #1392

Open cuongngm opened 6 months ago

cuongngm commented 6 months ago

Describe Model I am using (UniLM, MiniLM, LayoutLM ...): TextDiffuser-2

How can I use this model on text in other languages? I hope the authors can guide me on how to adapt it.

JingyeChen commented 6 months ago

Thanks for your interest in TextDiffuser-2. Adapting it is fairly simple: prepare a text-image dataset in the target language and add the corresponding tokens to the tokenizer. You can refer to the code that adds the English tokens and replace them with tokens for the other language.
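
For readers who want something concrete, here is a minimal sketch of the token-adding step described above. It is not the repository's code. It assumes one bracketed token per character (the `[c]` format is an assumption), and it assumes the tokenizer and text encoder follow the Hugging Face `transformers` interface, where `tokenizer.add_tokens(...)` returns the number of tokens added and the model exposes `resize_token_embeddings(...)`. The helper names `build_char_tokens` and `extend_tokenizer` and the sample alphabet are hypothetical.

```python
def build_char_tokens(alphabet):
    """Return one bracketed token per unique non-space character, in order.

    The "[c]" format is an assumption made for illustration.
    """
    seen = set()
    tokens = []
    for ch in alphabet:
        if ch.isspace() or ch in seen:
            continue
        seen.add(ch)
        tokens.append(f"[{ch}]")
    return tokens


def extend_tokenizer(tokenizer, text_encoder, new_tokens):
    """Add tokens, then resize the encoder's embedding table to match.

    `tokenizer` and `text_encoder` are expected to behave like Hugging Face
    objects: add_tokens() returns the count of newly added tokens, and
    resize_token_embeddings() accepts the new vocabulary size.
    """
    num_added = tokenizer.add_tokens(new_tokens)
    if num_added > 0:
        # New token ids need embedding rows, otherwise lookups go out of range.
        text_encoder.resize_token_embeddings(len(tokenizer))
    return num_added


# Hypothetical sample of target-language characters (Cyrillic here).
extra_alphabet = "бгдж"
char_tokens = build_char_tokens(extra_alphabet)
print(char_tokens)
```

The embeddings for the new tokens start out untrained, so the model only learns to render the new characters after fine-tuning on the target-language text-image dataset.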