Fine-tune TextDiffuser to generate handwritten text

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

https://aka.ms/GeneralAI

MIT License

Fine-tune TextDiffuser to generate handwritten text #1147

Open staghado opened 1 year ago

staghado commented 1 year ago

I wonder if it's reasonable to try and fine-tune TextDiffuser on handwriting generation instead of digital text. Has anyone tried this so far?

JingyeChen commented 1 year ago

TextDiffuser can be adapted to generate handwritten text in two ways:

(1) fine-tune it on text images that contain handwritten text; (2) use the current version and add "handwritten" to the prompt to control the style.

Hope it will help!

staghado commented 1 year ago

Thank you for your quick reply!

I have tried the second way with different prompts (text-to-image and text-to-image-with-template), but the results were not what I was looking for: the handwriting didn't look realistic and read more like a font than real handwriting.

So I will try the first way in the coming days. For that:

Do you have any suggestions for custom dataset design: prompts, whole images or word images, etc.?
How much data is needed (approx.) to do the fine-tuning in this particular case?

JingyeChen commented 1 year ago

For the second way, I suppose you can replace the default font "Arial.ttf" with a handwritten font and give it a try. The shape of the intermediate mask/layout will affect the final prediction.
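To see what a font swap does to the glyph shapes before running the model, a minimal Pillow sketch along these lines may help. It only renders a text string into a grayscale mask with a chosen font. It is not TextDiffuser's own layout code, and the font file name `handwriting.ttf`, the helper names, and the canvas size are all hypothetical.

```python
from PIL import Image, ImageDraw, ImageFont


def load_font(font_path, size=48):
    """Load a TrueType font; fall back to Pillow's built-in font if the file is missing."""
    try:
        return ImageFont.truetype(font_path, size)
    except OSError:
        # Assumption: a fallback is acceptable for a quick visual check.
        return ImageFont.load_default()


def render_text_mask(text, font_path="handwriting.ttf", canvas=(512, 512),
                     xy=(32, 224), size=48):
    """Draw `text` in white on a black grayscale canvas and return the image.

    `font_path` is a hypothetical handwriting font; point it at any .ttf you have.
    """
    font = load_font(font_path, size)
    mask = Image.new("L", canvas, 0)
    ImageDraw.Draw(mask).text(xy, text, fill=255, font=font)
    return mask


if __name__ == "__main__":
    render_text_mask("hello world").save("mask_preview.png")
```

Comparing the mask rendered with "Arial.ttf" against one rendered with a handwriting font gives a quick sense of how different the intermediate shapes would be.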

For the first way, maybe you could select the handwritten samples from the MARIO dataset and fine-tune on those. I am also not sure how much data is adequate for adapting.
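One rough way to pick such samples is a keyword match on the captions. The sketch below assumes each sample is available as a dict with a `caption` string; that record layout, the keyword list, and the function name are hypothetical and would need adjusting to however the dataset is actually stored.

```python
import re

# Hypothetical keyword list; extend it after inspecting real captions.
HANDWRITING_PATTERN = re.compile(
    r"\b(handwritten|handwriting|hand-written|cursive|calligraphy)\b",
    re.IGNORECASE,
)


def filter_handwritten(records):
    """Keep records whose caption mentions a handwriting-related keyword."""
    return [
        record for record in records
        if HANDWRITING_PATTERN.search(record.get("caption", ""))
    ]


if __name__ == "__main__":
    samples = [
        {"id": "a", "caption": "A handwritten note on a fridge"},
        {"id": "b", "caption": "Neon sign saying OPEN"},
        {"id": "c", "caption": "Cursive lettering on a wedding card"},
    ]
    print([r["id"] for r in filter_handwritten(samples)])
```

Caption keywords are a noisy signal, so a manual pass over the selected images is probably still worthwhile before fine-tuning.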