tyxsspa / AnyText

Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
Apache License 2.0
4.28k stars 282 forks

Poor results on standard fonts #88

Open KeystoneScience opened 5 months ago

KeystoneScience commented 5 months ago

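I am trying to use this tool to replace text in images where the text was overlaid with Photoshop-like software. It seems to perform poorly on this kind of task. Is there a way for me to fine-tune the model on synthetic data of this kind?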

tyxsspa commented 5 months ago

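Hi, the model was trained with the Arial Unicode MS font, so using other fonts requires some finetuning. The main thing to watch is probably the size at which the chosen font is rendered.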

KeystoneScience commented 5 months ago

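Thank you for your reply, I really like your work!

I just want to use it to translate the text on arbitrary user photos (such as YouTube video thumbnails), and I am not sure which fonts are typically used. With that in mind, do you have any suggestions? Also, do you know where I can find information on how to fine-tune the model? I am still new to this and would really appreciate any help you can provide. Thank you!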

tyxsspa commented 5 months ago

Hi, please disregard my previous reply; I misunderstood your question (possibly because the automatic translation into Chinese was inaccurate). Your issue concerns the text editing results for printed fonts in images, right? This is indeed a known problem. Currently, AnyText focuses more on text generation than on text editing. However, you can still try to finetune a specialized model: set mask_ratio in train.py to 0.8 or higher, and use the English subset of the AnyWord-3M training dataset (ideally supplemented with your own data). This will give you a model that is tailored for English and focused on text editing, and its performance will definitely be stronger than that of the current general-purpose model.
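
As a rough illustration of the two suggestions above, here is a minimal, self-contained Python sketch. It is not taken from train.py. It assumes that mask_ratio acts as the probability that a training sample is treated as a text editing (masked) sample rather than a generation sample, and it uses a made-up record layout (`image`, `texts`) rather than the actual AnyWord-3M annotation format. The helper names `pick_task`, `is_english`, and `english_subset` are hypothetical.

```python
import random

# Value suggested in the comment above. The name mirrors mask_ratio in
# train.py, but how train.py consumes it is an assumption here.
MASK_RATIO = 0.8


def pick_task(rng, mask_ratio=MASK_RATIO):
    """Return 'edit' with probability mask_ratio, otherwise 'generate'."""
    return "edit" if rng.random() < mask_ratio else "generate"


def is_english(text):
    """Crude heuristic: ASCII-only text that contains at least one letter."""
    return text.isascii() and any(c.isalpha() for c in text)


def english_subset(samples):
    """Keep samples in which every text line passes is_english."""
    return [
        s for s in samples
        if s["texts"] and all(is_english(t) for t in s["texts"])
    ]


# Hypothetical records; the real dataset annotations may look different.
samples = [
    {"image": "a.jpg", "texts": ["SALE", "50% off"]},
    {"image": "b.jpg", "texts": ["你好"]},
    {"image": "c.jpg", "texts": ["Hello", "世界"]},
]

subset = english_subset(samples)

# Simulate the task mix that a mask ratio of 0.8 would produce.
rng = random.Random(0)
tasks = [pick_task(rng) for _ in range(10000)]
edit_share = tasks.count("edit") / len(tasks)

print([s["image"] for s in subset])
print(f"share of editing samples: {edit_share:.2f}")
```

Under these assumptions, raising mask_ratio shifts most training steps toward the editing task, which is the behavior the suggestion relies on. The ASCII check is only a stand-in for whatever language labels or file split the dataset itself provides.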

KeystoneScience commented 5 months ago

Oh I see, thank you so much for your thoughtful response! I'll give that a try.

You are absolutely right that this is about the editing results for printed fonts in images.

I am trying to make it well suited for translating between any of the top 40 or so languages, specifically tuned for things like YouTube thumbnails.