yeungchenwa / FontDiffuser

[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
# FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning


arXiv preprint • Gradio demo • Homepage • Code

🔥 Model Zoo • 🛠️ Installation • 🏋️ Training • 📺 Sampling • 📱 Run WebUI

🌟 Highlights


📅 News

🔥 Model Zoo

| Model | Checkpoint | Status |
| --- | --- | --- |
| FontDiffuser | GoogleDrive / BaiduYun:gexg | Released |
| SCR | GoogleDrive / BaiduYun:gexg | Released |

🚧 TODO List

🛠️ Installation

Prerequisites (Recommended)

Environment Setup

Clone this repo:

git clone

Step 0: Download and install Miniconda from the official website.

Step 1: Create a conda environment and activate it.

conda create -n fontdiffuser python=3.9 -y
conda activate fontdiffuser

Step 2: Install a PyTorch version that matches your CUDA setup, following the official PyTorch installation guide.

# Suggested
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url

Step 3: Install the required packages.

pip install -r requirements.txt
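
After installation, a quick version check can catch a mismatched PyTorch build early. The sketch below uses only the Python standard library. The helper name `check_versions` is hypothetical (it is not part of this repository), and the version numbers are copied from the suggested pip command above with the CUDA suffix dropped, so adjust them if you installed a different build.

```python
from importlib import metadata

# Base versions taken from the suggested install command (assumption:
# you followed that suggestion; change these if you did not).
SUGGESTED = {
    "torch": "1.13.1",
    "torchvision": "0.14.1",
    "torchaudio": "0.13.1",
}


def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}.

    Local build suffixes such as '+cu117' are ignored in the comparison.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, matches)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(SUGGESTED).items():
        status = "ok" if ok else "differs from suggested"
        print(f"{name}: {installed or 'not installed'} ({status})")
```

A mismatch is not necessarily an error; it only means the environment differs from the suggested one.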

🏋️ Training

Data Construction

The training data should be organized as follows (examples are provided in the directory data_examples/train/):

├── data_examples
│   └── train
│       ├── ContentImage
│       │   ├── char0.png
│       │   ├── char1.png
│       │   ├── char2.png
│       │   └── ...
│       └── TargetImage
│           ├── style0
│           │     ├── style0+char0.png
│           │     ├── style0+char1.png
│           │     └── ...
│           ├── style1
│           │     ├── style1+char0.png
│           │     ├── style1+char1.png
│           │     └── ...
│           ├── style2
│           │     ├── style2+char0.png
│           │     ├── style2+char1.png
│           │     └── ...
│           └── ...
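
The layout above implies that each target image named `<style>+<char>.png` pairs with the content image `<char>.png`. A minimal sketch of how such pairs could be enumerated and sanity-checked before training, assuming that naming convention and PNG files as in the tree. `list_training_pairs` is a hypothetical helper, not part of the repository, and the target folder name is a parameter in case yours differs.

```python
from pathlib import Path


def list_training_pairs(train_root, target_dirname="TargetImage"):
    """Collect (content_path, style_name, target_path) triples.

    Assumes target files are named '<style>+<char>.png' inside a folder
    named '<style>', and that the matching content image is
    'ContentImage/<char>.png'. Returns (pairs, problems), where problems
    lists target files that break the convention or lack a content image.
    """
    root = Path(train_root)
    content_dir = root / "ContentImage"
    style_dirs = sorted(p for p in (root / target_dirname).iterdir() if p.is_dir())

    pairs, problems = [], []
    for style_dir in style_dirs:
        for target in sorted(style_dir.glob("*.png")):
            style, sep, char = target.stem.partition("+")
            if not sep or style != style_dir.name:
                problems.append(target)  # file name does not follow the convention
                continue
            content = content_dir / f"{char}.png"
            if content.is_file():
                pairs.append((content, style, target))
            else:
                problems.append(target)  # no content image for this character
    return pairs, problems
```

Calling `list_training_pairs("data_examples/train")` and confirming that `problems` is empty is a cheap check to run before launching a long training job.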

Training Configuration

Before running any training script (this applies to all three modes below), set up the training configuration, such as distributed training, with:

accelerate config

Training - Pretraining of SCR

Coming Soon ...

Training - Phase 1


Training - Phase 2

After phase 1 training, put the trained checkpoint files (unet.pth, content_encoder.pth, and style_encoder.pth) in the directory phase_1_ckpt. Phase 2 resumes from these parameters.
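
The hand-off of phase 1 weights into phase_1_ckpt can be sketched as a small copy step that fails loudly when a file is missing. This is an illustration only: `stage_phase1_checkpoints` is a hypothetical helper, and the source directory is whatever folder your phase 1 run wrote its checkpoints to, which this README does not specify.

```python
import shutil
from pathlib import Path

# The three checkpoint files named in this README.
CKPT_FILES = ("unet.pth", "content_encoder.pth", "style_encoder.pth")


def stage_phase1_checkpoints(src_dir, dst_dir="phase_1_ckpt"):
    """Copy the phase 1 checkpoint files from src_dir into dst_dir.

    Raises FileNotFoundError before copying anything if a file is missing,
    so a partial checkpoint set is never staged.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    absent = [name for name in CKPT_FILES if not (src / name).is_file()]
    if absent:
        raise FileNotFoundError(f"missing in {src}: {', '.join(absent)}")
    dst.mkdir(parents=True, exist_ok=True)
    for name in CKPT_FILES:
        shutil.copy2(src / name, dst / name)
    return [dst / name for name in CKPT_FILES]
```

Copying rather than moving keeps the original phase 1 output intact in case phase 2 needs to be restarted from scratch.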


📺 Sampling

Step 1 => Prepare the checkpoint

Option (1) Download the checkpoint from GoogleDrive / BaiduYun:gexg, then place the ckpt folder in the repository root. It should contain the files unet.pth, content_encoder.pth, and style_encoder.pth.
Option (2) Place your own retrained checkpoint folder ckpt in the repository root. It should contain the files unet.pth, content_encoder.pth, and style_encoder.pth.

Step 2 => Run the script

(1) Sample an image from a content image and a reference image.

sh script/

(2) Sample an image from a content character.
Note: You may need a TTF file that covers a large number of Chinese characters. You can download one from BaiduYun:wrth.

sh script/

📱 Run WebUI

(1) Sampling by FontDiffuser



(2) Sampling by FontDiffuser and Rendering by InstructPix2Pix

Coming Soon ...

🌄 Gallery

Characters of hard level of complexity


Characters of medium level of complexity


Characters of easy level of complexity


Cross-Lingual Generation (Chinese to Korean)


💙 Acknowledgement



  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},
  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
  booktitle={Proceedings of the AAAI conference on artificial intelligence},
