# FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning


[![arXiv preprint](http://img.shields.io/badge/arXiv-2312.12142-b31b1b)](https://arxiv.org/abs/2312.12142) [![Gradio demo](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-FontDiffuser-ff7c00)](https://huggingface.co/spaces/yeungchenwa/FontDiffuser-Gradio) [![Homepage](https://img.shields.io/badge/Homepage-FontDiffuser-green)](https://yeungchenwa.github.io/fontdiffuser-homepage/) [![Code](https://img.shields.io/badge/Code-FontDiffuser-yellow)](https://github.com/yeungchenwa/FontDiffuser)

🔥 Model Zoo • 🛠️ Installation • 🏋️ Training • 📺 Sampling • 📱 Run WebUI

## 🌟 Highlights


## 📅 News

## 🔥 Model Zoo

| Model | Checkpoint | Status |
|---|---|---|
| FontDiffuser | GoogleDrive / BaiduYun (code: gexg) | Released |
| SCR | GoogleDrive / BaiduYun (code: gexg) | Released |

## 🚧 TODO List

## 🛠️ Installation

### Prerequisites (Recommended)

### Environment Setup

Clone this repo:

```bash
git clone https://github.com/yeungchenwa/FontDiffuser.git
```

Step 0: Download and install Miniconda from the official website.

Step 1: Create a conda environment and activate it.

```bash
conda create -n fontdiffuser python=3.9 -y
conda activate fontdiffuser
```

Step 2: Install a compatible version of PyTorch (see the official PyTorch installation instructions).

```bash
# Suggested
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
```
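
A quick sanity check (a minimal sketch; these attributes exist in any recent PyTorch build) confirms that the CUDA-enabled wheel was installed:

```python
# Verify that the CUDA-enabled PyTorch build is active.
import torch

print(torch.__version__)          # expected: 1.13.1+cu117 with the suggested install
print(torch.version.cuda)         # expected: 11.7
print(torch.cuda.is_available())  # True if a compatible GPU and driver are present
```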

Step 3: Install the required packages.

```bash
pip install -r requirements.txt
```

πŸ‹οΈ Training

### Data Construction

The training data should be organized as follows (examples are provided in the directory data_examples/train/):

```
├── data_examples
│   └── train
│       ├── ContentImage
│       │   ├── char0.png
│       │   ├── char1.png
│       │   ├── char2.png
│       │   └── ...
│       └── TargetImage
│           ├── style0
│           │   ├── style0+char0.png
│           │   ├── style0+char1.png
│           │   └── ...
│           ├── style1
│           │   ├── style1+char0.png
│           │   ├── style1+char1.png
│           │   └── ...
│           ├── style2
│           │   ├── style2+char0.png
│           │   ├── style2+char1.png
│           │   └── ...
│           └── ...
```
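
The `style+char` file naming links every target glyph to its style folder and to its content image. Below is a minimal sketch of how such (content, target) pairs can be enumerated; it only assumes the directory layout above and is not the repository's actual dataset code:

```python
# Enumerate (content, target, style) triples from the layout above.
# Illustrative only; FontDiffuser's own dataset loader may differ.
from pathlib import Path

root = Path("data_examples/train")
content_dir = root / "ContentImage"
target_dir = root / "TargetImage"

pairs = []
for style_dir in sorted(target_dir.iterdir()):
    if not style_dir.is_dir():
        continue
    for target_png in sorted(style_dir.glob("*.png")):
        # file name pattern: "<style>+<char>.png"
        _, char_name = target_png.stem.split("+", 1)
        content_png = content_dir / f"{char_name}.png"
        if content_png.exists():
            pairs.append((content_png, target_png, style_dir.name))

print(f"found {len(pairs)} (content, target, style) triples")
```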

### Training Configuration

Before running any of the three training modes below, set the training configuration (e.g., for distributed training) with:

```bash
accelerate config
```

### Training - Pretraining of SCR

Coming Soon ...

### Training - Phase 1

```bash
sh train_phase_1.sh
```

### Training - Phase 2

After phase 1 training, put the trained checkpoint files (unet.pth, content_encoder.pth, and style_encoder.pth) into the directory phase_1_ckpt; during phase 2, these parameters will be resumed.
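
A short helper for staging those files (a sketch; `phase_1_output` is a placeholder for your actual phase-1 output directory):

```python
# Copy phase-1 weights into phase_1_ckpt/ so phase 2 can resume from them.
import shutil
from pathlib import Path

src = Path("phase_1_output")  # placeholder: your actual phase-1 output directory
dst = Path("phase_1_ckpt")
dst.mkdir(exist_ok=True)

for name in ("unet.pth", "content_encoder.pth", "style_encoder.pth"):
    shutil.copy2(src / name, dst / name)
    print(f"copied {name} -> {dst}/")
```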

```bash
sh train_phase_2.sh
```

## 📺 Sampling

### Step 1: Prepare the checkpoint

Option (1): Download the checkpoint from GoogleDrive / BaiduYun (code: gexg) and place the ckpt folder in the root directory; it should contain unet.pth, content_encoder.pth, and style_encoder.pth.
Option (2): Place your own re-trained checkpoint folder ckpt in the root directory, containing the same three files.
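
Either way, a quick check that the folder is complete (a minimal sketch using the file names listed above):

```python
# Confirm the sampling checkpoint folder contains all three weight files.
from pathlib import Path

ckpt = Path("ckpt")
required = ["unet.pth", "content_encoder.pth", "style_encoder.pth"]
missing = [name for name in required if not (ckpt / name).exists()]
assert not missing, f"missing checkpoint files: {missing}"
print("checkpoint folder is complete")
```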

### Step 2: Run the script

(1) Sample an image from a content image and a reference (style) image.

```bash
sh script/sample_content_image.sh
```
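
In this mode both inputs are ordinary image files. The sketch below shows typical preprocessing for such a pipeline; the 96×96 size (the paper's setting), the normalization, and the file names are assumptions here, and the sampling script performs its own preprocessing internally:

```python
# Illustrative input preprocessing; the sample script handles this itself.
from PIL import Image
import torchvision.transforms as T

transform = T.Compose([
    T.Resize((96, 96)),                             # assumed resolution, per the paper
    T.ToTensor(),                                   # scales pixels to [0, 1]
    T.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),  # maps to [-1, 1]
])

content = transform(Image.open("content.png").convert("RGB"))      # placeholder path
style_ref = transform(Image.open("style_ref.png").convert("RGB"))  # placeholder path
print(content.shape, style_ref.shape)  # torch.Size([3, 96, 96]) each
```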

(2) Sample an image from a content character.
Note: this mode may require a .ttf file covering a large set of Chinese characters; one can be downloaded from BaiduYun (code: wrth).

```bash
sh script/sample_content_character.sh
```
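
Here the content glyph is rasterized from the font file rather than read from disk. A minimal Pillow sketch of that idea (the font path, the character, and the 96×96 canvas are illustrative assumptions; the script handles rendering internally):

```python
# Rasterize a single character from a .ttf file onto a white canvas.
from PIL import Image, ImageDraw, ImageFont

def render_char(char: str, ttf_path: str, size: int = 96) -> Image.Image:
    font = ImageFont.truetype(ttf_path, int(size * 0.8))
    canvas = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(canvas)
    # Center the glyph using its bounding box.
    left, top, right, bottom = draw.textbbox((0, 0), char, font=font)
    x = (size - (right - left)) / 2 - left
    y = (size - (bottom - top)) / 2 - top
    draw.text((x, y), char, font=font, fill="black")
    return canvas

render_char("好", "your_font.ttf").save("content_char.png")  # paths are placeholders
```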

## 📱 Run WebUI

(1) Sampling with FontDiffuser

```bash
gradio gradio_app.py
```


(2) Sampling with FontDiffuser and Rendering with InstructPix2Pix

Coming Soon ...

## 🌄 Gallery

### Characters of hard level of complexity


### Characters of medium level of complexity


### Characters of easy level of complexity


### Cross-Lingual Generation (Chinese to Korean)


## 💙 Acknowledgement

## Copyright

## Citation

```bibtex
@inproceedings{yang2024fontdiffuser,
  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},
  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2024}
}
```

## ⭐ Star Rising
