🔥 Model Zoo • 🛠️ Installation • 🏋️ Training • 📺 Sampling • 📱 Run WebUI
| Model | Checkpoint | Status |
|---|---|---|
| FontDiffuser | GoogleDrive / BaiduYun:gexg | Released |
| SCR | GoogleDrive / BaiduYun:gexg | Released |
Clone this repo:
```bash
git clone https://github.com/yeungchenwa/FontDiffuser.git
```
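Then enter the repository:
```bash
cd FontDiffuser
```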
Step 0: Download and install Miniconda from the official website.
Step 1: Create a conda environment and activate it.
```bash
conda create -n fontdiffuser python=3.9 -y
conda activate fontdiffuser
```
Step 2: Install a compatible PyTorch version following the official PyTorch instructions.
```bash
# Suggested
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
```
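After installing, you can sanity-check that the CUDA build is picked up (this uses only standard PyTorch calls):
```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```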
Step 3: Install the required packages.
```bash
pip install -r requirements.txt
```
The training data directory tree should look as follows (data examples are provided in `data_examples/train/`):
```
├── data_examples
│   └── train
│       ├── ContentImage
│       │   ├── char0.png
│       │   ├── char1.png
│       │   ├── char2.png
│       │   └── ...
│       └── TargetImage
│           ├── style0
│           │   ├── style0+char0.png
│           │   ├── style0+char1.png
│           │   └── ...
│           ├── style1
│           │   ├── style1+char0.png
│           │   ├── style1+char1.png
│           │   └── ...
│           ├── style2
│           │   ├── style2+char0.png
│           │   ├── style2+char1.png
│           │   └── ...
│           └── ...
```
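If you are assembling this tree yourself, a minimal sketch of the skeleton; the directory and file names below are placeholders following the pattern above:
```bash
# Placeholder names following the pattern above -- adapt to your own styles and characters
mkdir -p data_examples/train/ContentImage
mkdir -p data_examples/train/TargetImage/style0
# ContentImage holds one image per character: char0.png, char1.png, ...
# TargetImage/styleN holds the paired targets:  styleN+charM.png
```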
Before running any of the training scripts (all three training modes below), set the training configuration, such as distributed training, through:
```bash
accelerate config
```
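If you prefer a non-interactive setup, recent versions of accelerate can also write a default single-machine config (verify that your installed version supports this subcommand):
```bash
accelerate config default
```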
Training of the SCR module: Coming Soon ...
Phase 1 training:
```bash
sh train_phase_1.sh
```
Key arguments:
- `data_root`: The data root, e.g. `./data_examples`.
- `output_dir`: The directory where training logs and checkpoints are saved.
- `resolution`: The resolution of the UNet in the diffusion model.
- `style_image_size`: The resolution of the style image; it may differ from `resolution`.
- `content_image_size`: The resolution of the content image; it should match `resolution`.
- `channel_attn`: Whether to use channel attention in the MCA block.
- `train_batch_size`: The training batch size.
- `max_train_steps`: The maximum number of training steps.
- `learning_rate`: The learning rate for training.
- `ckpt_interval`: The interval (in steps) between checkpoint saves.
- `drop_prob`: The condition drop probability for classifier-free guidance training.
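For orientation, a hedged sketch of the kind of launch command `train_phase_1.sh` wraps; the entry point `train.py`, the flag spellings, and the values are assumptions inferred from the argument list above, so treat the script itself as authoritative:
```bash
# Sketch only: entry point, flag spellings, and values are assumptions -- see train_phase_1.sh
accelerate launch train.py \
  --data_root="./data_examples" \
  --output_dir="./outputs" \
  --resolution=96 \
  --style_image_size=96 \
  --content_image_size=96 \
  --channel_attn=True \
  --train_batch_size=8 \
  --max_train_steps=440000 \
  --learning_rate=1e-4 \
  --ckpt_interval=40000 \
  --drop_prob=0.1
```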
After the phase 1 training, you should put the trained checkpoint files (`unet.pth`, `content_encoder.pth`, and `style_encoder.pth`) into the directory `phase_1_ckpt`. During phase 2, these parameters will be resumed.
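For example (the phase 1 `output_dir` below is hypothetical; substitute your own):
```bash
# "outputs/" is a hypothetical phase 1 output_dir -- substitute your own
mkdir -p phase_1_ckpt
cp outputs/unet.pth outputs/content_encoder.pth outputs/style_encoder.pth phase_1_ckpt/
```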
Phase 2 training (with the SCR module):
```bash
sh train_phase_2.sh
```
Additional arguments:
- `phase_2`: Tag for phase 2 training.
- `phase_1_ckpt_dir`: The directory of model checkpoints saved after phase 1 training.
- `scr_ckpt_path`: The checkpoint path of the pre-trained SCR module; you can download it from the 🔥 Model Zoo above.
- `sc_coefficient`: The coefficient of the style contrastive loss for supervision.
- `num_neg`: The number of negative samples; defaults to `16`.
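As with phase 1, a hedged sketch of the extra flags phase 2 adds on top of the phase 1 command; spellings and values are assumptions inferred from the argument list above, so check `train_phase_2.sh`:
```bash
# Sketch only: flag spellings and values are assumptions -- see train_phase_2.sh
accelerate launch train.py \
  --phase_2 \
  --phase_1_ckpt_dir="./phase_1_ckpt" \
  --scr_ckpt_path="./scr_ckpt.pth" \
  --sc_coefficient=0.01 \
  --num_neg=16
```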
Before sampling, prepare the checkpoint. Option (1): Download the checkpoint from GoogleDrive / BaiduYun:gexg, then put the `ckpt` folder in the root directory; it should include the files `unet.pth`, `content_encoder.pth`, and `style_encoder.pth`.
Option (2): Put your re-trained checkpoint folder `ckpt` in the root directory; it should include the files `unet.pth`, `content_encoder.pth`, and `style_encoder.pth`.
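With either option, the `ckpt` folder in the root directory should end up containing the three files the sampling scripts load:
```bash
ls ckpt/
# expected: content_encoder.pth  style_encoder.pth  unet.pth
```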
(1) Sampling an image from a content image and a reference image:
```bash
sh script/sample_content_image.sh
```
Key arguments:
- `ckpt_dir`: The directory of the saved model checkpoints.
- `content_image_path`: The path of the content/source image.
- `style_image_path`: The path of the style/reference image.
- `save_image`: Set to `True` to save the outputs as images.
- `save_image_dir`: The image saving directory; the saved files include an `out_single.png` and an `out_with_cs.png`.
- `device`: The sampling device; GPU acceleration is recommended.
- `guidance_scale`: The guidance scale for classifier-free sampling.
- `num_inference_steps`: The number of inference steps for DPM-Solver++.
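Putting the arguments together, a hedged sketch of a direct invocation; the entry point `sample.py`, the flag spellings, the values, and the example paths are assumptions, so treat `script/sample_content_image.sh` as authoritative:
```bash
# Sketch only: entry point, flags, and paths are assumptions -- see script/sample_content_image.sh
python sample.py \
  --ckpt_dir="ckpt/" \
  --content_image_path="content.png" \
  --style_image_path="style.png" \
  --save_image=True \
  --save_image_dir="outputs/" \
  --device="cuda:0" \
  --guidance_scale=7.5 \
  --num_inference_steps=20
```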
(2) Sampling an image from a content character.
Note: you may need a TTF file that contains numerous Chinese characters; you can download one from BaiduYun:wrth.
```bash
sh script/sample_content_character.sh
```
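A hedged sketch of the same invocation in character mode, using the two arguments described below plus a TTF path for rendering the content character (all spellings assumed; see `script/sample_content_character.sh`):
```bash
# Sketch only: flags (including the TTF path flag) are assumptions -- see script/sample_content_character.sh
python sample.py \
  --ckpt_dir="ckpt/" \
  --character_input=True \
  --content_character="永" \
  --ttf_path="your_font.ttf" \
  --style_image_path="style.png" \
  --save_image=True \
  --save_image_dir="outputs/"
```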
Additional arguments:
- `character_input`: If set to `True`, a character string is used as the content/source input.
- `content_character`: The content/source character string.

Run the WebUI:
```bash
gradio gradio_app.py
```
Example:
Coming Soon ...
```bibtex
@inproceedings{yang2024fontdiffuser,
  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},
  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2024}
}
```