[arXiv] [Project Page]
Namhyuk Ahn1, Junsoo Lee1, Chunggi Lee1,2, Kunhee Kim3, Daesik Kim1, Seung-Hun Nam1, Kibeom Hong4†
NAVER WEBTOON AI1, Harvard University2, KAIST3, SwatchOn4
Corresponding author†
AAAI 2024
Recent progress in large-scale text-to-image models has yielded remarkable accomplishments, finding various applications in the art domain. However, expressing the unique characteristics of an artwork (e.g., brushwork, color tone, or composition) with text prompts alone may encounter limitations due to the inherent constraints of verbal description. To this end, we introduce DreamStyler, a novel framework designed for artistic image synthesis, proficient in both text-to-image synthesis and style transfer. DreamStyler optimizes a multi-stage textual embedding with a context-aware text prompt, resulting in prominent image quality. In addition, with content and style guidance, DreamStyler exhibits flexibility to accommodate a range of style references. Experimental results demonstrate its superior performance across multiple scenarios, suggesting its promising potential in artistic product creation.
accelerate launch dreamstyler/train.py \
--num_stages 6 \
--train_image_path "./images/03.png" \
--context_prompt "A painting of pencil, pears and apples on a cloth, in the style of {}" \
--placeholder_token "<sks03>" \
--output_dir "./outputs/sks03" \
--learnable_property style \
--initializer_token painting \
--pretrained_model_name_or_path "runwayml/stable-diffusion-v1-5" \
--resolution 512 \
--train_batch_size 8 \
--gradient_accumulation_steps 1 \
--max_train_steps 500 \
--save_steps 100 \
--learning_rate 0.002 \
--lr_scheduler constant \
--lr_warmup_steps 0
We trained on an A100 GPU. If your GPU does not have enough memory, reduce --train_batch_size and increase --max_train_steps accordingly, or increase --gradient_accumulation_steps.
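For example, on a smaller GPU you might halve the batch size and double the gradient accumulation steps so the effective batch size stays at 8. The command below is only a sketch of that trade-off, not an officially tuned recipe; adjust the numbers to your hardware.

# Sketch: same training run as above, but with a smaller per-step batch and
# gradient accumulation so the effective batch size remains 4 x 2 = 8.
accelerate launch dreamstyler/train.py \
--num_stages 6 \
--train_image_path "./images/03.png" \
--context_prompt "A painting of pencil, pears and apples on a cloth, in the style of {}" \
--placeholder_token "<sks03>" \
--output_dir "./outputs/sks03" \
--learnable_property style \
--initializer_token painting \
--pretrained_model_name_or_path "runwayml/stable-diffusion-v1-5" \
--resolution 512 \
--train_batch_size 4 \
--gradient_accumulation_steps 2 \
--max_train_steps 500 \
--save_steps 100 \
--learning_rate 0.002 \
--lr_scheduler constant \
--lr_warmup_steps 0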
--train_image_path is the path to the style image to mimic, and you should provide --context_prompt to improve the personalization performance. Refer to ./images/prompt_blip2.txt or ./images/prompt_hf.txt for example context prompts for the images in the ./images directory. The prompts in prompt_blip2.txt are automatically extracted with BLIP-2, and the prompts in prompt_hf.txt are versions refined with human feedback (further annotation).
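If you want to draft a context prompt for your own style image, a captioning model such as BLIP-2 can give you a starting point to refine by hand, much as prompt_hf.txt refines prompt_blip2.txt. The snippet below is a minimal sketch, not part of this repository; it assumes the Hugging Face transformers checkpoint Salesforce/blip2-opt-2.7b and simply appends the {} slot that --context_prompt expects.

# Sketch: draft a context prompt with BLIP-2 (not an official DreamStyler script).
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b").to(device)

# Caption the style image.
image = Image.open("./images/03.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to(device)
generated_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()

# Append the placeholder slot used by --context_prompt, then refine the text by hand.
context_prompt = f"{caption}, in the style of {{}}"
print(context_prompt)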
python dreamstyler/inference_t2i.py \
--sd_path "runwayml/stable-diffusion-v1-5" \
--embedding_path ./outputs/sks03/embedding/final.bin \
--prompt "A painting of a lighthight next to cliff in the style of {}" \
--saveroot ./outputs/sample03 \
--placeholder_token "<sks03>"
python dreamstyler/inference_style_transfer.py \
--sd_path "runwayml/stable-diffusion-v1-5" \
--embedding_path ./outputs/sks03/embedding/final.bin \
--content_image_path ./images/content.png \
--saveroot ./outputs/sample03 \
--placeholder_token "<sks03>"
@article{ahn2023dreamstyler,
title={DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models},
author={Ahn, Namhyuk and Lee, Junsoo and Lee, Chunggi and Kim, Kunhee and Kim, Daesik and Nam, Seung-Hun and Hong, Kibeom},
journal={arXiv preprint arXiv:2309.06933},
year={2023},
}
DreamStyler
Copyright 2024-present NAVER WEBTOON
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.