This repository accompanies our paper, Salient Object-Aware Background Generation using Text-Guided Diffusion Models, which has been accepted for publication at the CVPR 2024 Generative Models for Computer Vision workshop. You can try our model on Hugging Face.
The paper addresses an issue we call "object expansion" when generating backgrounds for salient objects using inpainting diffusion models. We show that models such as Stable Inpainting can sometimes arbitrarily expand or distort the salient object, which is undesirable in applications where the object's identity should be preserved, such as e-commerce ads. Some examples of object expansion:
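Object expansion can also be expressed as a number. The sketch below is only an illustration and is not the metric defined in the paper: it assumes you already have a binary mask of the salient object in the input image and a second mask obtained by segmenting the generated image, and it reports the newly added object area relative to the original object area. The function name `expansion_ratio` and the nested-list mask format are hypothetical choices made to keep the example dependency-free.

```python
def expansion_ratio(original_mask, generated_mask):
    """Extra object pixels in the output, relative to the original object area.

    Both masks are equally sized 2D lists of 0/1 values (1 = salient object).
    """
    original_area = 0
    extra_area = 0
    for orig_row, gen_row in zip(original_mask, generated_mask):
        for orig_px, gen_px in zip(orig_row, gen_row):
            original_area += orig_px
            # Object in the generated image, but background in the input image.
            if gen_px and not orig_px:
                extra_area += 1
    if original_area == 0:
        raise ValueError("original mask contains no object pixels")
    return extra_area / original_area


# Toy 4x4 example: a 2x2 object that grows by one column on its right side.
original = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
generated = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
ratio = expansion_ratio(original, generated)
print(f"expansion ratio: {ratio:.2f}")
```

A ratio of 0 means the object's footprint did not grow. With NumPy arrays the same quantity reduces to a couple of boolean operations.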
The dependencies are listed in requirements.txt. Install them with:
pip install -r requirements.txt
The following command trains the text-to-image inpainting ControlNet, initialized with the weights of "stable-diffusion-2-inpainting":
accelerate launch --multi_gpu --mixed_precision=fp16 --num_processes=8 train_controlnet_inpaint.py --pretrained_model_name_or_path "stable-diffusion-2-inpainting" --proportion_empty_prompts 0.1
The following command trains the text-to-image ControlNet, initialized with the weights of "stable-diffusion-2-base":
accelerate launch --multi_gpu --mixed_precision=fp16 --num_processes=8 train_controlnet.py --pretrained_model_name_or_path "stable-diffusion-2-base" --proportion_empty_prompts 0.1
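Both commands pass `--proportion_empty_prompts 0.1`, which this README does not explain. In the Hugging Face diffusers ControlNet training example, a flag with the same name sets the fraction of training captions that are replaced with an empty string, so the model also sees unconditional examples. Assuming the scripts in this repository follow that convention, the effect can be sketched as follows. `drop_prompts` is a hypothetical helper written for illustration, not a function from this repository.

```python
import random


def drop_prompts(captions, proportion_empty_prompts, rng=None):
    """Replace each caption with "" with probability `proportion_empty_prompts`."""
    rng = rng or random.Random()
    return [
        "" if rng.random() < proportion_empty_prompts else caption
        for caption in captions
    ]


# With 0.1, roughly one caption in ten is blanked out during training.
captions = ["a red sneaker on a beach"] * 1000
dropped = drop_prompts(captions, 0.1, random.Random(0))
empty_fraction = sum(c == "" for c in dropped) / len(dropped)
print(f"empty prompts: {empty_fraction:.1%}")
```

Setting the value to 0 would keep every caption, and larger values trade text conditioning for more unconditional training examples.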
For inference, please refer to inference.ipynb. To run the code, you need to download our model checkpoints. You can also try our model using the Hugging Face pipeline:
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained("yahoo-inc/photo-background-generation")
| Model link | Datasets used |
|---|---|
| controlnet_inpainting_salient_aware.pth | Salient segmentation datasets, COCO |
If you find our work useful, please consider citing our paper:
@misc{eshratifar2024salient,
title={Salient Object-Aware Background Generation using Text-Guided Diffusion Models},
author={Amir Erfan Eshratifar and Joao V. B. Soares and Kapil Thadani and Shaunak Mishra and Mikhail Kuznetsov and Yueh-Ning Ku and Paloma de Juan},
year={2024},
eprint={2404.10157},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
This project is licensed under the terms of the Apache 2.0 open source license. Please refer to LICENSE for the full terms.