Salient Object Aware Background Generation

This repository accompanies our paper, Salient Object-Aware Background Generation using Text-Guided Diffusion Models, which has been accepted for publication in the CVPR 2024 Generative Models for Computer Vision workshop. You can try our model on Hugging Face.

The paper addresses an issue we call "object expansion" when generating backgrounds for salient objects using inpainting diffusion models. We show that models such as Stable Inpainting can sometimes arbitrarily expand or distort the salient object, which is undesirable in applications where the object's identity should be preserved, such as e-commerce ads. Some examples of object expansion:
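To make "object expansion" concrete, one way to quantify it is to run a salient object detector on the generated image and count how many pixels were added to the object compared with the input mask. The sketch below assumes both masks are already available as binary grids; the function name and the exact score are hypothetical illustrations, not necessarily the measure used in the paper.

```python
from typing import List

Mask = List[List[int]]  # 1 = salient-object pixel, 0 = background pixel


def object_expansion(original: Mask, generated: Mask) -> float:
    """Hypothetical expansion score (not necessarily the paper's metric).

    Counts pixels that are foreground in the mask predicted on the generated
    image but background in the input mask, normalized by the input object
    area. 0.0 means the object did not grow.
    """
    if len(original) != len(generated) or any(
        len(row_o) != len(row_g) for row_o, row_g in zip(original, generated)
    ):
        raise ValueError("masks must have the same shape")
    area = sum(sum(row) for row in original)
    if area == 0:
        raise ValueError("original mask contains no object pixels")
    grown = sum(
        1
        for row_o, row_g in zip(original, generated)
        for o, g in zip(row_o, row_g)
        if g and not o
    )
    return grown / area


# Toy 3x4 example: the object spreads one column to the right.
original = [[0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
generated = [[0, 1, 1, 1],
             [0, 1, 1, 1],
             [0, 0, 0, 0]]
print(object_expansion(original, generated))  # 0.5
```

A score like this only measures growth; shrinking or reshaping the object would need a complementary term.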

Setup

The dependencies are listed in requirements.txt. Install them with:

pip install -r requirements.txt

Usage

Training

The following command trains a text-to-image inpainting ControlNet initialized with the weights of "stable-diffusion-2-inpainting":

accelerate launch --multi_gpu --mixed_precision=fp16 --num_processes=8 train_controlnet_inpaint.py --pretrained_model_name_or_path "stable-diffusion-2-inpainting" --proportion_empty_prompts 0.1

The following command trains a text-to-image ControlNet initialized with the weights of "stable-diffusion-2-base":

accelerate launch --multi_gpu --mixed_precision=fp16 --num_processes=8 train_controlnet.py --pretrained_model_name_or_path "stable-diffusion-2-base" --proportion_empty_prompts 0.1
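Both commands pass --proportion_empty_prompts 0.1. Assuming the flag behaves like the option of the same name in the diffusers ControlNet training example, it replaces each training caption with an empty string with probability 0.1, so the model also learns an unconditional prediction that classifier-free guidance can use at inference time. The helper below is a hypothetical stand-alone sketch of that caption dropout, not code taken from train_controlnet.py or train_controlnet_inpaint.py.

```python
import random
from typing import List, Optional


def drop_captions(
    captions: List[str],
    proportion_empty_prompts: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Replace each caption with "" with probability proportion_empty_prompts.

    Hypothetical helper; assumed to mirror the caption handling behind the
    flag of the same name in the diffusers ControlNet training example.
    """
    if not 0.0 <= proportion_empty_prompts <= 1.0:
        raise ValueError("proportion_empty_prompts must be in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is uniform in [0, 1): p=0 keeps every caption, p=1 empties all.
    return ["" if rng.random() < proportion_empty_prompts else c for c in captions]


captions = ["a red sneaker on a wooden table"] * 10_000
dropped = drop_captions(captions, 0.1, rng=random.Random(0))
print(sum(c == "" for c in dropped) / len(dropped))  # roughly 0.1
```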

Inference

Please refer to inference.ipynb. To run it, you need to download our model checkpoints. You can also try our model using the Hugging Face diffusers pipeline:

from diffusers import DiffusionPipeline

# Downloads the pipeline from the Hugging Face Hub (and caches it locally) on first use
pipeline = DiffusionPipeline.from_pretrained("yahoo-inc/photo-background-generation")
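Background generation with an inpainting model needs a mask that marks the region to synthesize. In the standard diffusers inpainting pipelines, white (255) pixels are repainted and black (0) pixels are kept, so a salient-object mask (object = white) has to be inverted first. That the pipeline above follows the same convention is an assumption; check inference.ipynb before relying on it. The function below is a hypothetical pure-Python sketch of the inversion step.

```python
from typing import List

Gray = List[List[int]]  # 8-bit grayscale values in 0..255


def background_mask(object_mask: Gray, threshold: int = 128) -> Gray:
    """Hypothetical helper: binarize a salient-object mask and invert it.

    Object pixels (>= threshold) become 0, i.e. preserved; background pixels
    become 255, i.e. the region the inpainting model should generate.
    """
    return [[0 if v >= threshold else 255 for v in row] for row in object_mask]


soft_mask = [[0, 200],
             [130, 10]]
print(background_mask(soft_mask))  # [[255, 0], [0, 255]]
```

With Pillow, the inversion itself can be done with `ImageOps.invert(mask.convert("L"))`, followed by a threshold if the mask is soft.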

Model Checkpoints

Model | Datasets used
controlnet_inpainting_salient_aware.pth | Salient segmentation datasets, COCO

Citations

If you find our work useful, please consider citing our paper:

@misc{eshratifar2024salient,
      title={Salient Object-Aware Background Generation using Text-Guided Diffusion Models}, 
      author={Amir Erfan Eshratifar and Joao V. B. Soares and Kapil Thadani and Shaunak Mishra and Mikhail Kuznetsov and Yueh-Ning Ku and Paloma de Juan},
      year={2024},
      eprint={2404.10157},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This project is licensed under the terms of the Apache 2.0 open source license. Please refer to LICENSE for the full terms.