Training-free Regional Prompting for Diffusion Transformers (Regional-Prompting-FLUX) equips Diffusion Transformers (i.e., FLUX) with fine-grained compositional text-to-image generation capability in a training-free manner. Empirically, we show that our method is highly effective and compatible with LoRA, ControlNet, and multi-person PULID.
Our inference is much faster than the RPG-based implementation while using less GPU memory.
Note: Follow the installation guide in PULID to install the necessary dependencies, and follow the instructions in Installation before running the demos below.
| Regional Masks | Configuration | Generated Result |
|---|---|---|
| Red: Person region (xyxy: [64, 320, 448, 1280]) | Base Prompt: "In a classroom during the afternoon, a man is practicing guitar by himself, with sunlight beautifully illuminating the room" <br> Background Prompt: "empty classroom" <br> Regional Prompts: | |
| Red: Person 1 region (xyxy: [64, 128, 320, 1280]) <br> Green: Person 2 region (xyxy: [960, 128, 1216, 1280]) | Base Prompt: "In an elegant dining room, two men are having dinner at opposite ends of a long formal table, with warm lighting creating an atmospheric ambiance" <br> Background Prompt: "a dining room" <br> Regional Prompts: | |
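For reference, a multi-region layout like the dining-room demo above maps onto the `regional_prompt_mask_pairs` format used in the quick-start example later in this document. The sketch below is illustrative only: the region descriptions are placeholders, and the PULID demos additionally require identity inputs, which are handled in infer_flux_regional_pulid.py.

```python
# Illustrative sketch only: the two person boxes from the dining-room demo above,
# with placeholder descriptions (not the prompts used to produce the demo image).
regional_prompt_mask_pairs = {
    "0": {
        "description": "A man seated at the left end of the long table, warm light on his face.",
        "mask": [64, 128, 320, 1280],   # xyxy box for Person 1
    },
    "1": {
        "description": "A man seated at the right end of the long table, facing his companion.",
        "mask": [960, 128, 1216, 1280], # xyxy box for Person 2
    },
}
```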
| Regional Masks | Configuration | Generated Result |
|---|---|---|
| Red: Cocktail region (xyxy: [450, 560, 960, 900]) <br> Green: Table region (xyxy: [320, 900, 1280, 1280]) <br> Blue: Background | Base Prompt: "A tropical cocktail on a wooden table at a beach during sunset." <br> Background Prompt: "A photo" <br> Regional Prompts: | |
| Red: Rainbow region (xyxy: [0, 0, 1280, 256]) <br> Green: Ship region (xyxy: [0, 256, 1280, 520]) <br> Yellow: Fish region (xyxy: [0, 520, 640, 768]) <br> Blue: Treasure region (xyxy: [640, 520, 1280, 768]) | Base Prompt: "A majestic ship sails under a rainbow as vibrant marine creatures glide through crystal waters below, embodying nature's wonder, while an ancient, rusty treasure chest lies hidden on the ocean floor." <br> Regional Prompts: | |
| Red: Woman with torch region (xyxy: [128, 128, 640, 768]) <br> Green: Background | Base Prompt: "An ancient woman stands solemnly holding a blazing torch, while a fierce battle rages in the background, capturing both strength and tragedy in a historical war scene." <br> Background Prompt: "A photo." <br> Regional Prompts: | |
| Red: Dog region (assets/demo_custom_0_mask_0.png) <br> Green: Cat region (assets/demo_custom_0_mask_1.png) <br> Blue: Background | Base Prompt: "dog and cat sitting on lush green grass, in a sunny outdoor setting." <br> Background Prompt: "A photo" <br> Regional Prompts: | |
Note: Generation with segmentation masks is an experimental feature; the generated image is not perfectly constrained by the regions. We assume this is because the mask degrades during the downsampling process.
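Segmentation masks are passed as images rather than xyxy boxes. As a minimal sketch (not the repo's exact preprocessing), a mask PNG such as assets/demo_custom_0_mask_0.png can be loaded and binarized at the generation resolution before being used like the box masks built in the quick-start example below; the nearest-neighbor resize here is one way to limit the downsampling degradation mentioned in the note.

```python
import numpy as np
import torch
from PIL import Image

def load_region_mask(path, width, height, threshold=0.5):
    # resize with nearest-neighbor to keep the mask crisp, then threshold to {0, 1}
    mask_img = Image.open(path).convert("L").resize((width, height), Image.NEAREST)
    mask = torch.from_numpy(np.array(mask_img)).float() / 255.0
    return (mask > threshold).float()

# e.g. dog_mask = load_region_mask("assets/demo_custom_0_mask_0.png", image_width, image_height)
```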
| Regional Masks | Configuration | Generated Result |
|---|---|---|
| Red: Dinosaur region (xyxy: [0, 0, 640, 1280]) <br> Blue: City region (xyxy: [640, 0, 1280, 1280]) | Base Prompt: "Sketched style: A cute dinosaur playfully blowing tiny fire puffs over a cartoon city in a cheerful scene." <br> Regional Prompts: | |
| Red: UFO region (xyxy: [320, 320, 640, 640]) | Base Prompt: "A cute cartoon-style UFO floating above a sunny city street, artistic style blends reality and illustration elements" <br> Background Prompt: "A photo" <br> Regional Prompts: | |
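If the "Sketched style" demo above is produced with a style LoRA (the method is stated to be LoRA-compatible, and infer_flux_regional.py contains the repo's actual usage), attaching one could look like the hedged sketch below, assuming RegionalFluxPipeline inherits the standard FLUX LoRA-loading methods from diffusers. The repository id and adapter name are placeholders.

```python
# Hypothetical sketch: load a style LoRA through diffusers' FLUX LoRA support.
# "some-user/sketch-style-flux-lora" and "sketch" are placeholders, not values used by this repo.
pipeline.load_lora_weights("some-user/sketch-style-flux-lora", adapter_name="sketch")
```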
| Regional Masks | Configuration | Generated Result |
|---|---|---|
| Red: First car region (xyxy: [0, 0, 426, 968]) <br> Green: Second car region (xyxy: [426, 0, 853, 968]) <br> Blue: Third car region (xyxy: [853, 0, 1280, 968]) | Base Prompt: "Three high-performance sports cars, red, blue, and yellow, are racing side by side on a city street" <br> Regional Prompts: | |
| Red: Woman region (xyxy: [0, 0, 640, 968]) <br> Green: Beach region (xyxy: [640, 0, 1280, 968]) | Base Prompt: "A woman walking along a beautiful beach with a scenic coastal view." <br> Regional Prompts: | |
We pin diffusers to a previous commit to ensure reproducibility, as we found that newer diffusers versions may produce different results.
```bash
# install diffusers locally
git clone https://github.com/huggingface/diffusers.git
cd diffusers
# reset diffusers to the 0.31.dev commit that Regional-Prompting-FLUX was developed on; other versions may produce different results
git reset --hard d13b0d63c0208f2c4c078c4261caf8bf587beb3b
pip install -e ".[torch]"
cd ..

# install other dependencies
pip install -U transformers sentencepiece protobuf peft

# clone this repo
git clone https://github.com/antonioo-c/Regional-Prompting-FLUX.git

# replace file in diffusers
cd Regional-Prompting-FLUX
cp transformer_flux.py ../diffusers/src/diffusers/models/transformers/transformer_flux.py

# additionally, if you want to use PULID with Regional-Prompting-FLUX, follow the installation guide in PULID (https://github.com/ToTheBeginning/PuLID) to install the necessary dependencies. Then:
# cd .. && git clone https://github.com/ToTheBeginning/PuLID.git
# cd Regional-Prompting-FLUX
# cp transformer_flux_pulid.py ../diffusers/src/diffusers/models/transformers/transformer_flux.py
```
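Optionally (our suggestion, not part of the original guide), verify that Python resolves the locally pinned diffusers checkout rather than a previously installed release:

```bash
# should print a 0.31 dev version and a path inside the cloned diffusers directory
python -c "import diffusers; print(diffusers.__version__, diffusers.__file__)"
```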
See detailed examples (including LoRA, ControlNet, and PULID) in infer_flux_regional.py and infer_flux_regional_pulid.py. Below is a quick-start example.
```python
import torch
from pipeline_flux_regional import RegionalFluxPipeline, RegionalFluxAttnProcessor2_0

pipeline = RegionalFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")

attn_procs = {}
for name in pipeline.transformer.attn_processors.keys():
    if 'transformer_blocks' in name and name.endswith("attn.processor"):
        attn_procs[name] = RegionalFluxAttnProcessor2_0()
    else:
        attn_procs[name] = pipeline.transformer.attn_processors[name]
pipeline.transformer.set_attn_processor(attn_procs)

## general settings
image_width = 1280
image_height = 768
num_inference_steps = 24
seed = 124
base_prompt = "An ancient woman stands solemnly holding a blazing torch, while a fierce battle rages in the background, capturing both strength and tragedy in a historical war scene."
background_prompt = "a photo" # set by default, but if you want to enrich background, you can set it to a more descriptive prompt
regional_prompt_mask_pairs = {
    "0": {
        "description": "A dignified woman in ancient robes stands in the foreground, her face illuminated by the torch she holds high. Her expression is one of determination and sorrow, her clothing and appearance reflecting the historical period. The torch casts dramatic shadows across her features, its flames dancing vibrantly against the darkness.",
        "mask": [128, 128, 640, 768]
    }
}

## region control factor settings
mask_inject_steps = 10 # larger means stronger control, recommended between 5-10
double_inject_blocks_interval = 1 # 1 means strongest control
single_inject_blocks_interval = 1 # 1 means strongest control
base_ratio = 0.2 # smaller means stronger control

regional_prompts = []
regional_masks = []
background_mask = torch.ones((image_height, image_width))
for region_idx, region in regional_prompt_mask_pairs.items():
    description = region['description']
    mask = region['mask']
    x1, y1, x2, y2 = mask
    mask = torch.zeros((image_height, image_width))
    mask[y1:y2, x1:x2] = 1.0
    background_mask -= mask
    regional_prompts.append(description)
    regional_masks.append(mask)

# if regional masks don't cover the whole image, append background prompt and mask
if background_mask.sum() > 0:
    regional_prompts.append(background_prompt)
    regional_masks.append(background_mask)

image = pipeline(
    prompt=base_prompt,
    width=image_width, height=image_height,
    mask_inject_steps=mask_inject_steps,
    num_inference_steps=num_inference_steps,
    generator=torch.Generator("cuda").manual_seed(seed),
    joint_attention_kwargs={
        "regional_prompts": regional_prompts,
        "regional_masks": regional_masks,
        "double_inject_blocks_interval": double_inject_blocks_interval,
        "single_inject_blocks_interval": single_inject_blocks_interval,
        "base_ratio": base_ratio,
    },
).images[0]
image.save("output.jpg")
```
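One detail worth noting in the quick-start loop: each regional mask is subtracted from background_mask, so overlapping boxes can push some background pixels below zero. A defensive variant (our suggestion, not part of the original example) clamps the background mask before appending it:

```python
# clamp to keep the background mask non-negative when regional boxes overlap
background_mask = torch.clamp(background_mask, min=0)
if background_mask.sum() > 0:
    regional_prompts.append(background_prompt)
    regional_masks.append(background_mask)
```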
Our work is sponsored by HuggingFace and fal.ai. Thanks!
If you find Regional-Prompting-FLUX useful for your research and applications, please cite us using this BibTeX:
```
@article{chen2024training,
  title={Training-free Regional Prompting for Diffusion Transformers},
  author={Chen, Anthony and Xu, Jianjin and Zheng, Wenzhao and Dai, Gaole and Wang, Yida and Zhang, Renrui and Wang, Haofan and Zhang, Shanghang},
  journal={arXiv preprint arXiv:2411.02395},
  year={2024}
}
```
For any questions, feel free to contact us at antonchen@pku.edu.cn.