Rectified Flow is a promising way to accelerate pre-trained diffusion models. However, the generation quality of prior fast flow-based models on Stable Diffusion (such as InstaFlow) is unsatisfactory. In this work, we make several improvements to the original reflow pipeline that significantly boost the performance of flow-based fast SD. Our new model learns a piecewise linear probability flow that can efficiently generate high-quality images in just 4 steps, termed piecewise rectified flow (PeRFlow). Moreover, we found that the difference of model weights, ${\Delta}W = W_{\text{PeRFlow}} - W_{\text{SD}}$, can be used as a plug-and-play accelerator module on a wide range of SD-based models.
Specifically, PeRFlow has several features:
- **Fast Generation**: PeRFlow can generate high-fidelity images in just 4 steps. The images generated by PeRFlow are more diverse than those of other fast-sampling models (such as LCM). Moreover, because PeRFlow is a continuous probability flow, it supports 8, 16, or even more sampling steps to monotonically increase the generation quality.
- **Efficient Training**: Fine-tuning PeRFlow based on SD 1.5 converges in just 4,000 training iterations (with a batch size of 1024). In comparison, the previous fast flow-based text-to-image model, InstaFlow, requires 25,000 training iterations with the same batch size in fine-tuning. Besides, PeRFlow does not require heavy data generation for reflow.
- **Compatible with SD Workflows**: PeRFlow works with various stylized LoRAs and generation/editing pipelines of the pretrained SD model. As a plug-and-play module, $\Delta W$ can be directly combined with other conditional generation pipelines, such as ControlNet, IP-Adapter, and multi-view generation.
- **Classifier-Free Guidance**: PeRFlow is fully compatible with classifier-free guidance and supports negative prompts, which are important for pushing the generation quality even higher. Empirically, the CFG scale is similar to that of the original diffusion model.

Generate high-quality images (512x512) with only 4 steps!
### Image enhancement via PeRFlow-Refiner

By plugging PeRFlow ${\Delta}W$ into the [ControlNet-Tile](https://huggingface.co/lllyasviel/control_v11f1e_sd15_tile) pipeline, we obtain PeRFlow-Refiner to upsample/refine images. We can use PeRFlow-T2I and PeRFlow-Refiner together to generate astonishing x1024 images with lightweight SD-v1.5 backbones: we first generate x512 images with 4-step PeRFlow-T2I, then upsample them to x1024 with 4-step PeRFlow-Refiner.

PeRFlow-Refiner can also be used separately to enhance the texture and details of low-res blurry images. Here are two examples: on the left, from x64 to x1024, and on the right, from x256 to x1024.
### Efficient multiview generation via PeRFlow-Wonder3D

*One*-step image-to-multiview generation is enabled by plugging PeRFlow $\Delta W$ into pre-trained [Wonder3D](https://github.com/xxlong0/Wonder3D). We can use PeRFlow-T2I and PeRFlow-Wonder3D together to generate multiview normal maps and textures from text prompts instantly. Shown here are "a dog with glasses and cap", "a bird", and "a vintage car".
### Accelerate other SD pipelines via PeRFlow

- Plug PeRFlow ${\Delta}W$ into the [ControlNets](https://huggingface.co/lllyasviel) of SD-v1.5.
- Plug PeRFlow ${\Delta}W$ into [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter).
- Editing with PeRFlow + [Prompt-to-Prompt](https://github.com/google/prompt-to-prompt).
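
Because ${\Delta}W = W_{\text{PeRFlow}} - W_{\text{SD}}$, plugging it into another SD-v1.5-based model amounts to adding it to the UNet weights of that base model, parameter by parameter. The toy sketch below illustrates only this arithmetic. It is not the repository's implementation: plain Python lists stand in for tensors, and `compute_delta_weights`, `merge_delta_weights`, and the parameter names are hypothetical.

```python
def compute_delta_weights(accelerated, base):
    # Delta W = W_PeRFlow - W_SD, computed parameter by parameter.
    return {
        name: [a - b for a, b in zip(accelerated[name], base[name])]
        for name in base
    }


def merge_delta_weights(base, delta):
    # W = W_base + Delta W; both dicts must hold the same parameter names.
    if base.keys() != delta.keys():
        raise KeyError("base and delta must share the same parameter names")
    return {
        name: [w + d for w, d in zip(base[name], delta[name])]
        for name in base
    }


# Toy "state dicts" (hypothetical names and values, flat lists instead of tensors).
w_sd = {"conv.weight": [1.0, 2.0], "conv.bias": [0.5]}
w_perflow = {"conv.weight": [1.5, 1.75], "conv.bias": [0.25]}
delta = compute_delta_weights(w_perflow, w_sd)

# The same delta is then added to a different (customized) base model.
w_custom = {"conv.weight": [2.0, 4.0], "conv.bias": [1.0]}
merged = merge_delta_weights(w_custom, delta)
```

Adding `delta` back to `w_sd` recovers `w_perflow`, and adding it to `w_custom` shifts the customized weights by the same amount, which is the sense in which the delta acts as a reusable module.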
*Please refer to the [project page](https://piecewise-rectified-flow.github.io) for more results, including the comparison to LCM.*

## Demo Code

Install the runtime dependencies with `bash env/install.sh`. Training and evaluation scripts are provided in `scripts`.

PeRFlow acceleration yields the delta weights ${\Delta}W$ corresponding to the pretrained diffusion models. The complete UNet weights for inference are computed as $W = W_{\text{SD}} + {\Delta}W$, where $W_{\text{SD}}$ are the weights of the base model, such as the vanilla or a customized (DreamShaper, RealisticVision, etc.) SD model. We provide the delta weights for SD-v1.5 at [PeRFlow🤗](https://huggingface.co/hansyan). You can download the delta weights and fuse them into your own SD pipelines.

```python
import torch, torchvision
from diffusers import StableDiffusionPipeline, UNet2DConditionModel
from src.utils_perflow import merge_delta_weights_into_unet
from src.scheduler_perflow import PeRFlowScheduler

# Load the PeRFlow delta weights for SD-v1.5.
delta_weights = UNet2DConditionModel.from_pretrained(
    "hansyan/perflow-sd15-delta-weights", torch_dtype=torch.float16, variant="v0-1",
).state_dict()

# Fuse the delta weights into the UNet of a customized base model.
pipe = StableDiffusionPipeline.from_pretrained("Lykon/dreamshaper-8", torch_dtype=torch.float16,)
pipe = merge_delta_weights_into_unet(pipe, delta_weights)

# Swap in the PeRFlow scheduler.
pipe.scheduler = PeRFlowScheduler.from_config(pipe.scheduler.config, prediction_type="diff_eps", num_time_windows=4)
pipe.to("cuda", torch.float16)
```

**For a quick try**, we also provide complete accelerated weights (already merged with PeRFlow ${\Delta}W$) of several popular diffusion models, including **SD-v1.5** and **SDXL**. Load the model, change the scheduler, then enjoy the fast generation.

```python
import torch, torchvision
from diffusers import StableDiffusionXLPipeline
from src.scheduler_perflow import PeRFlowScheduler

def setup_seed(seed):
    # Minimal seeding helper (not defined in the original snippet) so runs are repeatable.
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

pipe = StableDiffusionXLPipeline.from_pretrained("hansyan/perflow-sdxl-dreamshaper", torch_dtype=torch.float16, use_safetensors=True, variant="v0-fix")
pipe.scheduler = PeRFlowScheduler.from_config(pipe.scheduler.config, prediction_type="ddim_eps", num_time_windows=4)
pipe.to("cuda", torch.float16)

# Each entry is a [prompt, negative prompt] pair.
prompts_list = [
    ["photorealistic, uhd, high resolution, high quality, highly detailed; masterpiece, A closeup face photo of girl, wearing a rain coat, in the street, heavy rain, bokeh,",
     "distorted, blur, low-quality, haze, out of focus",],
    ["photorealistic, uhd, high resolution, high quality, highly detailed; masterpiece, A beautiful cat bask in the sun",
     "distorted, blur, low-quality, haze, out of focus",],
]

for i, prompts in enumerate(prompts_list):
    setup_seed(42)
    prompt, neg_prompt = prompts[0], prompts[1]
    samples = pipe(
        prompt = [prompt] * 2,
        negative_prompt = [neg_prompt] * 2,
        height = 1024,
        width = 1024,
        num_inference_steps = 6,
        guidance_scale = 2.0,
        output_type = 'pt',
    ).images
    # Save the two samples side by side.
    torchvision.utils.save_image(torchvision.utils.make_grid(samples, nrow = 2), f'tmp_{i}.png')
```

```python
import torch, torchvision
from diffusers.pipelines.stable_diffusion import StableDiffusionPipeline
from src.scheduler_perflow import PeRFlowScheduler

pipe = StableDiffusionPipeline.from_pretrained("hansyan/perflow-sd15-dreamshaper", torch_dtype=torch.float16)
pipe.scheduler = PeRFlowScheduler.from_config(pipe.scheduler.config, prediction_type="diff_eps", num_time_windows=4)
pipe.to("cuda", torch.float16)

prompts_list = ["A man with brown skin, a beard, and dark eyes", "A colorful bird standing on the tree, open beak",]

for i, prompt in enumerate(prompts_list):
    # Fixed seed so each prompt is reproducible.
    generator = torch.Generator("cuda").manual_seed(1024)
    prompt = "RAW photo, 8k uhd, dslr, high quality, film grain, highly detailed, masterpiece; " + prompt
    neg_prompt = "distorted, blur, smooth, low-quality, warm, haze, over-saturated, high-contrast, out of focus, dark"
    samples = pipe(
        prompt = [prompt],
        negative_prompt = [neg_prompt],
        height = 512,
        width = 512,
        num_inference_steps = 8,
        guidance_scale = 7.5,
        generator = generator,
        output_type = 'pt',
    ).images
    torchvision.utils.save_image(torchvision.utils.make_grid(samples, nrow=4), f"tmp_{i}.png")
```

Scripts for text-to-image and ControlNet (depth/edge/pose/tile) are included in `scripts`. You can try efficient image enhancement via controlnet-tile models. We also provide a fast text-to-multiview Gradio interface in `app/Wonder3D` based on [Wonder3D](https://github.com/xxlong0/Wonder3D). Install `diffusers 0.19.3` and run `python Wonder3D/sd15_t2mv_gradio.py`.

## Method: Accelerating Diffusion Models with Piecewise Rectified Flows

[Rectified Flows](https://github.com/gnobitab/RectifiedFlow) proposes to construct flow-based generative models via linear interpolation, and the trajectories of the learned flow can be straightened with a special operation called **reflow**. However, the reflow procedure requires generating a synthetic dataset by simulating the entire pre-trained probability flow. This consumes a huge amount of storage and time and induces large numerical errors in the samples, making it unfavorable for training large-scale foundation models. To address this limitation, we propose **piecewise rectified flow**. By dividing the pre-trained probability flow into multiple time windows and straightening the intermediate probability flow inside each window with reflow, we obtain a piecewise linear probability flow that can be sampled within very few steps. This divide-and-conquer strategy avoids the cumbersome simulation of the whole ODE trajectory, thereby allowing us to perform the piecewise reflow operation online during training.

As shown in the figure, the pre-trained probability flow (which can be transformed from a pre-trained diffusion model) maps the random noise distribution $\pi_1$ to the data distribution $\pi_0$. Sampling from this curved flow with ODE solvers requires many steps. Instead, PeRFlow divides the sampling trajectories into multiple segments (two in this example) and straightens each segment with the reflow operation. A well-trained PeRFlow can generate high-quality images in very few steps because of its piecewise linear nature.
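
To make the divide-and-conquer idea concrete, here is a toy 1-D sketch; it is not the PeRFlow training code. It assumes a hypothetical pre-trained flow $dz/dt = z$ (curved in time), two time windows with boundaries at $t = 1, 0.5, 0$, and an oracle that obtains each window's straight-line velocity by simulating only that window. In the actual method a network is trained to predict this velocity. All function names are illustrative.

```python
import math


def pretrained_velocity(z, t):
    # Toy stand-in for the pre-trained probability-flow ODE dz/dt = v(z, t).
    # Assumption: v(z, t) = z, whose trajectories are exponentials (curved).
    return z


def solve_window(z_start, t_start, t_end, n_steps=1000):
    # Simulate the pre-trained flow inside a single window with many small Euler steps.
    dt = (t_end - t_start) / n_steps
    z, t = z_start, t_start
    for _ in range(n_steps):
        z += dt * pretrained_velocity(z, t)
        t += dt
    return z


def reflow_target(z_start, t_start, t_end):
    # Constant velocity of the straight line joining the two window endpoints.
    z_end = solve_window(z_start, t_start, t_end)
    return (z_end - z_start) / (t_end - t_start)


def sample_piecewise(z1, boundaries):
    # One Euler step per window along the straightened segments.
    z = z1
    for t_start, t_end in zip(boundaries[:-1], boundaries[1:]):
        z += (t_end - t_start) * reflow_target(z, t_start, t_end)
    return z


def sample_coarse_euler(z1, boundaries):
    # Same number of Euler steps, but taken on the original curved flow.
    z = z1
    for t_start, t_end in zip(boundaries[:-1], boundaries[1:]):
        z += (t_end - t_start) * pretrained_velocity(z, t_start)
    return z


boundaries = [1.0, 0.5, 0.0]       # time runs from noise (t=1) to data (t=0)
z1 = 2.0                           # a "noise" sample
reference = z1 * math.exp(-1.0)    # closed-form endpoint of dz/dt = z from t=1 to t=0
piecewise = sample_piecewise(z1, boundaries)
coarse = sample_coarse_euler(z1, boundaries)
```

In this toy setup, one step per window on the straightened flow lands close to the closed-form endpoint, while the same two steps on the curved flow miss it by a wide margin. Each window is simulated on its own, so no full-trajectory simulation is needed to build the targets.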
**Quantitative Results:** We train a PeRFlow model on LAION-aesthetic-v2 data to accelerate SD-v1.5. We compare FID with respect to three datasets: (1) a subset of 30K images from LAION, (2) a set of 30K images generated by SD-v1.5 with the [JourneyDB](https://huggingface.co/datasets/JourneyDB/JourneyDB) prompts, and (3) the validation set of MS-COCO2014. For each dataset, we generate 30K images with each model using the corresponding text prompts. The results are presented in the following table. PeRFlow achieves lower FID than LCM in all three comparisons.

| FID | LAION5B-30k (4-step) | LAION5B-30k (8-step) | SD-v1.5 (4-step) | SD-v1.5 (8-step) | COCO2014-30k (4-step) | COCO2014-30k (8-step) |
|---|---|---|---|---|---|---|
| PeRFlow | 9.74 | 8.62 | 9.46 | 5.05 | 11.31 | 14.16 |
| LCM | 15.38 | 19.21 | 15.63 | 21.19 | 23.49 | 29.63 |
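
For readers unfamiliar with the metric: FID is the Fréchet distance between Gaussians fitted to the feature statistics of two image sets, $\|\mu_1 - \mu_2\|^2 + \mathrm{Tr}\left(\Sigma_1 + \Sigma_2 - 2(\Sigma_1 \Sigma_2)^{1/2}\right)$, and lower is better. The sketch below is a 1-D simplification, where the formula reduces to $(\mu_1 - \mu_2)^2 + (\sigma_1 - \sigma_2)^2$. It is not the evaluation code behind the table: real FID is computed on Inception features, and `frechet_distance_1d` is a hypothetical name.

```python
from statistics import fmean, pstdev


def frechet_distance_1d(xs, ys):
    # Fit a 1-D Gaussian to each sample set (population standard deviation),
    # then apply the 1-D Frechet distance (mu_x - mu_y)^2 + (sd_x - sd_y)^2.
    mu_x, mu_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    return (mu_x - mu_y) ** 2 + (sd_x - sd_y) ** 2


# Toy scalar "features" (made-up values, for illustration only).
same = frechet_distance_1d([0.0, 2.0], [0.0, 2.0])
shifted = frechet_distance_1d([0.0, 2.0], [1.0, 3.0])
wider = frechet_distance_1d([0.0, 2.0], [0.0, 4.0])
```

Identical sets give a distance of zero, and the distance grows when the second set is shifted or has a different spread.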