PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models \ [ICLR 2024 Spotlight] [Diffusers 1] [Diffusers 2] [Project] [Code]
SDXL-Turbo: Adversarial Diffusion Distillation \ [Website] [Diffusers 1] [Diffusers 2] [Project] [Code]
Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping \ [Website] [Diffusers 1] [Diffusers 2] [Project] [Code]
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module \ [Website] [Diffusers] [Project] [Code]
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference \ [Website] [Project] [Code]
DMD2: Improved Distribution Matching Distillation for Fast Image Synthesis \ [NeurIPS 2024 Oral] [Project] [Code]
DMD1: One-step Diffusion with Distribution Matching Distillation \ [CVPR 2024] [Project] [Code]
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation \ [CVPR 2024] [Project] [Code]
SwiftBrush V2: Make Your One-Step Diffusion Model Better Than Its Teacher \ [ECCV 2024] [Project] [Code]
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation \ [CVPR 2024] [Project] [Code]
PCM: Phased Consistency Model \ [NeurIPS 2024] [Project] [Code]
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation \ [NeurIPS 2024] [Project] [Code]
KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis \ [NeurIPS 2024] [Project] [Code]
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation \ [Website] [Project] [Code]
Adaptive Caching for Faster Video Generation with Diffusion Transformers \ [Website] [Project] [Code]
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality \ [Website] [Project] [Code]
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions \ [Website] [Project] [Code]
Reward Guided Latent Consistency Distillation \ [Website] [Project] [Code]
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching \ [Website] [Project] [Code]
Relational Diffusion Distillation for Efficient Image Generation \ [ACM MM 2024 Oral] [Code]
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs \ [CVPR 2024] [Code]
SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow \ [ECCV 2024] [Code]
Accelerating Image Generation with Sub-path Linear Approximation Model \ [ECCV 2024] [Code]
Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models \ [NeurIPS 2023] [Code]
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference \ [NeurIPS 2024] [Code]
A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models \ [ICML 2024] [Code]
Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation \ [ICML 2024] [Code]
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation \ [ICLR 2024] [Code]
Accelerating Vision Diffusion Transformers with Skip Branches \ [Website] [Code]
One Step Diffusion via Shortcut Models \ [Website] [Code]
DuoDiff: Accelerating Diffusion Models with a Dual-Backbone Approach \ [Website] [Code]
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training \ [Website] [Code]
Stable Consistency Tuning: Understanding and Improving Consistency Models \ [Website] [Code]
SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models \ [Website] [Code]
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching \ [Website] [Code]
Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation \ [Website] [Code]
Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation \ [Website] [Code]
Diffusion Models Are Innate One-Step Generators \ [Website] [Code]
Distilling Diffusion Models into Conditional GANs \ [ECCV 2024] [Project]
Cache Me if You Can: Accelerating Diffusion Models through Block Caching \ [CVPR 2024] [Project]
Plug-and-Play Diffusion Distillation \ [CVPR 2024] [Project]
SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds \ [NeurIPS 2023] [Project]
Truncated Consistency Models \ [Website] [Project]
Multi-student Diffusion Distillation for Better One-step Generators \ [Website] [Project]
FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification \ [NeurIPS 2024]
One-Step Diffusion Distillation through Score Implicit Matching \ [NeurIPS 2024]
HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration \ [Website]
Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models \ [Website]
MLCM: Multistep Consistency Distillation of Latent Diffusion Model \ [Website]
EM Distillation for One-step Diffusion Models \ [Website]
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding \ [Website]
Importance-based Token Merging for Diffusion Models \ [Website]
Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation \ [Website]
Accelerating Diffusion Models with One-to-Many Knowledge Distillation \ [Website]
TDDSR: Single-Step Diffusion with Two Discriminators for Super Resolution \ [Website]
DDIL: Improved Diffusion Distillation With Imitation Learning \ [Website]
OSV: One Step is Enough for High-Quality Image to Video Generation \ [Website]
Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance \ [Website]
Token Caching for Diffusion Transformer Acceleration \ [Website]
DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization \ [Website]
Flow Generator Matching \ [Website]
Multistep Distillation of Diffusion Models via Moment Matching \ [Website]
SFDDM: Single-fold Distillation for Diffusion models \ [Website]
LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models \ [Website]
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion \ [Website]
SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation \ [Website]
SDXL-Lightning: Progressive Adversarial Diffusion Distillation \ [Website]
Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training \ [Website]
TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution \ [Website]
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising \ [NeurIPS 2024] [Project] [Code]
Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy \ [NeurIPS 2024] [Project] [Code]
DeepCache: Accelerating Diffusion Models for Free \ [CVPR 2024] [Project] [Code]
Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference \ [NeurIPS 2024] [Code]
DiTFastAttn: Attention Compression for Diffusion Transformer Models \ [NeurIPS 2024] [Code]
Structural Pruning for Diffusion Models \ [NeurIPS 2023] [Code]
AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration \ [ICCV 2023] [Code]
Agent Attention: On the Integration of Softmax and Linear Attention \ [ECCV 2024] [Code]
Token Merging for Fast Stable Diffusion \ [CVPRW 2024] [Code]
FORA: Fast-Forward Caching in Diffusion Transformer Acceleration \ [Website] [Code]
Real-Time Video Generation with Pyramid Attention Broadcast \ [Website] [Code]
Accelerating Diffusion Transformers with Token-wise Feature Caching \ [Website] [Code]
TGATE-V1: Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models \ [Website] [Code]
TGATE-V2: Faster Diffusion via Temporal Attention Decomposition \ [Website] [Code]
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers \ [Website] [Code]
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models \ [CVPR 2024] [Project]
Cache Me if You Can: Accelerating Diffusion Models through Block Caching \ [Website] [Project]
Token Fusion: Bridging the Gap between Token Pruning and Token Merging \ [WACV 2024]
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding \ [Website]
PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future \ [Website]
Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers \ [Website]
Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step \ [Website]
Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences \ [Website]
Fast constrained sampling in pre-trained diffusion models \ [Website]
Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model \ [ICLR 2023 Oral] [Project] [Code]
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild \ [CVPR 2024] [Project] [Code]
Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model \ [CVPR 2024] [Project] [Code]
Zero-Reference Low-Light Enhancement via Physical Quadruple Priors \ [CVPR 2024] [Project] [Code]
From Posterior Sampling to Meaningful Diversity in Image Restoration \ [ICLR 2024] [Project] [Code]
Generative Diffusion Prior for Unified Image Restoration and Enhancement \ [CVPR 2023] [Project] [Code]
MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration \ [ECCV 2024] [Project] [Code]
Image Restoration with Mean-Reverting Stochastic Differential Equations \ [ICML 2023] [Project] [Code]
PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging \ [NeurIPS 2024 Spotlight] [Project] [Code]
Denoising Diffusion Models for Plug-and-Play Image Restoration \ [CVPR 2023 Workshop NTIRE] [Project] [Code]
Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing \ [Website] [Project] [Code]
Solving Video Inverse Problems Using Image Diffusion Models \ [Website] [Project] [Code]
Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration \ [Website] [Project] [Code]
AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion \ [Website] [Project] [Code]
FlowIE: Efficient Image Enhancement via Rectified Flow \ [CVPR 2024 Oral] [Code]
ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting \ [NeurIPS 2023 Spotlight] [Code]
GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration \ [ICML 2023 Oral] [Code]
Diffusion Priors for Variational Likelihood Estimation and Image Denoising \ [NeurIPS 2024 Spotlight] [Code]
Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance \ [CVPR 2024] [Code]
DiffIR: Efficient Diffusion Model for Image Restoration \ [ICCV 2023] [Code]
LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models \ [ECCV 2024] [Code]
Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model \ [ECCV 2024] [Code]
DAVI: Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problem \ [ECCV 2024] [Code]
Low-Light Image Enhancement with Wavelet-based Diffusion Models \ [SIGGRAPH Asia 2023] [Code]
Residual Denoising Diffusion Models \ [CVPR 2024] [Code]
Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks \ [CVPR 2024] [Code]
Deep Equilibrium Diffusion Restoration with Parallel Sampling \ [CVPR 2024] [Code]
ReFIR: Grounding Large Restoration Models with Retrieval Augmentation \ [NeurIPS 2024] [Code]
DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation \ [NeurIPS 2024] [Code]
Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems \ [Website] [Code]
UniProcessor: A Text-induced Unified Low-level Image Processor \ [Website] [Code]
Refusion: Enabling Large-Size Realistic Image Restoration with Latent-Space Diffusion Models \ [CVPR 2023 Workshop NTIRE] [Code]
Equipping Diffusion Models with Differentiable Spatial Entropy for Low-Light Image Enhancement \ [CVPR 2024 Workshop NTIRE] [Code]
PnP-Flow: Plug-and-Play Image Restoration with Flow Matching \ [Website] [Code]
Deep Data Consistency: a Fast and Robust Diffusion Model-based Solver for Inverse Problems \ [Website] [Code]
Learning A Coarse-to-Fine Diffusion Transformer for Image Restoration \ [Website] [Code]
Stimulating the Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling \ [Website] [Code]
Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models \ [Website] [Code]
Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior \ [Website] [Code]
Frequency Compensated Diffusion Model for Real-scene Dehazing \ [Website] [Code]
Efficient Image Deblurring Networks based on Diffusion Models \ [Website] [Code]
Blind Image Restoration via Fast Diffusion Inversion \ [Website] [Code]
DMPlug: A Plug-in Method for Solving Inverse Problems with Diffusion Models \ [Website] [Code]
Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling \ [Website] [Code]
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration \ [Website] [Code]
Unlimited-Size Diffusion Restoration \ [Website] [Code]
VmambaIR: Visual State Space Model for Image Restoration \ [Website] [Code]
Using diffusion model as constraint: Empower Image Restoration Network Training with Diffusion Model \ [Website] [Code]
Super-resolving Real-world Image Illumination Enhancement: A New Dataset and A Conditional Diffusion Model \ [Website] [Code]
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions \ [ECCV 2024] [Project]
Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models \ [NeurIPS 2024] [Project]
GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration \ [Website] [Project]
Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model \ [ICCV 2023]
Multiscale Structure Guided Diffusion for Image Deblurring \ [ICCV 2023]
Boosting Image Restoration via Priors from Pre-trained Models \ [CVPR 2024]
A Modular Conditional Diffusion Framework for Image Reconstruction \ [Website]
Unpaired Photo-realistic Image Deraining with Energy-informed Diffusion Model \ [Website]
Particle-Filtering-based Latent Diffusion for Inverse Problems \ [Website]
Bayesian Conditioned Diffusion Models for Inverse Problem \ [Website]
ReCo-Diff: Explore Retinex-Based Condition Strategy in Diffusion Model for Low-Light Image Enhancement \ [Website]
Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration \ [Website]
Tell Me What You See: Text-Guided Real-World Image Denoising \ [Website]
Zero-LED: Zero-Reference Lighting Estimation Diffusion Model for Low-Light Image Enhancement \ [Website]
Prototype Clustered Diffusion Models for Versatile Inverse Problems \ [Website]
AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement \ [Website]
Taming Generative Diffusion for Universal Blind Image Restoration \ [Website]
Efficient Image Restoration through Low-Rank Adaptation and Stable Diffusion XL \ [Website]
Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior \ [Website]
Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration \ [Website]
FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process \ [Website]
Diffusion State-Guided Projected Gradient for Inverse Problems \ [Website]
InstantIR: Blind Image Restoration with Instant Generative Reference \ [Website]
Score-Based Variational Inference for Inverse Problems \ [Website]
Towards Flexible and Efficient Diffusion Low Light Enhancer \ [Website]
G2D2: Gradient-guided Discrete Diffusion for image inverse problem solving \ [Website]
AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations \ [Website]
Control Color: Multimodal Diffusion-based Interactive Image Colorization \ [Website] [Project] [Code]
Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior \ [Website] [Project] [Code]
ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text \ [Website] [Code]
Diffusing Colors: Image Colorization with Text Guided Diffusion \ [SIGGRAPH Asia 2023] [Project]
Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements \ [Website]
DiffColor: Toward High Fidelity Text-Guided Image Colorization with Diffusion Models \ [Website]
DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior \ [Website] [Project] [Code]
OSDFace: One-Step Diffusion Model for Face Restoration \ [Website] [Project] [Code]
DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration \ [CVPR 2023] [Code]
PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance \ [NeurIPS 2023] [Code]
DifFace: Blind Face Restoration with Diffused Error Contraction \ [Website] [Code]
AuthFace: Towards Authentic Blind Face Restoration with Face-oriented Generative Diffusion Prior \ [Website] [Code]
RestorerID: Towards Tuning-Free Face Restoration with ID Preservation \ [Website] [Code]
Towards Real-World Blind Face Restoration with Generative Diffusion Prior \ [Website] [Code]
Towards Unsupervised Blind Face Restoration using Diffusion Prior \ [Website] [Project]
DiffBFR: Bootstrapping Diffusion Model Towards Blind Face Restoration \ [Website]
CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using Score-Based Diffusion Models \ [Website]
DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration \ [Website]
Gaussian is All You Need: A Unified Framework for Solving Inverse Problems via Diffusion Posterior Sampling \ [Website]
Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model \ [Website]
DR-BFR: Degradation Representation with Diffusion Models for Blind Face Restoration \ [Website]
⭐⭐Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models \ [CVPR 2024] [Project] [Code]
⭐⭐Training-Free Consistent Text-to-Image Generation \ [SIGGRAPH 2024] [Project] [Code]
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models \ [SIGGRAPH 2024] [Project] [Code]
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation \ [Website] [Project] [Code]
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation \ [Website] [Project] [Code]
StoryGPT-V: Large Language Models as Consistent Story Visualizers \ [Website] [Project] [Code]
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation \ [Website] [Project] [Code]
TaleCrafter: Interactive Story Visualization with Multiple Characters \ [Website] [Project] [Code]
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization \ [Website] [Project] [Code]
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation \ [Website] [Project] [Code]
StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion \ [ECCV 2024] [Code]
Make-A-Story: Visual Memory Conditioned Consistent Story Generation \ [CVPR 2023] [Code]
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation \ [Website] [Code]
SEED-Story: Multimodal Long Story Generation with Large Language Model \ [Website] [Code]
Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models \ [Website] [Code]
Masked Generative Story Transformer with Character Guidance and Caption Augmentation \ [Website] [Code]
StoryBench: A Multifaceted Benchmark for Continuous Story Visualization \ [Website] [Code]
Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models \ [Website] [Code]
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion \ [Website] [Project]
MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising \ [Website] [Project]
Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis \ [ICASSP 2024]
CogCartoon: Towards Practical Story Visualization \ [Website]
Generating coherent comic with rich story using ChatGPT and Stable Diffusion \ [Website]
Improved Visual Story Generation with Adaptive Context Modeling \ [Website]
Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control \ [Website]
Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models \ [Website]
Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models \ [Website]
ORACLE: Leveraging Mutual Information for Consistent Character Generation with LoRAs in Diffusion Models \ [Website]
Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection \ [Website]
StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration \ [Website]
TryOnDiffusion: A Tale of Two UNets \ [CVPR 2023] [Website] [Project] [Official Code] [Unofficial Code]
StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On \ [CVPR 2024] [Project] [Code]
VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding \ [Website] [Project] [Code]
IMAGDressing-v1: Customizable Virtual Dressing \ [Website] [Project] [Code]
OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person \ [Website] [Project] [Code]
ViViD: Video Virtual Try-on using Diffusion Models \ [Website] [Project] [Code]
GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting \ [Website] [Project] [Code]
Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images \ [Website] [Project] [Code]
From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation \ [Website] [Project] [Code]
PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns \ [Website] [Project] [Code]
StableGarment: Garment-Centric Generation via Stable Diffusion \ [Website] [Project] [Code]
Improving Diffusion Models for Virtual Try-on \ [Website] [Project] [Code]
D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On \ [ECCV 2024] [Code]
Improving Virtual Try-On with Garment-focused Diffusion Models \ [ECCV 2024] [Code]
Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On \ [CVPR 2024] [Code]
Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow \ [ACM MM 2023] [Code]
LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On \ [ACM MM 2023] [Code]
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on \ [Website] [Code]
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Model \ [Website] [Code]
DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling \ [Website] [Code]
CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model \ [Website] [Code]
MV-VTON: Multi-View Virtual Try-On with Diffusion Models \ [Website] [Code]
M&M VTO: Multi-Garment Virtual Try-On and Editing \ [CVPR 2024 Highlight] [Project]
WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models \ [ECCV 2024] [Project]
Fashion-VDM: Video Diffusion Model for Virtual Try-On \ [SIGGRAPH Asia 2024] [Project]
Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos \ [Website] [Project]
Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild \ [Website] [Project]
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models \ [Website] [Project]
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All \ [Website] [Project]
Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment \ [Website] [Project]
VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers \ [Website] [Project]
AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario \ [Website] [Project]
FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on \ [IJCAI 2024]
GraVITON: Graph based garment warping with attention guided inversion for Virtual-tryon \ [Website]
WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on \ [Website]
Product-Level Try-on: Characteristics-preserving Try-on with Realistic Clothes Shading and Wrinkles \ [Website]
Mobile Fitting Room: On-device Virtual Try-on via Diffusion Models \ [Website]
Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models \ [Website]
ACDG-VTON: Accurate and Contained Diffusion Generation for Virtual Try-On \ [Website]
ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model \ [Website]
AnyDesign: Versatile Area Fashion Editing via Mask-Free Diffusion \ [Website]
DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing \ [Website]
TED-VITON: Transformer-Empowered Diffusion Models for Virtual Try-On \ [Website]
Controllable Human Image Generation with Personalized Multi-Garments \ [Website]
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models \ [ICLR 2024] [Website] [Project] [Code]
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold \ [SIGGRAPH 2023] [Project] [Code]
Readout Guidance: Learning Control from Diffusion Features \ [CVPR 2024 Highlight] [Project] [Code]
FreeDrag: Feature Dragging for Reliable Point-based Image Editing \ [CVPR 2024] [Project] [Code]
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing \ [CVPR 2024] [Project] [Code]
InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos \ [Website] [Project] [Code]
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models \ [Website] [Project] [Code]
Repositioning the Subject within Image \ [Website] [Project] [Code]
Drag-A-Video: Non-rigid Video Editing with Point-based Interaction \ [Website] [Project] [Code]
DragAnything: Motion Control for Anything using Entity Representation \ [Website] [Project] [Code]
InstantDrag: Improving Interactivity in Drag-based Image Editing \ [Website] [Project] [Code]
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing \ [CVPR 2024] [Code]
Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation \ [CVPR 2024] [Code]
DragVideo: Interactive Drag-style Video Editing \ [ECCV 2024] [Code]
RotationDrag: Point-based Image Editing with Rotated Diffusion Features \ [Website] [Code]
TrackGo: A Flexible and Efficient Method for Controllable Video Generation \ [Website] [Project]
DragText: Rethinking Text Embedding in Point-based Image Editing \ [Website] [Project]
FastDrag: Manipulate Anything in One Step \ [Website] [Project]
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory \ [Website] [Project]
StableDrag: Stable Dragging for Point-based Image Editing \ [Website] [Project]
DiffUHaul: A Training-Free Method for Object Dragging in Images \ [Website] [Project]
RegionDrag: Fast Region-Based Image Editing with Diffusion Models \ [Website]
Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators \ [Website]
Combing Text-based and Drag-based Editing for Precise and Flexible Image Editing \ [Website]
AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing \ [Website]
⭐⭐⭐Null-text Inversion for Editing Real Images using Guided Diffusion Models \ [CVPR 2023] [Website] [Project] [Code]
⭐⭐Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code \ [ICLR 2024] [Website] [Project] [Code]
⭐Inversion-Based Creativity Transfer with Diffusion Models \ [CVPR 2023] [Website] [Code]
⭐EDICT: Exact Diffusion Inversion via Coupled Transformations \ [CVPR 2023] [Website] [Code]
⭐Improving Negative-Prompt Inversion via Proximal Guidance \ [Website] [Code]
An Edit Friendly DDPM Noise Space: Inversion and Manipulations \ [CVPR 2024] [Project] [Code] [Demo]
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing \ [NeurIPS 2023] [Website] [Code]
Inversion-Free Image Editing with Natural Language \ [CVPR 2024] [Project] [Code]
LEDITS++: Limitless Image Editing using Text-to-Image Models \ [CVPR 2024] [Project] [Code]
Noise Map Guidance: Inversion with Spatial Context for Real Image Editing \ [ICLR 2024] [Website] [Code]
ReNoise: Real Image Inversion Through Iterative Noising \ [ECCV 2024] [Project] [Code]
IterInv: Iterative Inversion for Pixel-Level T2I Models \ [NeurIPS-W 2023] [OpenReview] [NeurIPS-W] [Website] [Code]
DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models \ [Website] [Project] [Code]
Object-aware Inversion and Reassembly for Image Editing \ [Website] [Project] [Code]
A Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance \ [ICCV 2023] [Code]
Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models \ [ECCV 2024] [Code]
LocInv: Localization-aware Inversion for Text-Guided Image Editing \ [CVPR 2024 AI4CC Workshop] [Code]
Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling \ [IJCAI 2024] [Code]
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing \ [Website] [Code]
Generating Non-Stationary Textures using Self-Rectification \ [Website] [Code]
Exact Diffusion Inversion via Bi-directional Integration Approximation \ [Website] [Code]
Fixed-point Inversion for Text-to-image diffusion models \ [Website] [Code]
Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing \ [Website] [Code]
Effective Real Image Editing with Accelerated Iterative Diffusion Inversion \ [ICCV 2023 Oral] [Website]
BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models \ [NeurIPS 2024]
Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing \ [NeurIPS 2024]
BARET: Balanced Attention based Real image Editing driven by Target-text Inversion \ [WACV 2024]
Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing \ [ICASSP 2024]
Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing \ [Website]
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations \ [Website]
Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models \ [Website]
Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models \ [Website]
SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing \ [Website]
Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models \ [Website]
KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing \ [Website]
Tuning-Free Inversion-Enhanced Control for Consistent Image Editing \ [Website]
LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance \ [Website]
⭐⭐⭐Prompt-to-Prompt Image Editing with Cross Attention Control \ [ICLR 2023] [Website] [Project] [Code] [Replicate Demo]
⭐⭐⭐Zero-shot Image-to-Image Translation \ [SIGGRAPH 2023] [Project] [Code] [Replicate Demo] [Diffusers Doc] [Diffusers Code]
⭐⭐InstructPix2Pix: Learning to Follow Image Editing Instructions \ [CVPR 2023 Highlight] [Website] [Project] [Diffusers Doc] [Diffusers Code] [Official Code] [Dataset]
⭐⭐Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation \ [CVPR 2023] [Website] [Project] [Code] [Dataset] [Replicate Demo] [Demo]
⭐DiffEdit: Diffusion-based semantic image editing with mask guidance \ [ICLR 2023] [Website] [Unofficial Code] [Diffusers Doc] [Diffusers Code]
⭐Imagic: Text-Based Real Image Editing with Diffusion Models \ [CVPR 2023] [Website] [Project] [Diffusers]
⭐Inpaint Anything: Segment Anything Meets Image Inpainting \ [Website] [Code 1] [Code 2]
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing \ [ICCV 2023] [Website] [Project] [Code] [Demo]
Collaborative Score Distillation for Consistent Visual Synthesis \ [NeurIPS 2023] [Website] [Project] [Code]
Visual Instruction Inversion: Image Editing via Visual Prompting \ [NeurIPS 2023] [Website] [Project] [Code]
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models \ [NeurIPS 2023] [Website] [Code]
Localizing Object-level Shape Variations with Text-to-Image Diffusion Models \ [ICCV 2023] [Website] [Project] [Code]
Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance \ [Website] [Code1] [Code2] [Diffusers Code]
PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models \ [Website] [Project] [Code] [Demo]
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models \ [CVPR 2024] [Project] [Code]
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing \ [CVPR 2024] [Project] [Code]
Text-Driven Image Editing via Learnable Regions \ [CVPR 2024] [Project] [Code]
Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators \ [ICLR 2024] [Project] [Code]
TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models \ [SIGGRAPH Asia 2024] [Project] [Code]
Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps \ [NeurIPS 2024] [Project] [Code]
Zero-shot Image Editing with Reference Imitation \ [Website] [Project] [Code]
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision \ [Website] [Project] [Code]
MultiBooth: Towards Generating All Your Concepts in an Image from Text \ [Website] [Project] [Code]
Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting \ [Website] [Project] [Code]
StyleBooth: Image Style Editing with Multimodal Instruction \ [Website] [Project] [Code]
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing \ [Website] [Project] [Code]
EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods \ [Website] [Project] [Code]
InsightEdit: Towards Better Instruction Following for Image Editing \ [Website] [Project] [Code]
InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions \ [Website] [Project] [Code]
MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path \ [Website] [Project] [Code]
HIVE: Harnessing Human Feedback for Instructional Visual Editing \ [Website] [Project] [Code]
FaceStudio: Put Your Face Everywhere in Seconds \ [Website] [Project] [Code]
Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach \ [Website] [Project] [Code]
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models \ [Website] [Project] [Code]
FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction \ [Website] [Project] [Code]
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance \ [Website] [Project] [Code]
LIME: Localized Image Editing via Attention Regularization in Diffusion Models \ [Website] [Project] [Code]
MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond \ [Website] [Project] [Code]
MagicQuill: An Intelligent Interactive Image Editing System \ [Website] [Project] [Code]
Scaling Concept With Text-Guided Diffusion Models \ [Website] [Project] [Code]
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control \ [Website] [Project] [Code]
FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning \ [Website] [Project] [Code]
Delta Denoising Score \ [Website] [Project] [Code]
UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image \ [SIGGRAPH 2023] [Code]
Learning to Follow Object-Centric Image Editing Instructions Faithfully \ [EMNLP 2023] [Code]
GroupDiff: Diffusion-based Group Portrait Editing \ [ECCV 2024] [Code]
TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing \ [CVPR 2024] [Code]
ZONE: Zero-Shot Instruction-Guided Local Editing \ [CVPR 2024] [Code]
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation \ [CVPR 2024] [Code]
DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation \ [ECCV 2024] [Code]
FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing \ [ECCV 2024] [Code]
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing \ [ECCV 2024] [Code]
Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks \ [AAAI 2024] [Code]
FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference \ [AAAI 2024] [Code]
Face Aging via Diffusion-based Editing \ [BMVC 2023] [Code]
FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing \ [Website] [Code]
Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing \ [Website] [Code]
PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing \ [Website] [Code]
DiT4Edit: Diffusion Transformer for Image Editing \ [Website] [Code]
Move and Act: Enhanced Object Manipulation and Background Integrity for Image Editing \ [Website] [Code]
EditWorld: Simulating World Dynamics for Instruction-Following Image Editing \ [Website] [Code]
ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing \ [Website] [Code]
Differential Diffusion: Giving Each Pixel Its Strength \ [Website] [Code]
Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing \ [Website] [Code]
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks \ [Website] [Code]
Region-Aware Diffusion for Zero-shot Text-driven Image Editing \ [Website] [Code]
Forgedit: Text Guided Image Editing via Learning and Forgetting \ [Website] [Code]
AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing \ [Website] [Code]
An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control \ [Website] [Code]
FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models \ [Website] [Code]
Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance \ [Website] [Code]
SpecRef: A Fast Training-free Baseline of Specific Reference-Condition Real Image Editing \ [Website] [Code]
PromptFix: You Prompt and We Fix the Photo \ [Website] [Code]
FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation \ [Website] [Code]
Conditional Score Guidance for Text-Driven Image-to-Image Translation \ [NeurIPS 2023] [Website]
Emu Edit: Precise Image Editing via Recognition and Generation Tasks \ [CVPR 2024] [Project]
ByteEdit: Boost, Comply and Accelerate Generative Image Editing \ [ECCV 2024] [Project]
Watch Your Steps: Local Image and Scene Editing by Text Instructions \ [ECCV 2024] [Project]
TurboEdit: Instant text-based image editing \ [ECCV 2024] [Project]
Novel Object Synthesis via Adaptive Text-Image Harmony \ [NeurIPS 2024] [Project]
HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads \ [Website] [Project]
MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models \ [Website] [Project]
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models \ [Website] [Project]
SeedEdit: Align Image Re-Generation to Image Editing \ [Website] [Project]
Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection \ [Website] [Project]
Generative Image Layer Decomposition with Visual Effects \ [Website] [Project]
Editable Image Elements for Controllable Synthesis \ [Website] [Project]
SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing \ [Website] [Project]
ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation \ [Website] [Project]
GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models \ [Website] [Project]
MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers \ [Website] [Project]
FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing \ [Website] [Project]
GeoDiffuser: Geometry-Based Image Editing with Diffusion Models \ [Website] [Project]
SOEDiff: Efficient Distillation for Small Object Editing \ [Website] [Project]
Click2Mask: Local Editing with Dynamic Mask Generation \ [Website] [Project]
Stable Flow: Vital Layers for Training-Free Image Editing \ [Website] [Project]
Iterative Multi-granular Image Editing using Diffusion Models \ [WACV 2024]
Text-to-image Editing by Image Information Removal \ [WACV 2024]
TexSliders: Diffusion-Based Texture Editing in CLIP Space \ [SIGGRAPH 2024]
Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models \ [CVPR 2023 AI4CC Workshop]
Learning Feature-Preserving Portrait Editing from Generated Pairs \ [Website]
EmoEdit: Evoking Emotions through Image Manipulation \ [Website]
DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images \ [Website]
LayerDiffusion: Layered Controlled Image Editing with Diffusion Models \ [Website]
iEdit: Localised Text-guided Image Editing with Weak Supervision \ [Website]
User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques \ [Website]
PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing \ [Website]
PRedItOR: Text Guided Image Editing with Diffusion Prior \ [Website]
FEC: Three Finetuning-free Methods to Enhance Consistency for Real Image Editing \ [Website]
The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing \ [Website]
Image Translation as Diffusion Visual Programmers \ [Website]
Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing \ [Website]
LoMOE: Localized Multi-Object Editing via Multi-Diffusion \ [Website]
Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing \ [Website]
DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation \ [Website]
InstructGIE: Towards Generalizable Image Editing \ [Website]
LASPA: Latent Spatial Alignment for Fast Training-free Single Image Editing \ [Website]
Uncovering the Text Embedding in Text-to-Image Diffusion Models \ [Website]
Tuning-Free Adaptive Style Incorporation for Structure-Consistent Text-Driven Style Transfer \ [Website]
Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion \ [Website]
Text Guided Image Editing with Automatic Concept Locating and Forgetting \ [Website]
The Curious Case of End Token: A Zero-Shot Disentangled Image Editing using CLIP \ [Website]
LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing \ [Website]
Achieving Complex Image Edits via Function Aggregation with Diffusion Models \ [Website]
Prompt-Softbox-Prompt: A free-text Embedding Control for Image Editing \ [Website]
InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models \ [Website]
PixLens: A Novel Framework for Disentangled Evaluation in Diffusion-Based Image Editing with Object Detection + SAM \ [Website]
Augmentation-Driven Metric for Balancing Preservation and Modification in Text-Guided Image Editing \ [Website]
Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing \ [Website]
ERDDCI: Exact Reversible Diffusion via Dual-Chain Inversion for High-Quality Image Editing \ [Website]
ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models \ [Website]
ColorEdit: Training-free Image-Guided Color editing with diffusion model \ [Website]
GalaxyEdit: Large-Scale Image Editing Dataset with Enhanced Diffusion Adapter \ [Website]
Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing \ [Website]
Pathways on the Image Manifold: Image Editing via Video Generation \ [Website]
RGBD2: Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models \ [CVPR 2023] [Website] [Project] [Code]
Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning \ [ECCV 2024 Oral] [Code]
How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization? \ [NeurIPS 2024] [Code]
CLoG: Benchmarking Continual Learning of Image Generation Models \ [Website] [Code]
Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models \ [Website] [Code]
Continual Learning of Diffusion Models with Generative Distillation \ [Website] [Code]
Prompt-Based Exemplar Super-Compression and Regeneration for Class-Incremental Learning \ [Website] [Code]
Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA \ [TMLR] [Project]
Assessing Open-world Forgetting in Generative Image Model Customization \ [Website] [Project]
Class-Incremental Learning using Diffusion Model for Distillation and Replay \ [ICCV 2023 VCL Workshop Best Paper]
Create Your World: Lifelong Text-to-Image Diffusion \ [Website]
Low-Rank Continual Personalization of Diffusion Models \ [Website]
Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models \ [Website]
Online Continual Learning of Video Diffusion Models From a Single Video Stream \ [Website]
Exploring Continual Learning of Diffusion Models \ [Website]
DiracDiffusion: Denoising and Incremental Reconstruction with Assured Data-Consistency \ [Website]
DiffusePast: Diffusion-based Generative Replay for Class Incremental Semantic Segmentation \ [Website]
Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters \ [Website]
Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning \ [Website]
MuseumMaker: Continual Style Customization without Catastrophic Forgetting \ [Website]
Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion \ [Website]
Ablating Concepts in Text-to-Image Diffusion Models \ [ICCV 2023] [Website] [Project] [Code]
Erasing Concepts from Diffusion Models \ [ICCV 2023] [Website] [Project] [Code]
Paint by Inpaint: Learning to Add Image Objects by Removing Them First \ [Website] [Project] [Code]
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications \ [Website] [Project] [Code]
Editing Massive Concepts in Text-to-Image Diffusion Models \ [Website] [Project] [Code]
STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models \ [Website] [Project] [Code]
Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models \ [ICML 2023 Workshop] [Code]
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models \ [ECCV 2024] [Code]
Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion \ [ECCV 2024] [Code]
Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation \ [NeurIPS 2024] [Code]
Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts \ [Website] [Code]
ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion \ [Website] [Code]
Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models \ [Website] [Code]
Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models \ [Website] [Code]
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning \ [Website] [Code]
Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models \ [Website] [Code]
Add-SD: Rational Generation without Manual Reference \ [Website] [Code]
RealEra: Semantic-level Concept Erasure via Neighbor-Concept Mining \ [Website] [Project]
MACE: Mass Concept Erasure in Diffusion Models \ [CVPR 2024]
Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models \ [Website]
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models \ [Website]
Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Model \ [Website]
Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning \ [Website]
Geom-Erasing: Geometry-Driven Removal of Implicit Concept in Diffusion Models \ [Website]
Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers \ [Website]
All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models \ [Website]
EraseDiff: Erasing Data Influence in Diffusion Models \ [Website]
UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models \ [Website]
Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts \ [Website]
R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model \ [Website]
Pruning for Robust Concept Erasing in Diffusion Models \ [Website]
Unlearning Concepts from Text-to-Video Diffusion Models \ [Website]
EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts \ [Website]
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning \ [Website]
Understanding the Impact of Negative Prompts: When and How Do They Take Effect? \ [Website]
Model Integrity when Unlearning with T2I Diffusion Models \ [Website]
⭐⭐⭐DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation \ [CVPR 2023 Honorable Mention] [Website] [Project] [Official Dataset] [Unofficial Code] [Diffusers Doc] [Diffusers Code]
⭐⭐⭐An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion \ [ICLR 2023 top-25%] [Website] [Diffusers Doc] [Diffusers Code] [Code]
⭐⭐Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion \ [CVPR 2023] [Website] [Project] [Diffusers Doc] [Diffusers Code] [Code]
⭐⭐ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement \ [ECCV 2024] [Project] [Code]
⭐⭐ReVersion: Diffusion-Based Relation Inversion from Images \ [Website] [Project] [Code]
⭐SINE: SINgle Image Editing with Text-to-Image Diffusion Models \ [CVPR 2023] [Website] [Project] [Code]
⭐Break-A-Scene: Extracting Multiple Concepts from a Single Image \ [SIGGRAPH Asia 2023] [Project] [Code]
⭐Concept Decomposition for Visual Exploration and Inspiration \ [SIGGRAPH Asia 2023] [Project] [Code]
Cones: Concept Neurons in Diffusion Models for Customized Generation \ [ICML 2023 Oral] [Website] [Code]
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing \ [NeurIPS 2023] [Website] [Project] [Code]
Inserting Anybody in Diffusion Models via Celeb Basis \ [NeurIPS 2023] [Website] [Project] [Code]
Controlling Text-to-Image Diffusion by Orthogonal Finetuning \ [NeurIPS 2023] [Website] [Project] [Code]
Photoswap: Personalized Subject Swapping in Images \ [NeurIPS 2023] [Website] [Project] [Code]
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models \ [NeurIPS 2023] [Website] [Project] [Code]
ITI-GEN: Inclusive Text-to-Image Generation \ [ICCV 2023 Oral] [Website] [Project] [Code]
Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models \ [ICCV 2023] [Website] [Project] [Code]
ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation \ [ICCV 2023 Oral] [Website] [Code]
A Neural Space-Time Representation for Text-to-Image Personalization \ [SIGGRAPH Asia 2023] [Project] [Code]
Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models \ [SIGGRAPH 2023] [Project] [Code]
Is This Loss Informative? Speeding Up Textual Inversion with Deterministic Objective Evaluation \ [NeurIPS 2023] [Website] [Code]
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction \ [ECCV 2024] [Project] [Code]
Face2Diffusion for Fast and Editable Face Personalization \ [CVPR 2024] [Project] [Code]
Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models \ [CVPR 2024] [Project] [Code]
CapHuman: Capture Your Moments in Parallel Universes \ [CVPR 2024] [Project] [Code]
Style Aligned Image Generation via Shared Attention \ [CVPR 2024] [Project] [Code]
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition \ [CVPR 2024] [Project] [Code]
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization \ [CVPR 2024] [Project] [Code]
Material Palette: Extraction of Materials from a Single Image \ [CVPR 2024] [Project] [Code]
Learning Continuous 3D Words for Text-to-Image Generation \ [CVPR 2024] [Project] [Code]
ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models \ [AAAI 2024] [Project] [Code]
The Hidden Language of Diffusion Models \ [ICLR 2024] [Project] [Code]
ZeST: Zero-Shot Material Transfer from a Single Image \ [ECCV 2024] [Project] [Code]
UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization \ [Website] [Project] [Code]
MagicFace: Training-free Universal-Style Human Image Customized Synthesis \ [Website] [Project] [Code]
LCM-Lookahead for Encoder-based Text-to-Image Personalization \ [Website] [Project] [Code]
AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation \ [Website] [Project] [Code]
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance \ [Website] [Project] [Code]
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance \ [Website] [Project] [Code]
MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation \ [Website] [Project] [Code]
Customizing Text-to-Image Models with a Single Image Pair \ [Website] [Project] [Code]
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation \ [Website] [Project] [Code]
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving \ [Website] [Project] [Code]
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning \ [Website] [Project] [Code]
CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models \ [Website] [Project] [Code]
Customizing Text-to-Image Diffusion with Camera Viewpoint Control \ [Website] [Project] [Code]
Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization \ [Website] [Project] [Code]
StyleDrop: Text-to-Image Generation in Any Style \ [Website] [Project] [Code]
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention \ [Website] [Project] [Code]
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning \ [Website] [Project] [Code]
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning \ [Website] [Project] [Code]
Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion \ [Website] [Project] [Code]
MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models \ [Website] [Project] [Code]
DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning \ [Website] [Project] [Code]
SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing \ [Website] [Project] [Code]
CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models \ [Website] [Project] [Code]
When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation \ [Website] [Project] [Code]
InstantID: Zero-shot Identity-Preserving Generation in Seconds \ [Website] [Project] [Code]
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding \ [Website] [Project] [Code]
CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization \ [Website] [Project] [Code]
DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models \ [Website] [Project] [Code]
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space \ [Website] [Project] [Code]
Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models \ [Website] [Project] [Code]
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition \ [Website] [Project] [Code]
StableIdentity: Inserting Anybody into Anywhere at First Sight \ [Website] [Project] [Code]
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model \ [Website] [Project] [Code]
Direct Consistency Optimization for Compositional Text-to-Image Personalization \ [Website] [Project] [Code]
TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder \ [Website] [Project] [Code]
EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance \ [Website] [Project] [Code]
OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models \ [Website] [Project] [Code]
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation \ [Website] [Project] [Code]
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs \ [Website] [Project] [Code]
CSGO: Content-Style Composition in Text-to-Image Generation \ [Website] [Project] [Code]
DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models \ [NeurIPS 2024] [Code]
Customized Generation Reimagined: Fidelity and Editability Harmonized \ [ECCV 2024] [Code]
Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning \ [ECCV 2024] [Code]
High-fidelity Person-centric Subject-to-Image Synthesis \ [CVPR 2024] [Code]
ProSpect: Expanded Conditioning for the Personalization of Attribute-aware Image Generation \ [SIGGRAPH Asia 2023] [Code]
Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier \ [WACV 2025] [Code]
Multiresolution Textual Inversion \ [NeurIPS 2022 workshop] [Code]
Compositional Inversion for Stable Diffusion Models \ [AAAI 2024] [Code]
Decoupled Textual Embeddings for Customized Image Generation \ [AAAI 2024] [Code]
DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning \ [NeurIPS 2024] [Code]
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation \ [Website] [Code]
Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation \ [Website] [Code]
Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis \ [Website] [Code]
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance \ [Website] [Code]
PuLID: Pure and Lightning ID Customization via Contrastive Alignment \ [Website] [Code]
Cross Initialization for Personalized Text-to-Image Generation \ [Website] [Code]
Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach \ [Website] [Code]
SVDiff: Compact Parameter Space for Diffusion Fine-Tuning \ [Website] [Code]
ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation \ [Website] [Code]
AerialBooth: Mutual Information Guidance for Text Controlled Aerial View Synthesis from a Single Image \ [Website] [Code]
A Closer Look at Parameter-Efficient Tuning in Diffusion Models \ [Website] [Code]
FaceChain-FACT: Face Adapter with Decoupled Training for Identity-preserved Personalization \ [Website] [Code]
Controllable Textual Inversion for Personalized Text-to-Image Generation \ [Website] [Code]
Cross-domain Compositing with Pretrained Diffusion Models \ [Website] [Code]
Concept-centric Personalization with Large-scale Diffusion Priors \ [Website] [Code]
Customization Assistant for Text-to-image Generation \ [Website] [Code]
Cones 2: Customizable Image Synthesis with Multiple Subjects \ [Website] [Code]
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models \ [Website] [Code]
AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization \ [Website] [Code]
CusConcept: Customized Visual Concept Decomposition with Diffusion Models \ [Website] [Code]
HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation \ [ECCV 2024] [Project]
Language-Informed Visual Concept Learning \ [ICLR 2024] [Project]
Key-Locked Rank One Editing for Text-to-Image Personalization \ [SIGGRAPH 2023] [Project]
Diffusion in Style \ [ICCV 2023] [Project]
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization \ [CVPR 2024] [Project]
RealCustom++: Representing Images as Real-Word for Real-Time Customization \ [Website] [Project]
Personalized Residuals for Concept-Driven Text-to-Image Generation \ [CVPR 2024] [Project]
LogoSticker: Inserting Logos into Diffusion Models for Customized Generation \ [ECCV 2024] [Project]
Diffusion Self-Distillation for Zero-Shot Customized Image Generation \ [Website] [Project]
RelationBooth: Towards Relation-Aware Customized Object Generation \ [Website] [Project]
InstructBooth: Instruction-following Personalized Text-to-Image Generation \ [Website] [Project]
AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation \ [Website] [Project]
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation \ [Website] [Project]
PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization \ [Website] [Project]
Subject-driven Text-to-Image Generation via Apprenticeship Learning \ [Website] [Project]
Orthogonal Adaptation for Modular Customization of Diffusion Models \ [Website] [Project]
Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation \ [Website] [Project]
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models \ [Website] [Project]
Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models \ [Website] [Project]
$P+$: Extended Textual Conditioning in Text-to-Image Generation \ [Website] [Project]
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models \ [Website] [Project]
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning \ [Website] [Project]
Total Selfie: Generating Full-Body Selfies \ [Website] [Project]
PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation \ [Website] [Project]
DreamTuner: Single Image is Enough for Subject-Driven Generation \ [Website] [Project]
PALP: Prompt Aligned Personalization of Text-to-Image Models \ [Website] [Project]
TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion \ [CVPR 2024] [Project]
Visual Style Prompting with Swapping Self-Attention \ [Website] [Project]
Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm \ [Website] [Project]
Non-confusing Generation of Customized Concepts in Diffusion Models \ [Website] [Project]
Hollowed Net for On-Device Personalization of Text-to-Image Diffusion Models \ [NeurIPS 2024]
ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image \ [ECCV 2024]
Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models \ [CVPR 2024]
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation \ [CVPR 2024]
DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models \ [AAAI 2024]
FreeTuner: Any Subject in Any Style with Training-free Diffusion \ [Website]
Towards Prompt-robust Face Privacy Protection via Adversarial Decoupling Augmentation Framework \ [Website]
InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser \ [Website]
DisenBooth: Disentangled Parameter-Efficient Tuning for Subject-Driven Text-to-Image Generation \ [Website]
Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models \ [Website]
Gradient-Free Textual Inversion \ [Website]
Identity Encoder for Personalized Diffusion \ [Website]
Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation \ [Website]
ELODIN: Naming Concepts in Embedding Spaces \ [Website]
Generate Anything Anywhere in Any Scene \ [Website]
Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model \ [Website]
Face0: Instantaneously Conditioning a Text-to-Image Model on a Face \ [Website]
MagiCapture: High-Resolution Multi-Concept Portrait Customization \ [Website]
A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization \ [Website]
DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics \ [Website]
An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis \ [Website]
Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models \ [Website]
Memory-Efficient Personalization using Quantized Diffusion Model \ [Website]
BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models \ [Website]
Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization \ [Website]
Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding \ [Website]
SeFi-IDE: Semantic-Fidelity Identity Embedding for Personalized Diffusion-Based Generation \ [Website]
Visual Concept-driven Image Generation with Text-to-Image Diffusion Model \ [Website]
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models \ [Website]
MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration \ [Website]
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation \ [Website]
OneActor: Consistent Character Generation via Cluster-Conditioned Guidance \ [Website]
StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models \ [Website]
Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks \ [Website]
Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter \ [Website]
PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction \ [Website]
AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models \ [Website]
Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation \ [Website]
PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control \ [Website]
MagicID: Flexible ID Fidelity Generation System \ [Website]
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization \ [Website]
ArtiFade: Learning to Generate High-quality Subject from Blemished Images \ [Website]
CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization \ [Website]
Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis \ [Website]
Event-Customized Image Generation \ [Website]
Learning to Customize Text-to-Image Diffusion in Diverse Context \ [Website]
HYPNOS: Highly Precise Foreground-focused Diffusion Finetuning for Inanimate Objects \ [Website]
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator \ [Website]
Foundation Cures Personalization: Recovering Facial Personalized Models' Prompt Consistency \ [Website]
⭐⭐⭐Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models \ [SIGGRAPH 2023] [Project] [Official Code] [Diffusers Code] [Diffusers Doc] [Replicate Demo] (a minimal Diffusers usage sketch follows this section's list)
SEGA: Instructing Diffusion using Semantic Dimensions \ [NeurIPS 2023] [Website] [Code] [Diffusers Code] [Diffusers Doc]
Improving Sample Quality of Diffusion Models Using Self-Attention Guidance \ [ICCV 2023] [Website] [Project] [Code Official] [Diffusers Doc] [Diffusers Code]
Expressive Text-to-Image Generation with Rich Text \ [ICCV 2023] [Website] [Project] [Code] [Demo]
Editing Implicit Assumptions in Text-to-Image Diffusion Models \ [ICCV 2023] [Website] [Project] [Code] [Demo]
ElasticDiffusion: Training-free Arbitrary Size Image Generation \ [CVPR 2024] [Project] [Code] [Demo]
MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models \ [ICCV 2023] [Website] [Project] [Code]
Discriminative Class Tokens for Text-to-Image Diffusion Models \ [ICCV 2023] [Website] [Project] [Code]
Compositional Visual Generation with Composable Diffusion Models \ [ECCV 2022] [Website] [Project] [Code]
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models \ [ICCV 2023] [Project] [Code] [Blog]
Diffusion Self-Guidance for Controllable Image Generation \ [NeurIPS 2023] [Website] [Project] [Code]
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation \ [NeurIPS 2023] [Website] [Code]
DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models \ [NeurIPS 2023] [Website] [Code]
Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment \ [NeurIPS 2023] [Website] [Code]
DemoFusion: Democratising High-Resolution Image Generation With No $$$ \ [CVPR 2024] [Project] [Code]
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation \ [CVPR 2024] [Project] [Code]
Training Diffusion Models with Reinforcement Learning \ [ICLR 2024] [Project] [Code]
Divide & Bind Your Attention for Improved Generative Semantic Nursing \ [BMVC 2023 Oral] [Project] [Code]
Make It Count: Text-to-Image Generation with an Accurate Number of Objects \ [Website] [Project] [Code]
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction \ [Website] [Project] [Code]
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference \ [Website] [Project] [Code]
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step \ [Website] [Project] [Code]
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation \ [Website] [Project] [Code]
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts \ [Website] [Project] [Code]
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching \ [Website] [Project] [Code]
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions \ [Website] [Project] [Code]
Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance \ [Website] [Project] [Code]
Real-World Image Variation by Aligning Diffusion Inversion Chain \ [Website] [Project] [Code]
FreeU: Free Lunch in Diffusion U-Net \ [Website] [Project] [Code]
ConceptLab: Creative Generation using Diffusion Prior Constraints \ [Website] [Project] [Code]
Aligning Text-to-Image Diffusion Models with Reward Backpropagation \ [Website] [Project] [Code]
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models \ [Website] [Project] [Code]
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models \ [Website] [Project] [Code]
One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls \ [Website] [Project] [Code]
TokenCompose: Grounding Diffusion with Token-level Supervision \ [Website] [Project] [Code]
DiffusionGPT: LLM-Driven Text-to-Image Generation System \ [Website] [Project] [Code]
Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models \ [Website] [Project] [Code]
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support \ [Website] [Project] [Code]
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations \ [Website] [Project] [Code]
MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion \ [Website] [Project] [Code]
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models \ [Website] [Project] [Code]
Stylus: Automatic Adapter Selection for Diffusion Models \ [Website] [Project] [Code]
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models \ [Website] [Project] [Code]
Iterative Object Count Optimization for Text-to-image Diffusion Models \ [Website] [Project] [Code]
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment \ [Website] [Project] [Code]
HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts \ [Website] [Project] [Code]
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis \ [Website] [Project] [Code]
TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation \ [Website] [Project] [Code]
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models \ [ACM MM 2023 Oral] [Code]
Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models \ [ICLR 2024] [Code]
Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis \ [NeurIPS 2024] [Code]
Dynamic Prompt Optimizing for Text-to-Image Generation \ [CVPR 2024] [Code]
Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models \ [CVPR 2024] [Code]
Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance \ [CVPR 2024] [Code]
InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization \ [CVPR 2024] [Code]
Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models \ [ECCV 2024] [Code]
On Discrete Prompt Optimization for Diffusion Models \ [ICML 2024] [Code]
Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function \ [NeurIPS 2024] [Code]
Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization \ [ACM MM 2024] [Code]
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models \ [NeurIPS 2023] [Code]
Diffusion Model Alignment Using Direct Preference Optimization \ [Website] [Code]
SePPO: Semi-Policy Preference Optimization for Diffusion Alignment \ [Website] [Code]
Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models \ [Website] [Code]
Progressive Compositionality In Text-to-Image Generative Models \ [Website] [Code]
Improving Long-Text Alignment for Text-to-Image Diffusion Models \ [Website] [Code]
Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization \ [Website] [Code]
RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images \ [Website] [Code]
Aggregation of Multi Diffusion Models for Enhancing Learned Representations \ [Website] [Code]
AID: Attention Interpolation of Text-to-Image Diffusion \ [Website] [Code]
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance \ [Website] [Code]
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis \ [Website] [Code]
ORES: Open-vocabulary Responsible Visual Synthesis \ [Website] [Code]
Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness \ [Website] [Code]
Understanding and Mitigating Compositional Issues in Text-to-Image Generative Models \ [Website] [Code]
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation \ [Website] [Code]
InstructG2I: Synthesizing Images from Multimodal Attributed Graphs \ [Website] [Code]
Detector Guidance for Multi-Object Text-to-Image Generation \ [Website] [Code]
Designing a Better Asymmetric VQGAN for StableDiffusion \ [Website] [Code]
FABRIC: Personalizing Diffusion Models with Iterative Feedback \ [Website] [Code]
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models \ [Website] [Code]
Progressive Text-to-Image Diffusion with Soft Latent Direction \ [Website] [Code]
Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy \ [Website] [Code]
TraDiffusion: Trajectory-Based Training-Free Image Generation \ [Website] [Code]
If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection \ [Website] [Code]
LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts \ [Website] [Code]
Making Multimodal Generation Easier: When Diffusion Models Meet LLMs \ [Website] [Code]
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning \ [Website] [Code]
AltDiffusion: A Multilingual Text-to-Image Diffusion Model \ [Website] [Code]
It is all about where you start: Text-to-image generation with seed selection \ [Website] [Code]
End-to-End Diffusion Latent Optimization Improves Classifier Guidance \ [Website] [Code]
Correcting Diffusion Generation through Resampling \ [Website] [Code]
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs \ [Website] [Code]
Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation \ [Website] [Code]
A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis \ [Website] [Code]
PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement \ [Website] [Code]
Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models \ [Website] [Code]
Aligning Few-Step Diffusion Models with Dense Reward Difference Learning \ [Website] [Code]
LightIt: Illumination Modeling and Control for Diffusion Models \ [CVPR 2024] [Project]
Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis \ [NeurIPS 2024] [Project]
Scalable Ranked Preference Optimization for Text-to-Image Generation \ [Website] [Project]
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation \ [Website] [Project]
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation \ [Website] [Project]
RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance \ [Website] [Project]
UniFL: Improve Stable Diffusion via Unified Feedback Learning \ [Website] [Project]
ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting \ [Website] [Project]
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation \ [Website] [Project]
Semantic Guidance Tuning for Text-To-Image Diffusion Models \ [Website] [Project]
Amazing Combinatorial Creation: Acceptable Swap-Sampling for Text-to-Image Generation \ [Website] [Project]
Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation \ [Website] [Project]
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation \ [Website] [Project]
FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes \ [Website] [Project]
Lazy Diffusion Transformer for Interactive Image Editing \ [Website] [Project]
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis \ [Website] [Project]
Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models \ [Website] [Project]
Norm-guided latent space exploration for text-to-image generation \ [NeurIPS 2023] [Website]
Improving Diffusion-Based Image Synthesis with Context Prediction \ [NeurIPS 2023] [Website]
GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections \ [ECCV 2024]
MultiGen: Zero-shot Image Generation from Multi-modal Prompt \ [ECCV 2024]
On Mechanistic Knowledge Localization in Text-to-Image Generative Models \ [ICML 2024]
Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation \ [NeurIPS 2024]
Generating Compositional Scenes via Text-to-image RGBA Instance Generation \ [NeurIPS 2024]
A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization \ [Website]
PROUD: PaRetO-gUided Diffusion Model for Multi-objective Generation \ [Website]
Exposure Diffusion: HDR Image Generation by Consistent LDR denoising \ [Website]
Information Theoretic Text-to-Image Alignment \ [Website]
Diffscaler: Enhancing the Generative Prowess of Diffusion Transformers \ [Website]
Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control \ [Website]
Aligning Diffusion Models by Optimizing Human Utility \ [Website]
Instruct-Imagen: Image Generation with Multi-modal Instruction \ [Website]
CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models \ [Website]
MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask \ [Website]
Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images \ [Website]
Text2Layer: Layered Image Generation using Latent Diffusion Model \ [Website]
Stimulating the Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling \ [Website]
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation \ [Website]
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion \ [Website]
Improving Compositional Text-to-image Generation with Large Vision-Language Models \ [Website]
Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else \ [Website]
Unseen Image Synthesis with Diffusion Models \ [Website]
AnyLens: A Generative Diffusion Model with Any Rendering Lens \ [Website]
Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering \ [Website]
Text2Street: Controllable Text-to-image Generation for Street Views \ [Website]
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation \ [Website]
Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Model \ [Website]
Debiasing Text-to-Image Diffusion Models \ [Website]
Stochastic Conditional Diffusion Models for Semantic Image Synthesis \ [Website]
Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion \ [Website]
Transparent Image Layer Diffusion using Latent Transparency \ [Website]
Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation \ [Website]
HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances \ [Website]
StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models \ [Website]
Make Me Happier: Evoking Emotions Through Image Diffusion Models \ [Website]
Zippo: Zipping Color and Transparency Distributions into a Single Diffusion Model \ [Website]
LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model \ [Website]
AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation \ [Website]
U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models \ [Website]
ECNet: Effective Controllable Text-to-Image Diffusion Models \ [Website]
TextCraftor: Your Text Encoder Can be Image Quality Controller \ [Website]
Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding \ [Website]
Towards Better Text-to-Image Generation Alignment via Attention Modulation \ [Website]
Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model \ [Website]
SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance \ [Website]
Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance \ [Website]
Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models \ [Website]
FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting \ [Website]
Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models \ [Website]
SPDiffusion: Semantic Protection Diffusion for Multi-concept Text-to-image Generation \ [Website]
Training-Free Sketch-Guided Diffusion with Latent Optimization \ [Website]
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization \ [Website]
Sparse Repellency for Shielded Generation in Text-to-image Diffusion Models \ [Website]
Training-free Diffusion Model Alignment with Sampling Demons \ [Website]
MinorityPrompt: Text to Minority Image Generation via Prompt Optimization \ [Website]
Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models \ [Website]
Saliency Guided Optimization of Diffusion Latents \ [Website]
Preference Optimization with Multi-Sample Comparisons \ [Website]
CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning \ [Website]
Redefining
Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation \ [Website]
Improving image synthesis with diffusion-negative sampling \ [Website]
Golden Noise for Diffusion Models: A Learning Framework \ [Website]
Test-time Conditional Text-to-Image Synthesis Using Diffusion Models \ [Website]
Decoupling Training-Free Guided Diffusion by ADMM \ [Website]
Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps \ [Website]
Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis \ [Website]
TKG-DM: Training-free Chroma Key Content Generation Diffusion Model \ [Website]
Unlocking the Potential of Text-to-Image Diffusion with PAC-Bayesian Theory \ [Website]
CoCoNO: Attention Contrast-and-Complete for Initial Noise Optimization in Text-to-Image Synthesis \ [Website]
Reward Incremental Learning in Text-to-Image Generation \ [Website]
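The ⭐ Attend-and-Excite entry at the top of this section links a Diffusers integration. Below is a minimal, hedged sketch of that pipeline as exposed in Diffusers; the base checkpoint, prompt, and token indices are illustrative placeholders rather than values taken from any entry above.

```python
# Minimal sketch of the Attend-and-Excite pipeline shipped with Diffusers.
# Checkpoint, prompt, and token indices are illustrative placeholders.
import torch
from diffusers import StableDiffusionAttendAndExcitePipeline

pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a cat and a frog"
print(pipe.get_indices(prompt))  # inspect token positions to choose which subjects to excite

image = pipe(
    prompt=prompt,
    token_indices=[2, 5],     # tokens ("cat", "frog") whose cross-attention is strengthened
    max_iter_to_alter=25,     # denoising steps during which the latent is updated by the attention loss
    guidance_scale=7.5,
    num_inference_steps=50,
).images[0]
image.save("attend_and_excite.png")
```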
MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation \ [ICML 2023] [Website] [Project] [Code] [Diffusers Code] [Diffusers Doc] [Replicate Demo] (a minimal Diffusers usage sketch follows this section's list)
SceneComposer: Any-Level Semantic Image Synthesis \ [CVPR 2023 Highlight] [Website] [Project] [Code]
GLIGEN: Open-Set Grounded Text-to-Image Generation \ [CVPR 2023] [Website] [Code] [Demo]
Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis \ [ICLR 2023] [Website] [Project] [Code]
Visual Programming for Text-to-Image Generation and Evaluation \ [NeurIPS 2023] [Website] [Project] [Code]
GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation \ [ICLR 2024] [Website] [Project] [Code]
GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation \ [NeurIPS 2024] [Project] [Code]
ReCo: Region-Controlled Text-to-Image Generation \ [CVPR 2023] [Website] [Code]
Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis \ [ICCV 2023] [Website] [Code]
BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion \ [ICCV 2023] [Website] [Code]
Dense Text-to-Image Generation with Attention Modulation \ [ICCV 2023] [Website] [Code]
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models \ [Website] [Project] [Code] [Demo] [Blog]
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control \ [CVPR 2024] [Code] [Project]
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis \ [CVPR 2024] [Project] [Code]
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language \ [Website] [Project] [Code]
Training-Free Layout Control with Cross-Attention Guidance \ [Website] [Project] [Code]
ROICtrl: Boosting Instance Control for Visual Generation \ [Website] [Project] [Code]
Directed Diffusion: Direct Control of Object Placement through Attention Guidance \ [Website] [Project] [Code]
Grounded Text-to-Image Synthesis with Attention Refocusing \ [Website] [Project] [Code]
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers \ [Website] [Project] [Code]
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation \ [Website] [Project] [Code]
Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models \ [Website] [Project] [Code]
R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation \ [Website] [Project] [Code]
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition \ [Website] [Project] [Code]
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models \ [Website] [Project] [Code]
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following \ [Website] [Project] [Code]
InstanceDiffusion: Instance-level Control for Image Generation \ [Website] [Project] [Code]
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis \ [CVPR 2024] [Code]
NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging \ [CVPR 2024] [Code]
Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation \ [Website] [Code]
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation \ [Website] [Code]
Enhancing Object Coherence in Layout-to-Image Synthesis \ [Website] [Code]
Training-free Regional Prompting for Diffusion Transformers \ [Website] [Code]
DivCon: Divide and Conquer for Progressive Text-to-Image Generation \ [Website] [Code]
RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models \ [Website] [Code]
HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation \ [Website] [Code]
Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis \ [ECCV 2024] [Project]
ReCorD: Reasoning and Correcting Diffusion for HOI Generation \ [ACM MM 2024] [Project]
Compositional Text-to-Image Generation with Dense Blob Representations \ [Website] [Project]
GroundingBooth: Grounding Text-to-Image Customization \ [Website] [Project]
Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation \ [Website] [Project]
ReGround: Improving Textual and Spatial Grounding at No Cost \ [Website] [Project]
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception \ [CVPR 2024]
Guided Image Synthesis via Initial Image Editing in Diffusion Model \ [ACM MM 2023]
Training-free Composite Scene Generation for Layout-to-Image Synthesis \ [ECCV 2024]
LSReGen: Large-Scale Regional Generator via Backward Guidance Framework \ [Website]
Enhancing Prompt Following with Visual Control Through Training-Free Mask-Guided Diffusion \ [Website]
Draw Like an Artist: Complex Scene Generation with Diffusion Model via Composition, Painting, and Retouching \ [Website]
Boundary Attention Constrained Zero-Shot Layout-To-Image Generation \ [Website]
Enhancing Image Layout Control with Loss-Guided Diffusion Models \ [Website]
GLoD: Composing Global Contexts and Local Details in Image Generation \ [Website]
A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis \ [Website]
Controllable Text-to-Image Generation with GPT-4 \ [Website]
Localized Text-to-Image Generation for Free via Cross Attention Control \ [Website]
Training-Free Location-Aware Text-to-Image Synthesis \ [Website]
Composite Diffusion | whole >= Σ parts \ [Website]
Continuous Layout Editing of Single Images with Diffusion Models \ [Website]
Zero-shot spatial layout conditioning for text-to-image diffusion models \ [Website]
Obtaining Favorable Layouts for Multiple Object Generation \ [Website]
LoCo: Locally Constrained Training-Free Layout-to-Image Synthesis \ [Website]
Self-correcting LLM-controlled Diffusion Models \ [Website]
Joint Generative Modeling of Scene Graphs and Images via Diffusion Models \ [Website]
Spatial-Aware Latent Initialization for Controllable Image Generation \ [Website]
Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Control \ [Website]
ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation \ [Website]
The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise \ [Website]
Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis \ [Website]
SpotActor: Training-Free Layout-Controlled Consistent Image Generation \ [Website]
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation \ [Website]
Scribble-Guided Diffusion for Training-free Text-to-Image Generation \ [Website]
3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation \ [Website]
Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement \ [Website]
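The MultiDiffusion entry at the top of this section links a Diffusers integration (the panorama pipeline). A minimal sketch, assuming the public Stable Diffusion 2 base checkpoint; prompt and output width are placeholders.

```python
# Sketch of MultiDiffusion-style panorama generation via the Diffusers panorama pipeline.
# Base checkpoint, prompt, and output width are illustrative placeholders.
import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

model_ckpt = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_ckpt, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(
    model_ckpt, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

# Overlapping diffusion windows are fused into one wide latent, so the target
# width can exceed the 512px resolution the base model was trained on.
image = pipe("a photo of the dolomites", height=512, width=2048).images[0]
image.save("panorama.png")
```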
⭐⭐⭐SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations \ [ICLR 2022] [Website] [Project] [Code] (an SDEdit-style img2img sketch follows this section's list)
⭐⭐⭐DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation \ [CVPR 2022] [Website] [Code]
CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation \ [NeurIPS 2023] [Website] [Project] [Code]
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations \ [CVPR 2024] [Project] [Code]
Diffusion-based Image Translation using Disentangled Style and Content Representation \ [ICLR 2023] [Website] [Code]
FlexIT: Towards Flexible Semantic Image Translation \ [CVPR 2022] [Website] [Code]
Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer \ [ICCV 2023] [Website] [Code]
E2GAN: Efficient Training of Efficient GANs for Image-to-Image Translation \ [ICML 2024] [Project] [Code]
Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models \ [Website] [Project] [Code]
Cross-Image Attention for Zero-Shot Appearance Transfer \ [Website] [Project] [Code]
FashionR2R: Texture-preserving Rendered-to-Real Image Translation with Diffusion Models \ [Website] [Project] [Code]
Diffusion Guided Domain Adaptation of Image Generators \ [Website] [Project] [Code]
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models \ [Website] [Project] [Code]
FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models \ [Website] [Project] [Code]
FilterPrompt: Guiding Image Transfer in Diffusion Models \ [Website] [Project] [Code]
Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization \ [ECCV 2024] [Code]
One-Shot Structure-Aware Stylized Image Synthesis \ [CVPR 2024] [Code]
BBDM: Image-to-image Translation with Brownian Bridge Diffusion Models \ [CVPR 2023] [Code]
Spectrum Translation for Refinement of Image Generation (STIG) Based on Contrastive Learning and Spectral Filter Profile \ [AAAI 2024] [Code]
Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation \ [AAAI 2024] [Code]
ZePo: Zero-Shot Portrait Stylization with Faster Sampling \ [ACM MM 2024] [Code]
DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer \ [ACM MM Asia 2024] [Code]
TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control \ [Website] [Code]
Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance \ [Website] [Code]
Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis \ [Website] [Code]
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions \ [Website] [Code]
GEM: Boost Simple Network for Glass Surface Segmentation via Segment Anything Model and Data Synthesis \ [Website] [Code]
CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion \ [Website] [Code]
PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering \ [Website] [Code]
One-Step Image Translation with Text-to-Image Models \ [Website] [Code]
D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods \ [Website] [Code]
StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models \ [ICCV 2023] [Website]
ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors \ [ACM MM 2023]
High-Fidelity Diffusion-based Image Editing \ [AAAI 2024]
EBDM: Exemplar-guided Image Translation with Brownian-bridge Diffusion Models \ [ECCV 2024]
Harnessing the Latent Diffusion Model for Training-Free Image Style Transfer \ [Website]
UniHDA: Towards Universal Hybrid Domain Adaptation of Image Generators \ [Website]
Regularized Distribution Matching Distillation for One-step Unpaired Image-to-Image Translation \ [Website]
TEXTOC: Text-driven Object-Centric Style Transfer \ [Website]
Seed-to-Seed: Image Translation in Diffusion Seed Space \ [Website]
Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation \ [Website]
Latent Schrödinger Bridge: Prompting Latent Diffusion for Fast Unpaired Image-to-Image Translation \ [Website]
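The ⭐ SDEdit entry at the top of this section edits a guide image by partially noising it and then denoising it toward a text prompt. A hedged sketch of the same idea using the stock Diffusers img2img pipeline (not the authors' code); checkpoint, file path, and strength are placeholders.

```python
# SDEdit-style guided editing with the stock Diffusers img2img pipeline (illustrative, not the authors' code).
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("path/to/sketch_or_photo.png").resize((512, 512))  # placeholder path

# `strength` sets how far the guide image is noised before denoising:
# higher values follow the prompt more, lower values stay closer to the input.
image = pipe(
    prompt="an oil painting of a mountain village",
    image=init_image,
    strength=0.6,
    guidance_scale=7.5,
).images[0]
image.save("sdedit_style_edit.png")
```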
ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models \ [CVPR 2023 Highlight] [Project] [Code] [Demo]
LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation \ [ICCV 2023] [Website] [Project] [Code]
Text-Image Alignment for Diffusion-Based Perception \ [CVPR 2024] [Website] [Project] [Code]
Stochastic Segmentation with Conditional Categorical Diffusion Models \ [ICCV 2023] [Website] [Code]
DDP: Diffusion Model for Dense Visual Prediction \ [ICCV 2023] [Website] [Code]
DiffusionDet: Diffusion Model for Object Detection \ [ICCV 2023] [Website] [Code]
OVTrack: Open-Vocabulary Multiple Object Tracking \ [CVPR 2023] [Website] [Project]
SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process \ [NeurIPS 2023] [Website] [Code]
DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction \ [CVPR 2024] [Project] [Code]
Zero-Shot Image Segmentation via Recursive Normalized Cut on Diffusion Features \ [Website] [Project] [Code]
Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion \ [Website] [Project] [Code]
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset \ [Website] [Project] [Code]
InvSeg: Test-Time Prompt Inversion for Semantic Segmentation \ [Website] [Project] [Code]
SMITE: Segment Me In TimE \ [Website] [Project] [Code]
Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation \ [NeurIPS 2024] [Code]
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model \ [ECCV 2024] [Code]
ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model \ [Website] [Code]
SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow \ [Website] [Code]
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking \ [Website] [Code]
Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models \ [Website] [Code]
Scribble Hides Class: Promoting Scribble-Based Weakly-Supervised Semantic Segmentation with Its Class Label \ [Website] [Code]
Personalize Segment Anything Model with One Shot \ [Website] [Code]
DiffusionTrack: Diffusion Model For Multi-Object Tracking \ [Website] [Code]
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation \ [Website] [Code]
A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting \ [Website] [Code]
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation \ [Website] [Code]
UniGS: Unified Representation for Image Generation and Segmentation \ [Website] [Code]
Placing Objects in Context via Inpainting for Out-of-distribution Segmentation \ [Website] [Code]
MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation \ [Website] [Code]
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation \ [Website] [Code]
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models \ [Website] [Code]
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models \ [ICLR 2024] [Website] [Project]
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation \ [CVPR 2024] [Project]
FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models \ [Website] [Project]
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos \ [Website] [Project]
DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models \ [Website] [Project]
Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation \ [ICCV 2023] [Website]
SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection \ [CVPR 2024]
Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers \ [ECCV 2024]
Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation \ [NeurIPS 2024]
Generalization by Adaptation: Diffusion-Based Domain Extension for Domain-Generalized Semantic Segmentation \ [WACV 2024]
Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis \ [ACCV 2024]
A Simple Background Augmentation Method for Object Detection with Diffusion Model \ [Website]
Unveiling the Power of Diffusion Features For Personalized Segmentation and Retrieval \ [Website]
SLiMe: Segment Like Me \ [Website]
ASAM: Boosting Segment Anything Model with Adversarial Tuning \ [Website]
Diffusion Features to Bridge Domain Gap for Semantic Segmentation \ [Website]
MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation \ [Website]
DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery \ [Website]
Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models \ [Website]
Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter \ [Website]
Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion \ [Website]
From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models \ [Website]
Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation \ [Website]
Patch-based Selection and Refinement for Early Object Detection \ [Website]
TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models \ [Website]
Towards Granularity-adjusted Pixel-level Semantic Annotation \ [Website]
Gen2Det: Generate to Detect \ [Website]
Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors \ [Website]
ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model \ [Website]
Diverse Generation while Maintaining Semantic Coordination: A Diffusion-Based Data Augmentation Method for Object Detection \ [Website]
Generative Edge Detection with Stable Diffusion \ [Website]
DINTR: Tracking via Diffusion-based Interpolation \ [Website]
Enhanced Kalman with Adaptive Appearance Motion SORT for Grounded Generic Multiple Object Tracking \ [Website]
DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability \ [Website]
Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation \ [Website]
⭐⭐⭐Adding Conditional Control to Text-to-Image Diffusion Models \ [ICCV 2023 best paper] [Website] [Official Code] [Diffusers Doc] [Diffusers Code] (a minimal Diffusers usage sketch follows this section's list)
⭐⭐T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models \ [Website] [Official Code] [Diffusers Code]
SketchKnitter: Vectorized Sketch Generation with Diffusion Models \ [ICLR 2023 Spotlight] [Website] [Code]
Freestyle Layout-to-Image Synthesis \ [CVPR 2023 highlight] [Website] [Project] [Code]
Collaborative Diffusion for Multi-Modal Face Generation and Editing \ [CVPR 2023] [Website] [Project] [Code]
HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation \ [ICCV 2023] [Website] [Project] [Code]
FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model \ [ICCV 2023] [Website] [Code]
Sketch-Guided Text-to-Image Diffusion Models \ [SIGGRAPH 2023] [Project] [Code]
Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive \ [ICLR 2024] [Project] [Code]
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts \ [Website] [Project] [Code]
ControlNeXt: Powerful and Efficient Control for Image and Video Generation \ [Website] [Project] [Code]
Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance \ [Website] [Project] [Code]
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model \ [Website] [Project] [Code]
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models \ [Website] [Project] [Code]
A Simple Approach to Unifying Diffusion-based Conditional Generation \ [Website] [Project] [Code]
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion \ [Website] [Project] [Code]
Late-Constraint Diffusion Guidance for Controllable Image Synthesis \ [Website] [Project] [Code]
Composer: Creative and controllable image synthesis with composable conditions \ [Website] [Project] [Code]
DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models \ [Website] [Project] [Code]
Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation \ [Website] [Project] [Code]
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild \ [Website] [Project] [Code]
Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models \ [Website] [Project] [Code]
LooseControl: Lifting ControlNet for Generalized Depth Conditioning \ [Website] [Project] [Code]
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model \ [Website] [Project] [Code]
ControlNet-XS: Designing an Efficient and Effective Architecture for Controlling Text-to-Image Diffusion Models \ [Website] [Project] [Code]
ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet \ [Website] [Project] [Code]
SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior \ [Website] [Project] [Code]
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis \ [ICLR 2024] [Code]
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models \ [CVPR 2024] [Code]
CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation \ [Website] [Code]
Universal Guidance for Diffusion Models \ [Website] [Code]
Meta ControlNet: Enhancing Task Adaptation via Meta Learning \ [Website] [Code]
Local Conditional Controlling for Text-to-Image Diffusion Models \ [Website] [Code]
KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models \ [Website] [Code]
OminiControl: Minimal and Universal Control for Diffusion Transformer \ [Website] [Code]
Modulating Pretrained Diffusion Models for Multimodal Image Synthesis \ [SIGGRAPH 2023] [Project]
SpaText: Spatio-Textual Representation for Controllable Image Generation \ [CVPR 2023] [Project]
CCM: Adding Conditional Controls to Text-to-Image Consistency Models \ [ICML 2024] [Project]
Dreamguider: Improved Training free Diffusion-based Conditional Generation \ [Website] [Project]
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback \ [Website] [Project]
AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation \ [Website] [Project]
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion \ [Website] [Project]
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection \ [Website] [Project]
Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor \ [Website] [Project]
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing \ [Website] [Project]
CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models \ [Website] [Project]
Sketch-Guided Scene Image Generation \ [Website]
SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation \ [Website]
Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation \ [Website]
Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt \ [Website]
Adding 3D Geometry Control to Diffusion Models \ [Website]
LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation \ [Website]
JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling \ [Website]
Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons \ [Website]
Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt \ [Website]
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation \ [Website]
Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation \ [Website]
Label-free Neural Semantic Image Synthesis \ [Website]
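The ⭐ ControlNet entry at the top of this section links a Diffusers integration. A minimal, hedged sketch using the public canny-edge ControlNet checkpoint; the base model ID is a common choice, and the edge-map path and prompt are placeholders.

```python
# Sketch of spatially conditioned generation with a ControlNet via Diffusers.
# Model IDs are public checkpoints; the edge-map path and prompt are placeholders.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

canny_image = load_image("path/to/canny_edge_map.png")  # precomputed edge map (placeholder path)

# The edge map is injected through the ControlNet branch; the text prompt is unchanged.
image = pipe(
    "a futuristic city at night", image=canny_image, num_inference_steps=30
).images[0]
image.save("controlnet_canny.png")
```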
Discriminative Diffusion Models as Few-shot Vision and Language Learners \ [Website] [Code]
Few-Shot Diffusion Models \ [Website] [Code]
Few-shot Semantic Image Synthesis with Class Affinity Transfer \ [CVPR 2023] [Website]
DiffAlign : Few-shot learning using diffusion based synthesis and alignment \ [Website]
Few-shot Image Generation with Diffusion Models \ [Website]
Lafite2: Few-shot Text-to-Image Generation \ [Website]
Few-Shot Task Learning through Inverse Generative Modeling \ [Website]
Paint by Example: Exemplar-based Image Editing with Diffusion Models \ [CVPR 2023] [Website] [Code] [Diffusers Doc] [Diffusers Code]
GLIDE: Towards photorealistic image generation and editing with text-guided diffusion model \ [ICML 2022 Spotlight] [Website] [Code]
Blended Diffusion for Text-driven Editing of Natural Images \ [CVPR 2022] [Website] [Project] [Code]
Blended Latent Diffusion \ [SIGGRAPH 2023] [Project] [Code]
TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition \ [ICCV 2023] [Website] [Project] [Code]
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting \ [CVPR 2023] [Website] [Code]
Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models \ [ICML 2023] [Website] [Code]
Coherent and Multi-modality Image Inpainting via Latent Space Optimization \ [Website] [Project] [Code]
Inst-Inpaint: Instructing to Remove Objects with Diffusion Models \ [Website] [Project] [Code] [Demo]
Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting \ [Website] [Project] [Code]
CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models \ [Website] [Project] [Code]
AnyDoor: Zero-shot Object-level Image Customization \ [Website] [Project] [Code]
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting \ [Website] [Project] [Code]
Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation \ [Website] [Project] [Code]
Towards Language-Driven Video Inpainting via Multimodal Large Language Models \ [Website] [Project] [Code]
Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections \ [Website] [Project] [Code]
Improving Text-guided Object Inpainting with Semantic Pre-inpainting \ [ECCV 2024] [Code]
FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior \ [ECCV 2024] [Code]
360-Degree Panorama Generation from Few Unregistered NFoV Images \ [ACM MM 2023] [Code]
Delving Globally into Texture and Structure for Image Inpainting \ [ACM MM 2022] [Code]
ControlEdit: A MultiModal Local Clothing Image Editing Method \ [Website] [Code]
DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting \ [Website] [Code]
Training-and-prompt-free General Painterly Harmonization Using Image-wise Attention Sharing \ [Website] [Code]
What to Preserve and What to Transfer: Faithful, Identity-Preserving Diffusion-based Hairstyle Transfer \ [Website] [Code]
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model \ [Website] [Code]
Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting \ [Website] [Code]
Reference-based Image Composition with Sketch via Structure-aware Diffusion Model \ [Website] [Code]
Image Inpainting via Iteratively Decoupled Probabilistic Modeling \ [Website] [Code]
ControlCom: Controllable Image Composition using Diffusion Model \ [Website] [Code]
Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model \ [Website] [Code]
MagicRemover: Tuning-free Text-guided Image Inpainting with Diffusion Models \ [Website] [Code]
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models \ [Website] [Code]
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion \ [Website] [Code]
Sketch-guided Image Inpainting with Partial Discrete Diffusion Process \ [Website] [Code]
ReMOVE: A Reference-free Metric for Object Erasure \ [Website] [Code]
Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting \ [Website] [Code]
MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior \ [Website] [Code]
AddMe: Zero-shot Group-photo Synthesis by Inserting People into Scenes \ [ECCV 2024] [Project]
Text2Place: Affordance-aware Text Guided Human Placement \ [ECCV 2024] [Project]
IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation \ [CVPR 2024] [Project]
Matting by Generation \ [SIGGRAPH 2024] [Project]
PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference \ [NeurIPS 2024] [Project]
Taming Latent Diffusion Model for Neural Radiance Field Inpainting \ [Website] [Project]
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control \ [Website] [Project]
Towards Stable and Faithful Inpainting \ [Website] [Project]
Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos \ [Website] [Project]
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion \ [Website] [Project]
TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization \ [ACM MM 2024]
Semantically Consistent Video Inpainting with Conditional Diffusion Models \ [Website]
Personalized Face Inpainting with Diffusion Models by Parallel Visual Attention \ [Website]
Outline-Guided Object Inpainting with Diffusion Models \ [Website]
SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model \ [Website]
Gradpaint: Gradient-Guided Inpainting with Diffusion Models \ [Website]
Infusion: Internal Diffusion for Video Inpainting \ [Website]
Rethinking Referring Object Removal \ [Website]
Tuning-Free Image Customization with Image and Text Guidance \ [Website]
VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model \ [Website]
FaithFill: Faithful Inpainting for Object Completion Using a Single Reference Image \ [Website]
InsertDiffusion: Identity Preserving Visualization of Objects through a Training-Free Diffusion Architecture \ [Website]
Thinking Outside the BBox: Unconstrained Generative Object Compositing \ [Website]
Content-aware Tile Generation using Exterior Boundary Inpainting \ [Website]
AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status \ [Website]
TD-Paint: Faster Diffusion Inpainting Through Time Aware Pixel Conditioning \ [Website]
MagicEraser: Erasing Any Objects via Semantics-Aware Control \ [Website]
LayoutDM: Discrete Diffusion Model for Controllable Layout Generation \ [CVPR 2023] [Website] [Project] [Code]
Desigen: A Pipeline for Controllable Design Template Generation \ [CVPR 2024] [Project] [Code]
DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer \ [ICCV 2023] [Website] [Code]
LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models \ [ICCV 2023] [Website] [Code]
LayoutDM: Transformer-based Diffusion Model for Layout Generation \ [CVPR 2023] [Website]
Unifying Layout Generation with a Decoupled Diffusion Model \ [CVPR 2023] [Website]
PLay: Parametrically Conditioned Layout Generation using Latent Diffusion \ [ICML 2023] [Website]
Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints \ [ICLR 2024]
CGB-DM: Content and Graphic Balance Layout Generation with Transformer-based Diffusion Model \ [Website]
Diffusion-based Document Layout Generation \ [Website]
Dolfin: Diffusion Layout Transformers without Autoencoder \ [Website]
LayoutFlow: Flow Matching for Layout Generation \ [Website]
Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model \ [Website]
⭐⭐TextDiffuser: Diffusion Models as Text Painters \ [NeurIPS 2023] [Website] [Project] [Code]
⭐⭐TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering \ [ECCV 2024 Oral] [Project] [Code]
GlyphControl: Glyph Conditional Control for Visual Text Generation \ [NeurIPS 2023] [Website] [Code]
DiffUTE: Universal Text Editing Diffusion Model \ [NeurIPS 2023] [Website] [Code]
Word-As-Image for Semantic Typography \ [SIGGRAPH 2023] [Project] [Code]
Kinetic Typography Diffusion Model \ [ECCV 2024] [Project] [Code]
Dynamic Typography: Bringing Text to Life via Video Diffusion Prior \ [Website] [Project] [Code]
JoyType: A Robust Design for Multilingual Visual Text Creation \ [Website] [Project] [Code]
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models \ [Website] [Project] [Code]
One-Shot Diffusion Mimicker for Handwritten Text Generation \ [ECCV 2024] [Code]
DCDM: Diffusion-Conditioned-Diffusion Model for Scene Text Image Super-Resolution \ [ECCV 2024] [Code]
HFH-Font: Few-shot Chinese Font Synthesis with Higher Quality, Faster Speed, and Higher Resolution \ [SIGGRAPH Asia 2024] [Code]
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model \ [AAAI 2024] [Code]
FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning \ [AAAI 2024] [Code]
Text Image Inpainting via Global Structure-Guided Diffusion Models \ [AAAI 2024] [Code]
Ambigram generation by a diffusion model \ [ICDAR 2023] [Code]
Scene Text Image Super-resolution based on Text-conditional Diffusion Models \ [WACV 2024] [Code]
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling \ [ECCV 2024] [Code]
First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending \ [ECAI 2024] [Code]
VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models \ [Website] [Code]
Visual Text Generation in the Wild \ [Website] [Code]
Deciphering Oracle Bone Language with Diffusion Models \ [Website] [Code]
High Fidelity Scene Text Synthesis \ [Website] [Code]
AnyText: Multilingual Visual Text Generation And Editing \ [Website] [Code]
AnyText2: Visual Text Generation and Editing With Customizable Attributes \ [Website] [Code]
Few-shot Calligraphy Style Learning \ [Website] [Code]
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models \ [Website] [Code]
DiffusionPen: Towards Controlling the Style of Handwritten Text Generation \ [Website] [Code]
AmbiGen: Generating Ambigrams from Pre-trained Diffusion Model \ [Website] [Project]
UniVG: Towards UNIfied-modal Video Generation \ [Website] [Project]
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation \ [Website] [Project]
DECDM: Document Enhancement using Cycle-Consistent Diffusion Models \ [WACV 2024]
SceneTextGen: Layout-Agnostic Scene Text Image Synthesis with Diffusion Models \ [Website]
AnyTrans: Translate AnyText in the Image with Large Scale Models \ [Website]
ARTIST: Improving the Generation of Text-rich Images by Disentanglement \ [Website]
Improving Text Generation on Images with Synthetic Captions \ [Website]
CustomText: Customized Textual Image Generation using Diffusion Models \ [Website]
VecFusion: Vector Font Generation with Diffusion \ [Website]
Typographic Text Generation with Off-the-Shelf Diffusion Model \ [Website]
Font Style Interpolation with Diffusion Models \ [Website]
Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation \ [Website]
DiffCJK: Conditional Diffusion Model for High-Quality and Wide-coverage CJK Character Generation \ [Website]
CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction \ [Website]
Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models \ [Website]
Text Image Generation for Low-Resource Languages with Dual Translation Learning \ [Website]
Decoupling Layout from Glyph in Online Chinese Handwriting Generation \ [Website]
Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training \ [Website]
TextMaster: Universal Controllable Text Edit \ [Website]
Towards Visual Text Design Transfer Across Languages \ [Website]
DiffSTR: Controlled Diffusion Models for Scene Text Removal \ [Website]
TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images \ [Website]
TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models \ [Website]
Conditional Text-to-Image Generation with Reference Guidance \ [Website]
Type-R: Automatically Retouching Typos for Text-to-Image Generation \ [Website]
ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting \ [NeurIPS 2023 spotlight] [Website] [Project] [Code]
Image Super-Resolution via Iterative Refinement \ [TPAMI] [Website] [Project] [Code]
DiffIR: Efficient Diffusion Model for Image Restoration \ [ICCV 2023] [Website] [Code]
Kalman-Inspired Feature Propagation for Video Face Super-Resolution \ [ECCV 2024] [Project] [Code]
AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation \ [Website] [Project] [Code]
Exploiting Diffusion Prior for Real-World Image Super-Resolution \ [Website] [Project] [Code]
SinSR: Diffusion-Based Image Super-Resolution in a Single Step \ [CVPR 2024] [Code]
CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution \ [CVPR 2024] [Code]
Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs \ [NeurIPS 2024] [Code]
SeeClear: Semantic Distillation Enhances Pixel Condensation for Video Super-Resolution \ [NeurIPS 2024] [Code]
Iterative Token Evaluation and Refinement for Real-World Super-Resolution \ [AAAI 2024] [Code]
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution \ [Website] [Code]
Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution \ [Website] [Code]
Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors \ [Website] [Code]
One Step Diffusion-based Super-Resolution with Time-Aware Distillation \ [Website] [Code]
One-Step Effective Diffusion Network for Real-World Image Super-Resolution \ [Website] [Code]
Binarized Diffusion Model for Image Super-Resolution \ [Website] [Code]
Does Diffusion Beat GAN in Image Super Resolution? \ [Website] [Code]
PatchScaler: An Efficient Patch-independent Diffusion Model for Super-Resolution \ [Website] [Code]
DeeDSR: Towards Real-World Image Super-Resolution via Degradation-Aware Stable Diffusion \ [Website] [Code]
Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach \ [Website] [Code]
Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization \ [Website] [Code]
DSR-Diff: Depth Map Super-Resolution with Diffusion Model \ [Website] [Code]
SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution \ [Website] [Code]
XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution \ [Website] [Code]
Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution \ [Website] [Code]
BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution \ [Website] [Code]
HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models \ [ICCV 2023] [Website]
Text-guided Explorable Image Super-resolution \ [CVPR 2024]
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder \ [CVPR 2024]
AdaDiffSR: Adaptive Region-aware Dynamic Acceleration Diffusion Model for Real-World Image Super-Resolution \ [CVPR 2024]
Enhancing Hyperspectral Images via Diffusion Model and Group-Autoencoder Super-resolution Network \ [AAAI 2024]
Detail-Enhancing Framework for Reference-Based Image Super-Resolution \ [Website]
You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation \ [Website]
Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution \ [Website]
Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models \ [Website]
YODA: You Only Diffuse Areas. An Area-Masked Diffusion Approach For Image Super-Resolution \ [Website]
Domain Transfer in Latent Space (DTLS) Wins on Image Super-Resolution -- a Non-Denoising Model \ [Website]
TDDSR: Single-Step Diffusion with Two Discriminators for Super Resolution \ [Website]
ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution \ [Website]
Image Super-Resolution with Text Prompt Diffusion \ [Website]
DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution \ [Website]
DREAM: Diffusion Rectification and Estimation-Adaptive Models \ [Website]
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution \ [Website]
Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution \ [Website]
CasSR: Activating Image Power for Real-World Image Super-Resolution \ [Website]
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution \ [Website]
Frequency-Domain Refinement with Multiscale Diffusion for Super Resolution \ [Website]
ClearSR: Latent Low-Resolution Image Embeddings Help Diffusion-Based Real-World Super Resolution Models See Clearer \ [Website]
Zoomed In, Diffused Out: Towards Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution \ [Website]
Adversarial Diffusion Compression for Real-World Image Super-Resolution \ [Website]
HF-Diff: High-Frequency Perceptual Loss and Distribution Matching for One-Step Diffusion-Based Image Super-Resolution \ [Website]
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators \ [ICCV 2023 Oral] [Website] [Project] [Code]
SinFusion: Training Diffusion Models on a Single Image or Video \ [ICML 2023] [Website] [Project] [Code]
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models \ [CVPR 2023] [Website] [Project] [Code]
ZIGMA: A DiT-style Zigzag Mamba Diffusion Model \ [ECCV 2024] [Project] [Code]
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation \ [NeurIPS 2022] [Website] [Project] [Code]
GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER \ [NeurIPS 2023] [Website] [Code]
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator \ [NeurIPS 2023] [Website] [Code]
Conditional Image-to-Video Generation with Latent Flow Diffusion Models \ [CVPR 2023] [Website] [Code]
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation \ [CVPR 2023] [Project] [Code]
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models \ [CVPR 2024] [Project] [Code]
Video Diffusion Models \ [ICLR 2022 workshop] [Website] [Code] [Project]
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models \ [Website] [Diffusers Doc] [Project] [Code]
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation \ [ECCV 2024] [Project] [Code]
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions \ [ECCV 2024] [Project] [Code]
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design \ [Website] [Project] [Code]
Tora: Trajectory-oriented Diffusion Transformer for Video Generation \ [Website] [Project] [Code]
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling \ [Website] [Project] [Code]
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence \ [Website] [Project] [Code]
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation \ [Website] [Project] [Code]
Video Diffusion Alignment via Reward Gradients \ [Website] [Project] [Code]
Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models \ [Website] [Project] [Code]
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models \ [Website] [Project] [Code]
TVG: A Training-free Transition Video Generation Method with Diffusion Models \ [Website] [Project] [Code]
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement \ [Website] [Project] [Code]
CamI2V: Camera-Controlled Image-to-Video Diffusion Model \ [Website] [Project] [Code]
Identity-Preserving Text-to-Video Generation by Frequency Decomposition \ [Website] [Project] [Code]
Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning \ [Website] [Project] [Code]
MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning \ [Website] [Project] [Code]
MotionClone: Training-Free Motion Cloning for Controllable Video Generation \ [Website] [Project] [Code]
StableAnimator: High-Quality Identity-Preserving Human Image Animation \ [Website] [Project] [Code]
AnimateAnything: Consistent and Controllable Animation for Video Generation \ [Website] [Project] [Code]
GameGen-X: Interactive Open-world Game Video Generation \ [Website] [Project] [Code]
VEnhancer: Generative Space-Time Enhancement for Video Generation \ [Website] [Project] [Code]
SF-V: Single Forward Video Generation Model \ [Website] [Project] [Code]
Pyramidal Flow Matching for Efficient Video Generative Modeling \ [Website] [Project] [Code]
AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation \ [Website] [Project] [Code]
CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion \ [Website] [Project] [Code]
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation \ [Website] [Project] [Code]
VideoTetris: Towards Compositional Text-to-Video Generation \ [Website] [Project] [Code]
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback \ [Website] [Project] [Code]
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation \ [Website] [Project] [Code]
MotionBooth: Motion-Aware Customized Text-to-Video Generation \ [Website] [Project] [Code]
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model \ [Website] [Project] [Code]
MotionDreamer: Zero-Shot 3D Mesh Animation from Video Diffusion Models \ [Website] [Project] [Code]
MotionCraft: Physics-based Zero-Shot Video Generation \ [Website] [Project] [Code]
MotionMaster: Training-free Camera Motion Transfer For Video Generation \ [Website] [Project] [Code]
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets \ [Website] [Project] [Code]
Motion Inversion for Video Customization \ [Website] [Project] [Code]
MagicAvatar: Multimodal Avatar Generation and Animation \ [Website] [Project] [Code]
Progressive Autoregressive Video Diffusion Models \ [Website] [Project] [Code]
TrailBlazer: Trajectory Control for Diffusion-Based Video Generation \ [Website] [Project] [Code]
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos \ [Website] [Project] [Code]
Breathing Life Into Sketches Using Text-to-Video Priors \ [Website] [Project] [Code]
Latent Video Diffusion Models for High-Fidelity Long Video Generation \ [Website] [Project] [Code]
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance \ [Website] [Project] [Code]
Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising \ [Website] [Project] [Code]
Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models \ [Website] [Project] [Code]
VideoComposer: Compositional Video Synthesis with Motion Controllability \ [Website] [Project] [Code]
DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion \ [Website] [Project] [Code]
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models \ [Website] [Project] [Code]
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation \ [Website] [Project] [Code]
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation \ [Website] [Project] [Code]
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer \ [Website] [Project] [Code]
LLM-GROUNDED VIDEO DIFFUSION MODELS \ [Website] [Project] [Code]
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling \ [Website] [Project] [Code]
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation \ [Website] [Project] [Code]
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models \ [Website] [Project] [Code]
VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning \ [Website] [Project] [Code]
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models \ [Website] [Project] [Code]
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline \ [Website] [Project] [Code]
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation \ [Website] [Project] [Code]
ART⋅V: Auto-Regressive Text-to-Video Generation with Diffusion Models \ [Website] [Project] [Code]
FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax \ [Website] [Project] [Code]
VideoBooth: Diffusion-based Video Generation with Image Prompts \ [Website] [Project] [Code]
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model \ [Website] [Project] [Code]
LivePhoto: Real Image Animation with Text-guided Motion Control \ [Website] [Project] [Code]
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators \ [Website] [Project] [Code]
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion \ [Website] [Project] [Code]
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation \ [Website] [Project] [Code]
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models \ [Website] [Project] [Code]
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution \ [Website] [Project] [Code]
FreeInit: Bridging Initialization Gap in Video Diffusion Models \ [Website] [Project] [Code]
Text2AC-Zero: Consistent Synthesis of Animated Characters using 2D Diffusion \ [Website] [Project] [Code]
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter \ [Website] [Project] [Code]
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos \ [Website] [Project] [Code]
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis \ [Website] [Project] [Code]
Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions \ [Website] [Project] [Code]
Latte: Latent Diffusion Transformer for Video Generation \ [Website] [Project] [Code]
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens \ [Website] [Project] [Code]
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models \ [Website] [Project] [Code]
Towards A Better Metric for Text-to-Video Generation \ [Website] [Project] [Code]
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models \ [Website] [Project] [Code]
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning \ [Website] [Project] [Code]
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation \ [Website] [Project] [Code]
UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control \ [Website] [Project] [Code]
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models \ [Website] [Project] [Code]
ID-Animator: Zero-Shot Identity-Preserving Human Video Generation \ [Website] [Project] [Code]
Optical-Flow Guided Prompt Optimization for Coherent Video Generation \ [Website] [Project] [Code]
FlexiFilm: Long Video Generation with Flexible Conditions \ [Website] [Project] [Code]
FIFO-Diffusion: Generating Infinite Videos from Text without Training \ [Website] [Project] [Code]
TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation \ [Website] [Project] [Code]
CV-VAE: A Compatible Video VAE for Latent Generative Video Models \ [Website] [Project] [Code]
MVOC: a training-free multiple video object composition method with diffusion models \ [Website] [Project] [Code]
Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model \ [Website] [Project] [Code]
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation \ [Website] [Project] [Code]
Efficient Long Video Tokenization via Coordinated-based Patch Reconstruction \ [Website] [Project] [Code]
AMG: Avatar Motion Guided Video Generation \ [Website] [Project] [Code]
DiVE: DiT-based Video Generation with Enhanced Control \ [Website] [Project] [Code]
MegActor-Σ: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer \ [Website] [Project] [Code]
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers \ [ICLR 2023] [Code]
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer \ [Website] [Code]
Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing \ [ICLR 2024] [Code]
SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces \ [ICLR 2024] [Code]
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing \ [Website] [Code]
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach \ [Website] [Code]
Real-Time Video Generation with Pyramid Attention Broadcast \ [Website] [Code]
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model \ [Website] [Code]
Diffusion Probabilistic Modeling for Video Generation \ [Website] [Code]
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors \ [Website] [Code]
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation \ [Website] [Code]
STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction \ [Website] [Code]
Vlogger: Make Your Dream A Vlog \ [Website] [Code]
Magic-Me: Identity-Specific Video Customized Diffusion \ [Website] [Code]
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models \ [Website] [Code]
EchoReel: Enhancing Action Generation of Existing Video Diffusion Models \ [Website] [Code]
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text \ [Website] [Code]
TAVGBench: Benchmarking Text to Audible-Video Generation \ [Website] [Code]
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model \ [Website] [Code]
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations \ [Website] [Code]
IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis \ [Website] [Code]
REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents \ [Website] [Code]
MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling \ [Website] [Code]
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model \ [Website] [Code]
HARIVO: Harnessing Text-to-Image Models for Video Generation \ [ECCV 2024] [Project]
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners \ [CVPR 2024] [Project]
AtomoVideo: High Fidelity Image-to-Video Generation \ [CVPR 2024] [Project]
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition \ [ICLR 2024] [Project]
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models \ [CVPR 2024] [Project]
ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model \ [ECCV 2024] [Project]
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models \ [ECCV 2024] [Project]
Training-free Long Video Generation with Chain of Diffusion Model Experts \ [Website] [Project]
Free2Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Model \ [Website] [Project]
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention \ [Website] [Project]
CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation \ [Website] [Project]
Hierarchical Patch Diffusion Models for High-Resolution Video Generation \ [Website] [Project]
I4VGen: Image as Stepping Stone for Text-to-Video Generation \ [Website] [Project]
FrameBridge: Improving Image-to-Video Generation with Bridge Models \ [Website] [Project]
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale \ [Website] [Project]
Boosting Camera Motion Control for Video Diffusion Transformers \ [Website] [Project]
UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation \ [Website] [Project]
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control \ [Website] [Project]
Controllable Longer Image Animation with Diffusion Models \ [Website] [Project]
AniClipart: Clipart Animation with Text-to-Video Priors \ [Website] [Project]
Spectral Motion Alignment for Video Motion Transfer using Diffusion Models \ [Website] [Project]
TimeRewind: Rewinding Time with Image-and-Events Video Diffusion \ [Website] [Project]
VideoPoet: A Large Language Model for Zero-Shot Video Generation \ [Website] [Project]
PEEKABOO: Interactive Video Generation via Masked-Diffusion \ [Website] [Project]
Searching Priors Makes Text-to-Video Synthesis Better \ [Website] [Project]
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation \ [Website] [Project]
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning \ [Website] [Project]
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models \ [Website] [Project]
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation \ [Website] [Project]
Imagen Video: High Definition Video Generation with Diffusion Models \ [Website] [Project]
MoVideo: Motion-Aware Video Generation with Diffusion Models \ [Website] [Project]
Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer \ [Website] [Project]
Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning \ [Website] [Project]
VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model \ [Website] [Project]
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation \ [Website] [Project]
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models \ [Website] [Project]
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation \ [Website] [Project]
Customizing Motion in Text-to-Video Diffusion Models \ [Website] [Project]
Photorealistic Video Generation with Diffusion Models \ [Website] [Project]
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control \ [Website] [Project]
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM \ [Website] [Project]
Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models \ [Website] [Project]
ActAnywhere: Subject-Aware Video Background Generation \ [Website] [Project]
Lumiere: A Space-Time Diffusion Model for Video Generation \ [Website] [Project]
InstructVideo: Instructing Video Diffusion Models with Human Feedback \ [Website] [Project]
Boximator: Generating Rich and Controllable Motions for Video Synthesis \ [Website] [Project]
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion \ [Website] [Project]
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation \ [Website] [Project]
Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation \ [Website] [Project]
Audio-Synchronized Visual Animation \ [Website] [Project]
I2VControl: Disentangled and Unified Video Motion Synthesis Control \ [Website] [Project]
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis \ [Website] [Project]
S2DM: Sector-Shaped Diffusion Models for Video Generation \ [Website] [Project]
AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment \ [Website] [Project]
Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation \ [Website] [Project]
Dance Any Beat: Blending Beats with Visuals in Dance Video Generation \ [Website] [Project]
PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control \ [Website] [Project]
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer \ [Website] [Project]
Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation \ [Website] [Project]
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance \ [Website] [Project]
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities \ [Website] [Project]
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention \ [Website] [Project]
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide \ [Website] [Project]
MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis \ [Website] [Project]
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation \ [Website] [Project]
Improved Video VAE for Latent Video Diffusion Model \ [Website] [Project]
DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships \ [ACM MM 2024 Oral]
Grid Diffusion Models for Text-to-Video Generation \ [Website]
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction \ [Website]
GenRec: Unifying Video Generation and Recognition with Diffusion Models \ [Website]
Dual-Stream Diffusion Net for Text-to-Video Generation \ [Website]
DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control \ [Website]
SimDA: Simple Diffusion Adapter for Efficient Video Generation \ [Website]
VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation \ [Website]
Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models \ [Website]
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation \ [Website]
LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation \ [Website]
Optimal Noise pursuit for Augmenting Text-to-Video Generation \ [Website]
Make Pixels Dance: High-Dynamic Video Generation \ [Website]
Video-Infinity: Distributed Long Video Generation \ [Website]
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning \ [Website]
Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion \ [Website]
Decouple Content and Motion for Conditional Image-to-Video Generation \ [Website]
X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention \ [Website]
F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis \ [Website]
MTVG: Multi-text Video Generation with Text-to-Video Models \ [Website]
VideoLCM: Video Latent Consistency Model \ [Website]
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion \ [Website]
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation \ [Website]
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models \ [Website]
360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model \ [Website]
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects \ [Website]
Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation \ [Website]
Training-Free Semantic Video Composition via Pre-trained Diffusion Model \ [Website]
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling \ [Website]
Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models \ [Website]
Human Video Translation via Query Warping \ [Website]
Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation \ [Website]
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis \ [Website]
Context-aware Talking Face Video Generation \ [Website]
Pix2Gif: Motion-Guided Diffusion for GIF Generation \ [Website]
Intention-driven Ego-to-Exo Video Generation \ [Website]
AnimateDiff-Lightning: Cross-Model Diffusion Distillation \ [Website]
Frame by Familiar Frame: Understanding Replication in Video Diffusion Models \ [Website]
Matten: Video Generation with Mamba-Attention \ [Website]
Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models \ [Website]
ReVideo: Remake a Video with Motion and Content Control \ [Website]
VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation \ [Website]
SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model \ [Website]
GVDIFF: Grounded Text-to-Video Generation with Diffusion Models \ [Website]
Mobius: An High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task \ [Website]
Contrastive Sequential-Diffusion Learning: An approach to Multi-Scene Instructional Video Synthesis \ [Website]
Multi-sentence Video Grounding for Long Video Generation \ [Website]
Fine-gained Zero-shot Video Sampling \ [Website]
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data \ [Website]
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations \ [Website]
EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation \ [Website]
Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation \ [Website]
One-Shot Learning Meets Depth Diffusion in Multi-Object Videos \ [Website]
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation \ [Website]
S2AG-Vid: Enhancing Multi-Motion Alignment in Video Diffusion Models via Spatial and Syntactic Attention-Based Guidance \ [Website]
JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation \ [Website]
ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning \ [Website]
COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation \ [Website]
Noise Crystallization and Liquid Noise: Zero-shot Video Generation using Image Diffusion Models \ [Website]
BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way \ [Website]
LumiSculpt: A Consistency Lighting Control Network for Video Generation \ [Website]
TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation \ [Website]
OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models \ [Website]
Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge \ [Website]
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input \ [Website]
StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart \ [Website]
VIRES: Video Instance Repainting with Sketch and Text Guidance \ [Website]
MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation \ [Website]
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing \ [ICCV 2023 Oral] [Website] [Project] [Code]
Text2LIVE: Text-Driven Layered Image and Video Editing \ [ECCV 2022 Oral] [Project] [Code]
Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding \ [CVPR 2023] [Project] [Code]
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation \ [ICCV 2023] [Project] [Code]
StableVideo: Text-driven Consistency-aware Diffusion Video Editing \ [ICCV 2023] [Website] [Code]
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models \ [ECCV 2024] [Project] [Code]
StableV2V: Stablizing Shape Consistency in Video-to-Video Editing \ [Website] [Project] [Code]
Video-P2P: Video Editing with Cross-attention Control \ [Website] [Project] [Code]
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing \ [Website] [Project] [Code]
MagicEdit: High-Fidelity and Temporally Coherent Video Editing \ [Website] [Project] [Code]
TokenFlow: Consistent Diffusion Features for Consistent Video Editing \ [Website] [Project] [Code]
ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing \ [Website] [Project] [Code]
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts \ [Website] [Project] [Code]
MotionDirector: Motion Customization of Text-to-Video Diffusion Models \ [Website] [Project] [Code]
EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing \ [Website] [Project] [Code]
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models \ [Website] [Project] [Code]
Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models \ [Website] [Project] [Code]
MotionEditor: Editing Video Motion via Content-Aware Diffusion \ [Website] [Project] [Code]
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models \ [Website] [Project] [Code]
MagicStick: Controllable Video Editing via Control Handle Transformations \ [Website] [Project] [Code]
VidToMe: Video Token Merging for Zero-Shot Video Editing \ [Website] [Project] [Code]
VASE: Object-Centric Appearance and Shape Manipulation of Real Videos \ [Website] [Project] [Code]
Neural Video Fields Editing \ [Website] [Project] [Code]
UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing \ [Website] [Project] [Code]
MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion \ [Website] [Project] [Code]
Vid2Vid-zero: Zero-Shot Video Editing Using Off-the-Shelf Image Diffusion Models \ [Website] [Code]
DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization \ [Website] [Code]
LOVECon: Text-driven Training-Free Long Video Editing with ControlNet \ [Website] [Code]
Pix2Video: Video Editing Using Image Diffusion \ [Website] [Code]
E-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment \ [Website] [Code]
Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer \ [Website] [Code]
Flow-Guided Diffusion for Video Inpainting \ [Website] [Code]
Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models \ [Website] [Code]
Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing \ [Website] [Code]
COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing \ [Website] [Code]
Shape-Aware Text-Driven Layered Video Editing \ [CVPR 2023] [Website] [Project]
VideoDirector: Precise Video Editing via Text-to-Video Models \ [Website] [Project]
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing \ [Website] [Project]
Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices \ [Website] [Project]
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing \ [Website] [Project]
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models \ [Website] [Project]
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing \ [Website] [Project]
VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing \ [Website] [Project]
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence \ [Website] [Project]
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation \ [Website] [Project]
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning \ [Website] [Project]
WAVE: Warping DDIM Inversion Features for Zero-shot Text-to-Video Editing \ [ECCV 2024] [Project]
MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance \ [Website] [Project]
Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models \ [Website] [Project]
DreamMotion: Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing \ [Website] [Project]
VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing \ [Website] [Project]
DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency \ [ECCV 2024]
Edit Temporal-Consistent Videos with Image Diffusion Model \ [Website]
Streaming Video Diffusion: Online Video Editing with Diffusion Models \ [Website]
Cut-and-Paste: Subject-Driven Video Editing with Attention Control \ [Website]
MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation \ [Website]
Dreamix: Video Diffusion Models Are General Video Editors \ [Website]
Towards Consistent Video Editing with Text-to-Image Diffusion Models \ [Website]
EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints \ [Website]
CCEdit: Creative and Controllable Video Editing via Diffusion Models \ [Website]
Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models \ [Website]
FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier \ [Website]
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models \ [Website]
RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing \ [Website]
Object-Centric Diffusion for Efficient Video Editing \ [Website]
FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing \ [Website]
Video Editing via Factorized Diffusion Distillation \ [Website]
EffiVED: Efficient Video Editing via Text-instruction Diffusion Models \ [Website]
Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion \ [Website]
GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models \ [Website]
Temporally Consistent Object Editing in Videos using Extended Attention \ [Website]
Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting \ [Website]
FRAG: Frequency Adapting Group for Diffusion Video Editing \ [Website]
InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models \ [Website]
Text-based Talking Video Editing with Cascaded Conditional Diffusion \ [Website]
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion \ [Website]
Blended Latent Diffusion under Attention Control for Real-World Video Editing \ [Website]
EditBoard: Towards A Comprehensive Evaluation Benchmark for Text-based Video Editing Models \ [Website]
DNI: Dilutional Noise Initialization for Diffusion Video Editing \ [Website]
FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing \ [Website]
Replace Anyone in Videos \ [Website]
Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing \ [Website]