Train a ControlNet plugin instead of full-scale fine-tuning?

prs-eth / Marigold

[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

Apache License 2.0

2.02k stars, 99 forks

This work is very inspiring and exciting. Marigold makes huge progress in discriminative diffusion models by showing that general-purpose pre-training can benefit later fine-tuning for discriminative tasks, so we no longer need to train discriminative diffusion models from scratch. The remaining issue is the full-scale fine-tuning. Generative diffusion models already have alternatives to it. For example, ControlNet keeps the backbone U-Net frozen and trains a plugin instead, and the plugin steers the backbone's behavior toward a specific purpose. This approach is more efficient and more flexible. So I wonder whether you could train a plugin version of Marigold with all other settings unchanged? Whether this turns out to be feasible or infeasible, the result would give the community very useful insights.
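To make the question concrete, here is a toy sketch of the frozen-backbone-plus-plugin idea in plain Python. It is not the ControlNet or Marigold code: `FrozenBackbone`, `Plugin`, `forward`, and `train_step` are hypothetical names, the "network" is an elementwise scaling, and the zero-initialized `gate` only mimics the role of ControlNet's zero-initialized output layers, so that the combined model starts out identical to the backbone.

```python
import random


class FrozenBackbone:
    """Stand-in for the pre-trained U-Net: weights are fixed and never updated."""

    def __init__(self, weights):
        self.weights = tuple(weights)  # a tuple, so it cannot be modified in place

    def __call__(self, x):
        return [w * xi for w, xi in zip(self.weights, x)]


class Plugin:
    """Trainable side branch whose output is added to the backbone output."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.hidden = [rng.uniform(0.5, 1.0) for _ in range(dim)]  # trainable
        self.gate = [0.0] * dim  # trainable, zero-initialized (assumption of this toy)

    def __call__(self, x, cond):
        return [g * h * (xi + ci)
                for g, h, xi, ci in zip(self.gate, self.hidden, x, cond)]


def forward(backbone, plugin, x, cond):
    # Backbone output plus the plugin's residual correction.
    return [b + r for b, r in zip(backbone(x), plugin(x, cond))]


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def train_step(backbone, plugin, x, cond, target, lr=0.05):
    """One gradient-descent step on the MSE loss; only plugin parameters change."""
    pred = forward(backbone, plugin, x, cond)
    n = len(pred)
    for i in range(n):
        err = 2.0 * (pred[i] - target[i]) / n
        feat = x[i] + cond[i]
        grad_gate = err * plugin.hidden[i] * feat
        grad_hidden = err * plugin.gate[i] * feat  # zero while the gate is zero
        plugin.gate[i] -= lr * grad_gate
        plugin.hidden[i] -= lr * grad_hidden
    return mse(pred, target)  # loss measured before the update


# Made-up toy data, purely for illustration.
backbone = FrozenBackbone([1.0, -2.0, 0.5])
plugin = Plugin(dim=3)
x, cond, target = [1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [2.0, -3.0, 2.0]

losses = [train_step(backbone, plugin, x, cond, target) for _ in range(200)]
```

In this toy, the first forward pass reproduces the backbone output exactly because the gate starts at zero, and training then moves the prediction toward the target without touching `backbone.weights`. Whether the same recipe works for depth estimation at Marigold's scale is exactly the open question here.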


Train a ControlNet plugin instead of full-scale fine-tuning? #71