prs-eth / Marigold

[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
https://marigoldmonodepth.github.io
Apache License 2.0

Train a ControlNet plugin instead of full-scale fine-tuning? #71

Open Darkbblue opened 1 month ago

Darkbblue commented 1 month ago

This work is very inspiring and exciting. Marigold makes huge progress on discriminative diffusion models by showing that general-purpose pre-training benefits later fine-tuning for discriminative tasks, so we no longer need to train discriminative diffusion models from scratch. The remaining problem is the FULL-SCALE fine-tuning. Generative diffusion models already offer alternatives. For example, ControlNet keeps the backbone U-Net frozen and trains a plugin instead, and the plugin steers the backbone's behavior toward a specific purpose. This approach is more efficient and more flexible. So I wonder whether you could train a plugin version of Marigold with all other settings unchanged? Whether this approach turns out to be feasible or infeasible, the community would gain very useful insights.
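
To make the frozen-backbone-plus-plugin idea concrete, here is a minimal toy sketch in plain Python. It is not Marigold's training code and not the ControlNet implementation: `FrozenBackbone`, `ZeroInitPlugin`, `forward`, and `train_step` are hypothetical names, and small linear maps stand in for the U-Net and the plugin. It only illustrates two properties of the approach: the plugin's output starts at zero, so the combined model initially behaves exactly like the pre-trained backbone, and only the plugin's weights are updated during training.

```python
import random


class FrozenBackbone:
    """Stand-in for the pre-trained U-Net: a fixed linear map that is never updated."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.weight = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]


class ZeroInitPlugin:
    """Trainable side branch whose weights start at zero (assumed ControlNet-style init)."""

    def __init__(self, dim):
        self.weight = [[0.0] * dim for _ in range(dim)]

    def __call__(self, cond):
        return [sum(w * c for w, c in zip(row, cond)) for row in self.weight]

    def sgd_step(self, cond, grad_out, lr):
        # For out = W @ cond, dL/dW[i][j] = grad_out[i] * cond[j].
        for i, g in enumerate(grad_out):
            for j, c in enumerate(cond):
                self.weight[i][j] -= lr * g * c


def forward(backbone, plugin, x, cond):
    # The plugin adds a residual on top of the frozen backbone's output.
    return [b + p for b, p in zip(backbone(x), plugin(cond))]


def train_step(backbone, plugin, x, cond, target, lr=0.1):
    pred = forward(backbone, plugin, x, cond)
    # Squared-error loss; its gradient w.r.t. pred is 2 * (pred - target).
    grad = [2.0 * (p - t) for p, t in zip(pred, target)]
    plugin.sgd_step(cond, grad, lr)  # only the plugin is updated
    return sum((p - t) ** 2 for p, t in zip(pred, target))
```

In a real setup the plugin would be a deep network fed with the conditioning image, and its features would be injected at several layers of the U-Net rather than added once at the output. The zero initialization is the part worth keeping: training starts from the unmodified pre-trained model, so early updates cannot damage what the backbone already knows.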

markkua commented 1 month ago

Thanks for your suggestion. A ControlNet-based method is indeed an interesting topic to investigate. However, it is beyond our current plan.

The training code has been released. You are welcome to adapt it to ControlNet and contribute.

Discussions along this line are also welcome in this issue.