prs-eth / Marigold

[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
https://marigoldmonodepth.github.io
Apache License 2.0
2.36k stars, 132 forks

Is it possible to improve the model's performance by switching the base model to SDXL? #132

Open kke19 opened 2 days ago

kke19 commented 2 days ago

Hi,

First of all, thank you for your great work!

During my testing, I found that I need higher-resolution depth images. I noticed that SDXL supports a much higher native resolution than SD2. Would replacing the pre-trained SD2 base model with SDXL improve the model's overall performance at higher resolutions?

Looking forward to your thoughts!

nandometzger commented 1 day ago

Hi, thank you for the question.

I did try it at some point, and I can confirm that it is technically possible to replace SD2 with SDXL as the backbone. However, I cannot comment on the performance gains, since we did not have large enough GPUs to train it properly. I suspect that it will indeed work.

Hope that helps. Nando