wl-zhao / VPD

[ICCV 2023] VPD is a framework that leverages the high-level and low-level knowledge of a pre-trained text-to-image diffusion model for downstream visual perception tasks.
https://vpd.ivg-research.xyz
MIT License
502 stars 30 forks

How to remove text information in denoising UNet based on the existing code? #63

Open RuiTianHIT opened 3 months ago

RuiTianHIT commented 3 months ago

@wl-zhao

Dear author, thank you for your excellent work. We would like to explore how well the model estimates depth when no text information is introduced, as shown in the image.

[Image: 20240620]

However, when we forced self.conditioning_key = None, an error occurred. Is there a recommended way to do this? Thank you very much; I look forward to your reply!

wl-zhao commented 3 months ago

You can just feed a zero text embedding to the UNet.
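A minimal sketch of this suggestion, assuming PyTorch and a UNet that receives the text embedding as a cross-attention context tensor. `ToyCrossAttnUNet` and `zero_text_embedding` are hypothetical names used only for illustration and are not part of the VPD code. The 77x768 embedding shape is also an assumption, so match it to the shape your text encoder actually produces.

```python
import torch
import torch.nn as nn


class ToyCrossAttnUNet(nn.Module):
    """Hypothetical stand-in for a text-conditioned denoising UNet (not VPD's class)."""

    def __init__(self, channels=8, context_dim=768):
        super().__init__()
        self.to_q = nn.Linear(channels, channels, bias=False)
        self.to_k = nn.Linear(context_dim, channels, bias=False)
        self.to_v = nn.Linear(context_dim, channels, bias=False)
        self.to_out = nn.Linear(channels, channels)

    def forward(self, x, context):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        q, k, v = self.to_q(tokens), self.to_k(context), self.to_v(context)
        attn = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)
        tokens = tokens + self.to_out(attn @ v)  # residual cross-attention
        return tokens.transpose(1, 2).reshape(b, c, h, w)


def zero_text_embedding(batch_size, num_tokens=77, context_dim=768,
                        device=None, dtype=None):
    """All-zero tensor shaped like a text embedding; the sizes are assumptions."""
    return torch.zeros(batch_size, num_tokens, context_dim,
                       device=device, dtype=dtype)


unet = ToyCrossAttnUNet()
latents = torch.randn(2, 8, 4, 4)

# Keep the conditioning path intact, but pass zeros instead of text features.
context = zero_text_embedding(latents.shape[0],
                              device=latents.device, dtype=latents.dtype)
features = unet(latents, context)
print(features.shape)
```

In this toy layer, a zero context makes the keys and values zero, so the cross-attention branch adds only its output bias and the image features pass through the residual path. Keeping the conditioning key unchanged and swapping only the embedding avoids touching the UNet's forward signature, which is probably why forcing the key to None fails while this approach does not.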