wl-zhao / VPD

[ICCV 2023] VPD is a framework that leverages the high-level and low-level knowledge of a pre-trained text-to-image diffusion model to downstream visual perception tasks.
https://vpd.ivg-research.xyz
MIT License
509 stars 31 forks

About cross attention in depth estimation #41

Open minhohihi opened 1 year ago

minhohihi commented 1 year ago

Hello, thank you for sharing this interesting work.

Did you use the cross-attention maps when training the depth estimation model?

In the code linked below, cross-attention appears to be disabled for depth estimation: https://github.com/wl-zhao/VPD/blob/main/depth/models_depth/model.py#L57

When I set `use_attn` to True, a runtime error occurs because the channel sizes do not match. Could you confirm my understanding? Please correct me if I am wrong.
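To illustrate the kind of mismatch I mean, here is a minimal sketch in plain Python. It assumes (not verified against the repository) that enabling attention concatenates extra attention-map channels onto a feature map, while the next layer was built for the feature channels only. The helper names and all channel counts are hypothetical and chosen only for illustration.

```python
# Hypothetical channel bookkeeping; none of these names or numbers come from the VPD code.

def fused_channels(feature_channels: int, attn_channels: int, use_attn: bool) -> int:
    """Channel count after optionally concatenating attention maps along the channel axis."""
    return feature_channels + (attn_channels if use_attn else 0)


def check_layer_input(expected_in: int, actual_in: int) -> None:
    """Mimic the input-channel check a convolution layer performs."""
    if expected_in != actual_in:
        raise RuntimeError(
            f"expected input with {expected_in} channels, got {actual_in} channels"
        )


FEATURE_CHANNELS = 640  # assumed width of one feature level
ATTN_CHANNELS = 8       # assumed number of attention maps
LAYER_IN = 640          # assumed: the next layer was sized for features only

# Without attention the channel counts agree, so the check passes silently.
check_layer_input(LAYER_IN, fused_channels(FEATURE_CHANNELS, ATTN_CHANNELS, use_attn=False))

# With attention the fused tensor is wider than the layer expects.
try:
    check_layer_input(LAYER_IN, fused_channels(FEATURE_CHANNELS, ATTN_CHANNELS, use_attn=True))
except RuntimeError as err:
    print(err)
```

If this is what happens, the layers that consume the fused features would need their input channel count increased by the number of attention maps before `use_attn` could be turned on.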

Thank you.

MonsterWonder commented 10 months ago

Same question! Have you solved it?