Any reason for not using vae.std to generate RGB latent?

prs-eth / Marigold

[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

https://marigoldmonodepth.github.io

Apache License 2.0

2.25k stars, 124 forks

Any reason for not using vae.std to generate RGB latent? #86

Closed: Jerrypiglet closed this issue 3 months ago

Jerrypiglet commented 3 months ago

Hi there, I noticed that the Marigold pipeline differs from the original SD pipeline in how it encodes RGB/depth into latents. Instead of doing something like vae.encode(pixel_values).latent_dist.sample(), the implementation uses the function encode_rgb, which takes only the mean of the VAE distribution as the latent and does not add std * sample the way the original sampling step does. Is there a reason for this, for example to remove the randomness of the VAE encoding?

markkua commented 3 months ago

Hi, we are doing depth estimation, which, unlike generation tasks, is supposed to be deterministic. We intentionally use only the mean to remove undesired randomness from the encoding.
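
For anyone following along, the difference between the two encodings can be sketched with a toy diagonal Gaussian in plain Python. This is only an illustration, not Marigold's or diffusers' actual code: the class ToyDiagonalGaussian and the mean/logvar values are made up for the example, and the real posterior operates on tensors rather than lists.

```python
import math
import random


class ToyDiagonalGaussian:
    """Hypothetical stand-in for a VAE posterior with diagonal covariance."""

    def __init__(self, mean, logvar):
        self.mean = list(mean)
        # Standard deviation from log-variance: std = exp(0.5 * logvar).
        self.std = [math.exp(0.5 * lv) for lv in logvar]

    def sample(self, rng):
        # Stochastic latent: mean + std * eps, with eps drawn from N(0, 1).
        return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]

    def mode(self):
        # Deterministic latent: the mean only, as described for encode_rgb.
        return list(self.mean)


# Made-up posterior parameters for a 3-element "latent".
posterior = ToyDiagonalGaussian(mean=[0.5, -1.0, 2.0], logvar=[-2.0, -2.0, -2.0])

# Sampling depends on the random state, so two seeds give two latents.
sampled_a = posterior.sample(random.Random(0))
sampled_b = posterior.sample(random.Random(1))

# The mode is identical on every call for the same input.
mode_a = posterior.mode()
mode_b = posterior.mode()

print("sample, seed 0:", sampled_a)
print("sample, seed 1:", sampled_b)
print("mode, call 1:  ", mode_a)
print("mode, call 2:  ", mode_b)
```

With sampling, the same image can map to slightly different latents on each run unless the random state is fixed; with the mean, the latent is a pure function of the input. In diffusers, the corresponding choice would presumably be latent_dist.mode() in place of latent_dist.sample().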