Hi there, I noticed that the Marigold pipeline differs from the original SD pipeline when encoding RGB/depth into latents. Instead of calling something like `vae.encode(pixel_values).latent_dist.sample()`, the implementation uses the function `encode_rgb`, where only the mean of the VAE distribution is used as the latent (see here), without adding `std * sample` as in here. Is there a reason for this? Is it to remove the randomness of the VAE encoding?
Hi, we are doing depth estimation, which, unlike generation tasks, is supposed to be deterministic. We intentionally use only the mean to remove undesired randomness.
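To make the difference concrete, here is a minimal sketch in plain Python, not the actual Marigold or diffusers code. `ToyDiagonalGaussian` is a hypothetical stand-in for the VAE posterior, and the mean and log-variance values are made up for illustration. It shows that sampling computes `mean + std * eps` and changes from call to call, while taking the mean alone always yields the same latent for the same input.

```python
import math
import random


class ToyDiagonalGaussian:
    """Hypothetical stand-in for a VAE posterior (per-element mean and log-variance)."""

    def __init__(self, mean, logvar):
        self.mean = list(mean)
        # std = exp(0.5 * logvar), the usual diagonal-Gaussian parameterization
        self.std = [math.exp(0.5 * lv) for lv in logvar]

    def sample(self, rng):
        # Stochastic latent: mean + std * eps, with eps ~ N(0, 1)
        return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]

    def mode(self):
        # Deterministic latent: the mean only, no noise term
        return list(self.mean)


# Made-up posterior parameters for a 3-element "latent"
posterior = ToyDiagonalGaussian(mean=[0.5, -1.0, 2.0], logvar=[-2.0, -1.0, -3.0])
rng = random.Random(0)

sample_a = posterior.sample(rng)
sample_b = posterior.sample(rng)
mode_a = posterior.mode()
mode_b = posterior.mode()

print("sampled latents differ:", sample_a != sample_b)
print("mean latents identical:", mode_a == mode_b)
```

With sampling, two encodings of the same image give slightly different latents, so the predicted depth would vary between runs. Using the mean removes that source of variation. In diffusers, I believe the same effect is available through `latent_dist.mode()` in place of `latent_dist.sample()`, though the Marigold code may obtain the mean differently.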