[ ] Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines.
[ ] Did you build and run the code without any errors?
[ ] Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
[ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
What does this PR do?
Fix incorrect call to vae.diag_gauss_dist
In pipelines for img2img, inpainting and other tasks that take images as inputs (such as StableDiffusionImg2ImgPipeline), image inputs are encoded into a Gaussian distribution by pipeline.vae.encode(); the pipelines then use the retrieve_latents() function to obtain a latent tensor from this distribution. retrieve_latents() offers the parameter sample_mode to select how the latent tensor is obtained.
The argument sample_mode="argmax" is supposed to mean that $$\mathrm{latents} = \mathop{\arg\max}\limits_{x}\ P_z(x)$$, where $P_z$ is the probability density of the encoded Gaussian distribution, so latents is the mean of the distribution in this case. Therefore, the original code should be corrected to:
We fixed all the existing pipelines involving this function.
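To illustrate why sample_mode="argmax" should yield the mean, here is a minimal, framework-free sketch. It is not the code changed in this PR: the DiagonalGaussian class, its methods, and the simplified retrieve_latents signature below are hypothetical stand-ins that only mimic the behaviour described above, using plain Python lists instead of tensors.

```python
import math
import random


class DiagonalGaussian:
    """Hypothetical stand-in for the VAE posterior (diagonal covariance)."""

    def __init__(self, mean, logvar):
        self.mean = list(mean)
        self.std = [math.exp(0.5 * lv) for lv in logvar]

    def sample(self, rng):
        # Reparameterised draw: mean + std * eps, with eps ~ N(0, 1) per dimension.
        return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]

    def mode(self):
        # A Gaussian density peaks at its mean, so argmax_x P_z(x) = mean.
        return list(self.mean)

    def log_prob(self, x):
        # Log-density, summed over the independent dimensions.
        total = 0.0
        for xi, m, s in zip(x, self.mean, self.std):
            total += -0.5 * math.log(2 * math.pi * s * s) - (xi - m) ** 2 / (2 * s * s)
        return total


def retrieve_latents(dist, rng=None, sample_mode="sample"):
    # Hypothetical, simplified signature for illustration only.
    if sample_mode == "sample":
        return dist.sample(rng if rng is not None else random.Random())
    if sample_mode == "argmax":
        return dist.mode()
    raise ValueError(f"unknown sample_mode: {sample_mode!r}")


dist = DiagonalGaussian(mean=[0.5, -1.0], logvar=[0.0, -2.0])
latents = retrieve_latents(dist, sample_mode="argmax")
```

With this sketch, the "argmax" result equals the mean exactly, and its log-density is never lower than that of any random draw obtained with sample_mode="sample".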
@xxx