huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

StableDiffusionLatentUpscalePipeline - positive/negative prompt embeds support #8895

Closed DeTeam closed 2 weeks ago

DeTeam commented 1 month ago

I'm trying to deploy the smallest possible SD inpainting model. My production deployment only needs unet+vae+ipadapter weights with prompt and ip adapter image embeds pre-generated. Works well!

Now I wanted to try the latent upscaler from diffusers and realized it currently doesn't support pre-generated embeds. It would be nice to keep its API aligned with the other pipelines and add them.

Describe the solution you'd like.

Harmonizing the inputs of StableDiffusionLatentUpscalePipeline with those of other, more frequently used pipelines would be nice.

yiyixuxu commented 1 month ago

would be very nice indeed! would you be willing to open a PR? if not we can ask the community to see if anyone else wants to help :)

DeTeam commented 1 month ago

@yiyixuxu sorry, I don't have capacity for a PR right now (unfamiliar with the codebase, assuming that testing would also take a while).

rootonchair commented 1 month ago

@yiyixuxu The latent upscaler uses two text embeddings, hidden_states and pooler_output, for both the prompt and the negative prompt: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py#L148-L149

Should we change the type of prompt_embeds from torch.FloatTensor to BaseModelOutputWithPooling?
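To make the question concrete, here is a rough sketch of the two shapes the input could take. It is only an illustration: ToyEncoderOutput is a hypothetical stand-in that mimics two fields of BaseModelOutputWithPooling, nested lists stand in for torch.FloatTensor so the snippet runs without PyTorch, and taking the last entry of hidden_states is an assumption about what the pipeline consumes.

```python
from dataclasses import dataclass
from typing import List, Tuple

Tensor = List[List[float]]  # stand-in for torch.FloatTensor (assumption)


@dataclass
class ToyEncoderOutput:
    """Hypothetical stand-in for the bundled text-encoder output."""

    hidden_states: Tuple[Tensor, ...]  # one entry per encoder layer
    pooler_output: Tensor


def split_encoder_output(out: ToyEncoderOutput) -> Tuple[Tensor, Tensor]:
    # Option B: hand the pipeline two plain tensors instead of one bundled
    # object. The last layer's hidden states are used here (assumption).
    return out.hidden_states[-1], out.pooler_output


bundled = ToyEncoderOutput(
    hidden_states=([[0.0, 0.0]], [[1.0, 2.0]]),
    pooler_output=[[5.0]],
)
prompt_embeds, pooled_prompt_embeds = split_encoder_output(bundled)
```

Passing the bundled object keeps a single argument per prompt, while splitting it gives two plain tensors per prompt, which is closer to how embeds are passed as separate arguments elsewhere.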

yiyixuxu commented 1 month ago

@rootonchair we can:

  1. create an encode_prompt that's consistent with the method in other pipelines https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py#L275 (i.e. it should return prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds)
  2. and then refactor _encode_prompt (similar to https://github.com/huggingface/diffusers/blob/af400040f53148ba00042d7065747ccefa95903e/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L270)

    • we can use the encode_prompt we just created

      
      prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds = self.encode_prompt(...)
      if do_classifier_free_guidance:
          prompt_embeds = ...
          pooled_prompt_embeds = ...
      else:
          ...
    • also deprecate _encode_prompt
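
The two steps above could look roughly like the sketch below. It is not the actual diffusers implementation: SketchUpscalePipeline and _toy_text_encoder are hypothetical names, nested lists stand in for torch tensors so the snippet runs without PyTorch (list concatenation plays the role of torch.cat along the batch dimension), and putting the negative embeds first when concatenating is an assumed convention.

```python
import warnings
from typing import List, Optional, Tuple

# Nested lists stand in for tensors of shape (batch, dim) (assumption).
Embeds = List[List[float]]


class SketchUpscalePipeline:
    """Hypothetical class; only the prompt-encoding flow is modeled."""

    def _toy_text_encoder(self, text: str) -> Tuple[Embeds, Embeds]:
        # Placeholder for a real text encoder: returns (hidden states, pooled output).
        n = float(len(text))
        return [[n, n + 1.0]], [[n]]

    def encode_prompt(
        self,
        prompt: Optional[str] = None,
        do_classifier_free_guidance: bool = True,
        negative_prompt: str = "",
        prompt_embeds: Optional[Embeds] = None,
        negative_prompt_embeds: Optional[Embeds] = None,
        pooled_prompt_embeds: Optional[Embeds] = None,
        negative_pooled_prompt_embeds: Optional[Embeds] = None,
    ):
        # Run the encoder only when pre-generated embeds were not supplied
        # (both the hidden states and the pooled output are needed).
        if prompt_embeds is None or pooled_prompt_embeds is None:
            if prompt is None:
                raise ValueError(
                    "Pass `prompt` or both `prompt_embeds` and `pooled_prompt_embeds`."
                )
            prompt_embeds, pooled_prompt_embeds = self._toy_text_encoder(prompt)
        if do_classifier_free_guidance and (
            negative_prompt_embeds is None or negative_pooled_prompt_embeds is None
        ):
            negative_prompt_embeds, negative_pooled_prompt_embeds = (
                self._toy_text_encoder(negative_prompt)
            )
        return (
            prompt_embeds,
            negative_prompt_embeds,
            pooled_prompt_embeds,
            negative_pooled_prompt_embeds,
        )

    def _encode_prompt(
        self, prompt, do_classifier_free_guidance=True, negative_prompt="", **kwargs
    ):
        # Deprecated wrapper that keeps the old concatenated return format.
        warnings.warn(
            "`_encode_prompt` is deprecated; use `encode_prompt`.", FutureWarning
        )
        prompt_embeds, negative_prompt_embeds, pooled, negative_pooled = (
            self.encode_prompt(
                prompt, do_classifier_free_guidance, negative_prompt, **kwargs
            )
        )
        if do_classifier_free_guidance:
            # Assumed ordering: negative (unconditional) batch first.
            prompt_embeds = negative_prompt_embeds + prompt_embeds
            pooled = negative_pooled + pooled
        return prompt_embeds, pooled
```

With this split, callers that already hold pre-generated embeds pass them straight to encode_prompt and skip the text encoder, while existing callers of _encode_prompt keep getting the concatenated format plus a deprecation warning.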

rootonchair commented 1 month ago

Thanks for your guidance @yiyixuxu. Will open a PR soon