shivammehta25 / Matcha-TTS

[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
https://shivammehta25.github.io/Matcha-TTS/
MIT License

Bump diffusers from 0.21.3 to 0.22.1 #27

Closed · dependabot[bot] closed this 1 year ago

dependabot[bot] commented 1 year ago

Bumps diffusers from 0.21.3 to 0.22.1.

Release notes

Sourced from diffusers's releases.

Patch Release: Fix community vs. hub pipelines revision

  • [Custom Pipelines] Make sure that community pipelines can use repo revision by @patrickvonplaten

v0.22.0: LCM, PixArt-Alpha, AnimateDiff, PEFT integration for LoRA, and more

Latent Consistency Models (LCM)


LCMs enable a significantly faster inference process for diffusion models. They require far fewer inference steps to produce high-resolution images without compromising the image quality too much. Below is a usage example:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float32)

# To save GPU memory, torch.float16 can be used, but it may compromise image quality.
pipe.to(torch_device="cuda", torch_dtype=torch.float32)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# Can be set to 1~50 steps. LCM supports fast inference even at <= 4 steps. Recommended: 1~8 steps.
num_inference_steps = 4

images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0).images
```

Refer to the documentation to learn more.

LCM comes with both text-to-image and image-to-image pipelines, contributed by @luosiallen, @nagolinc, and @dg845.
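The snippet above only covers the text-to-image path. As a rough sketch of the image-to-image counterpart (not taken from these release notes: the AutoPipelineForImage2Image entry point, the strength value, and the input-image URL below are assumptions), usage would look something like this:

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

# Hypothetical input image; replace with a real URL or local path.
init_image = load_image("https://example.com/init.png")

# Assumption: the auto pipeline resolves the LCM checkpoint above to its
# image-to-image variant.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float32
)
pipe.to("cuda")

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# Few steps, as in the text-to-image example; strength controls how far the
# result may drift from the input image (both values are illustrative).
image = pipe(
    prompt=prompt,
    image=init_image,
    num_inference_steps=4,
    guidance_scale=8.0,
    strength=0.5,
).images[0]
```

As in the text-to-image example, the step count stays very low; strength trades fidelity to the input image against prompt adherence.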

PixArt-Alpha

PixArt-Alpha is a Transformer-based text-to-image diffusion model that rivals the quality of the existing state-of-the-art ones, such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient.

It was trained on T5 text embeddings and supports a maximum sequence length of 120, which allows for more detailed prompt inputs and unlocks better-quality generations.

Despite the large text encoder, with model offloading it takes a little under 11 GB of VRAM to run the PixArtAlphaPipeline:

```python
from diffusers import PixArtAlphaPipeline
import torch

pipeline_id = "PixArt-alpha/PixArt-XL-2-1024-MS"
pipeline = PixArtAlphaPipeline.from_pretrained(pipeline_id, torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A small cactus with a happy face in the Sahara desert."
# Generate an image from the prompt (standard diffusers pipeline call).
image = pipeline(prompt).images[0]
```

... (truncated)

Commits


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
dependabot[bot] commented 1 year ago

Superseded by #28.