Closed — Xiang-cd closed this issue 8 months ago
Yeah, the SD-VAE and TAESD latent spaces are compatible. You just need to make sure that the latents / images are scaled appropriately. Here's some example code using diffusers:
```python
# Download a sample image (notebook shell escape)
!wget -q -nc "https://upload.wikimedia.org/wikipedia/commons/9/9c/Crepe_with_LaFrance_and_strawberries_and_fresh_cream_in_it.jpg" -O sample_image.jpg

from diffusers import AutoencoderTiny, AutoencoderKL
import torchvision.transforms.functional as TF
import torch as th
from PIL import Image

# Load both autoencoders in fp16 inference mode
sdvae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").half().eval().requires_grad_(False).cuda()
taesd = AutoencoderTiny.from_pretrained("madebyollin/taesd").half().eval().requires_grad_(False).cuda()

# Load a 512x512 crop of the test image and map it to the [-1, 1] range both VAEs expect
im = TF.center_crop(TF.resize(Image.open("sample_image.jpg").convert("RGB"), 512), 512)
im_cuda = TF.to_tensor(im)[None].mul(2).sub(1).cuda().half()

def show(*args):
    args = [a[0] if a.shape[0] == 1 else a for a in args]
    display(TF.to_pil_image(th.cat(args, -1).mul(0.5).add(0.5).clamp(0, 1)))

# SD-VAE's encode() returns raw latents that must be multiplied by
# scaling_factor; TAESD's encode() returns already-scaled latents.
latents_sdvae = sdvae.encode(im_cuda).latent_dist.sample().mul(sdvae.config.scaling_factor)
latents_taesd = taesd.encode(im_cuda).latents

print("Encoded Latents (SDVAE, TAESD)")
show(latents_sdvae, latents_taesd)

print("Decoded SDVAE Latents (SDVAE->SDVAE, SDVAE->TAESD)")
dec_sdvae = sdvae.decode(latents_sdvae.div(sdvae.config.scaling_factor)).sample
dec_taesd = taesd.decode(latents_sdvae).sample
show(dec_sdvae, dec_taesd)

print("Decoded TAESD Latents (TAESD->SDVAE, TAESD->TAESD)")
dec_sdvae = sdvae.decode(latents_taesd.div(sdvae.config.scaling_factor)).sample
dec_taesd = taesd.decode(latents_taesd).sample
show(dec_sdvae, dec_taesd)
```
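The key detail above is the scaling convention: SD-VAE's `encode()`/`decode()` operate on raw latents, while TAESD (like the diffusion UNet) expects latents already multiplied by `scaling_factor`. A minimal sketch of that convention, assuming the SD 1.x value of 0.18215 from the `stabilityai/sd-vae-ft-ema` config:

```python
# Scaling convention between SD-VAE raw latents and the "scaled" latent
# space used by TAESD and the UNet (SD 1.x family scaling_factor).
SCALING_FACTOR = 0.18215

def to_scaled(raw_latent: float) -> float:
    # After sdvae.encode(): multiply raw latents into the scaled space
    return raw_latent * SCALING_FACTOR

def to_raw(scaled_latent: float) -> float:
    # Before sdvae.decode(): divide scaled latents back to raw
    return scaled_latent / SCALING_FACTOR

# Round-trip sanity check
print(to_raw(to_scaled(1.0)))  # -> 1.0
```

If either direction of this scaling is skipped (as was likely the case in the buggy implementation), the decoded images come out badly over- or under-saturated even though the latent spaces themselves are compatible.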
Thanks, I think it was a bug in my implementation, or a mismatch in the SD-VAE version.
I found that the TAESD decoder could not decode latents encoded with the SD-VAE encoder, and the SD-VAE decoder could not decode latents encoded with the TAESD encoder. Why?