Closed stardusts-hj closed 1 month ago
The data range conventions are:

- `taesd.py`: images are in [0, 1], latents are gaussian-distributed
- `diffusers.AutoencoderTiny`: images are in [-1, 1], latents are unit-normalized (you could apply the scale factor, but it's just 1.0)
- `diffusers.AutoencoderKL`: images are in [-1, 1], latents are not unit-normalized until you apply the scale factor

These ranges apply to both inputs and outputs, so your examples need to scale images before sending them to the SD VAE encoder. I think the correct pseudocode would be:
# training with taesd.py
x  # images in [0, 1]
SD_latent = SD_vae.encoder(x.mul(2).sub(1)) * vae_factor  # SD VAE expects [-1, 1]
taesd_latent = taesd.encoder(x)  # taesd.py expects [0, 1]
enc_loss = L2(SD_latent, taesd_latent)
taesd_output = taesd.decoder(SD_latent)  # output is in [0, 1]
dec_loss = L2(x, taesd_output)

# training with diffusers.AutoencoderTiny
x  # images in [0, 1]
SD_latent = SD_vae.encoder(x.mul(2).sub(1)) * vae_factor
taesd_latent = autoencodertiny_encoder(x.mul(2).sub(1))  # AutoencoderTiny expects [-1, 1]
enc_loss = L2(SD_latent, taesd_latent)
taesd_output = autoencodertiny_decoder(SD_latent)  # output is in [-1, 1]
dec_loss = L2(x.mul(2).sub(1), taesd_output)  # convert x to [-1, 1] before comparing
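The range bookkeeping above boils down to two lossless maps between [0, 1] and [-1, 1]. A minimal runnable sketch, using NumPy in place of real VAE calls; `to_signed` and `to_unsigned` are hypothetical helper names, not part of taesd or diffusers:

```python
import numpy as np

def to_signed(x):
    """Map images from [0, 1] to [-1, 1] (the diffusers VAE convention)."""
    return x * 2.0 - 1.0

def to_unsigned(x):
    """Map images from [-1, 1] back to [0, 1] (the taesd.py convention)."""
    return (x + 1.0) / 2.0

# Round-tripping is lossless, so either convention can be adapted to the other.
x = np.random.rand(4, 3, 8, 8)  # fake batch of images in [0, 1]
x_signed = to_signed(x)          # now in [-1, 1]
assert x_signed.min() >= -1.0 and x_signed.max() <= 1.0
assert np.allclose(to_unsigned(x_signed), x)
```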
I posted example TAESDXL training code here, BTW; it should be a useful reference (specifically the `DiffusersVAEWrapper` portion).
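The general idea behind that kind of wrapper can be sketched generically. The class below is a hypothetical stand-in (not the actual `DiffusersVAEWrapper`) that adapts an encode/decode pair using the [-1, 1] convention to taesd.py's [0, 1] convention, applying the scale factor along the way:

```python
import numpy as np

class RangeAdaptedVAE:
    """Hypothetical adapter: wraps an encoder/decoder that work in [-1, 1]
    so callers can pass and receive images in [0, 1], taesd.py-style."""

    def __init__(self, encode_fn, decode_fn, scale_factor=1.0):
        self.encode_fn = encode_fn
        self.decode_fn = decode_fn
        self.scale_factor = scale_factor  # e.g. 0.18215 for SD's AutoencoderKL

    def encode(self, x01):
        # [0, 1] -> [-1, 1] before calling the wrapped encoder,
        # then apply the scale factor so latents are unit-normalized.
        return self.encode_fn(x01 * 2.0 - 1.0) * self.scale_factor

    def decode(self, latent):
        # Undo the scale factor, decode, then map [-1, 1] -> [0, 1].
        x_signed = self.decode_fn(latent / self.scale_factor)
        return (x_signed + 1.0) / 2.0

# Smoke test with an identity "VAE": decode(encode(x)) should return x.
vae = RangeAdaptedVAE(lambda x: x, lambda z: z, scale_factor=0.18215)
x = np.random.rand(2, 3, 8, 8)
assert np.allclose(vae.decode(vae.encode(x)), x)
```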
Thank you so much for your reply! I'll follow your example.
Thanks for providing taesd. I'm trying to finetune TAESD, and I'm wondering what the data range was when you trained it. I see there is a data transformation in the diffusers TinyVAE; is it correct that you trained TinyVAE with the following data range
However, in the diffusers code, they convert the output of the taesd decoder with scaling.
Does it mean I have to convert x when calculating the dec_loss if I want to use the TinyAutoencoder in diffusers, like
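Per the answer above, yes: if the decoder output lives in [-1, 1], x must be converted before the comparison. A small sketch of the effect, with a dummy "perfect" decoder output standing in for the real one:

```python
import numpy as np

def l2(a, b):
    """Mean squared error, standing in for the L2 loss in the pseudocode."""
    return float(np.mean((a - b) ** 2))

x = np.random.rand(2, 3, 8, 8)   # images in [0, 1]
taesd_output = x * 2.0 - 1.0     # pretend decoder output in [-1, 1]: a perfect reconstruction

# Comparing in [-1, 1] space: convert x first, so a perfect decode gives zero loss.
dec_loss = l2(x * 2.0 - 1.0, taesd_output)
assert dec_loss == 0.0

# Forgetting the conversion penalizes even a perfect reconstruction.
assert l2(x, taesd_output) > 0.0
```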