Closed DexiangHong closed 4 months ago
You can see some differences around semi-transparent patterns if you put them on a black background. Using another example with a larger semi-transparent area may lead to bigger differences.
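For anyone who wants to check this numerically, here is a minimal sketch of compositing two reconstructions on a black background and measuring how much they differ. It assumes both reconstructions are already loaded as float RGBA arrays of shape (H, W, 4) with values in [0, 1]. The helper names `composite_on_black` and `mean_abs_diff` are made up for this illustration and are not part of the repo, and the toy arrays stand in for real decoder outputs.

```python
import numpy as np


def composite_on_black(rgba: np.ndarray) -> np.ndarray:
    """Alpha-composite a float RGBA image (H, W, 4) in [0, 1] over black."""
    rgb, alpha = rgba[..., :3], rgba[..., 3:4]
    # Over a black background the result is just color scaled by alpha.
    return rgb * alpha


def mean_abs_diff(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute difference between two RGBA images after compositing on black."""
    return float(np.abs(composite_on_black(a) - composite_on_black(b)).mean())


# Toy stand-ins for the two reconstructions (hypothetical data):
# identical white images, except one pixel is semi-transparent in the second.
with_offset = np.ones((2, 2, 4), dtype=np.float32)
without_offset = with_offset.copy()
without_offset[0, 0, 3] = 0.5

print(mean_abs_diff(with_offset, without_offset))
```

On a white or checkerboard background the same alpha difference can be much harder to spot, which is why the black background helps here.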
The released decoder is trained with data augmentations so that it works regardless of whether the UNet's diffusion provides the expected distribution. This can be useful with special checkpoints, such as some anime models that struggle to yield latents with accurate offsets. Still, using the offset is recommended and gives lower loss on the val/test set.
I can release another decoder trained without augmentation if there is demand, so that the difference will be larger, but I prefer the current one: the decoder without augmentation has a higher failure rate when switching between base models like anime and Pony (since those special models do not always diffuse with a matched offset distribution).
I ran demo_sdxl_vae_encoder_decode.py and found no difference between the reconstruction results with offset and without offset.
Here is the original image: Here is the reconstruction with offset: Here is the reconstruction without offset:
Another Example: Here is the original image: Here is the reconstruction with offset: Here is the reconstruction without offset: