erliding closed 3 months ago
Thank you for your question. After checking, we found that we made a mistake here. The reason is:
Thus, if you want to use our VAE, keep the value at 0.18215. If you want to train the VAE from scratch using our code, I suggest you change the scale.
Dear open-sora,
I see that vae_2d is initialized from "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers", which should have scaling_factor=0.13025, but the code instead hard-codes sd-v15's scaling_factor=0.18215 everywhere, for example: https://github.com/hpcaitech/Open-Sora/blob/9c4444207f18e6cf851e8cbac689f32bef762075/opensora/models/vae/vae.py#L35. This value also seems to be used for the stage 1 training of VAE_Temporal. I'm wondering whether this is on purpose or a bug. It could leave the input std for vae_temporal less well normalized to 1 than it would be with 0.13025, but it doesn't seem to have any other obvious impact, since an additional scale and shift are also applied to the 3D latent at the end.
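To make the normalization concern concrete, here is a minimal sketch using only the Python standard library. It does not touch the Open-Sora code. It assumes, purely for illustration, that the raw 2D VAE latents are zero-mean Gaussian with std 1/0.13025, so that the SDXL factor would normalize them to unit std; the helper name scaled_std and the synthetic data are hypothetical.

```python
import random
import statistics

SDXL_SCALE = 0.13025  # scaling_factor the question expects for the SDXL VAE
SD15_SCALE = 0.18215  # scaling_factor the question reports as hard-coded


def scaled_std(raw_latents, scale):
    """Population std of the latents after multiplying each value by `scale`."""
    return statistics.pstdev([x * scale for x in raw_latents])


# Assumption: synthetic stand-in for raw VAE latents, not real model output.
rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0 / SDXL_SCALE) for _ in range(100_000)]

std_sdxl = scaled_std(raw, SDXL_SCALE)  # close to 1 under the assumption above
std_sd15 = scaled_std(raw, SD15_SCALE)  # larger by the ratio of the two factors

print(f"std with {SDXL_SCALE}: {std_sdxl:.3f}")
print(f"std with {SD15_SCALE}: {std_sd15:.3f}")
print(f"ratio of factors: {SD15_SCALE / SDXL_SCALE:.3f}")
```

Under this assumption the mismatch only rescales the input to the temporal VAE by a constant factor of about 1.4, which fits the observation that a later scale and shift on the 3D latent can absorb it.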