Regarding setting scaling_factor=0.18215 instead of 0.13025 in stage 1 VAE training

hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Apache License 2.0

Dear Open-Sora team,

I see that vae_2d is initialized from "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers", which uses the SDXL VAE and should therefore have scaling_factor=0.13025. However, the code hard-codes SD v1.5's scaling_factor=0.18215 everywhere, for example: https://github.com/hpcaitech/Open-Sora/blob/9c4444207f18e6cf851e8cbac689f32bef762075/opensora/models/vae/vae.py#L35. This value appears to be used for the stage 1 training of VAE_Temporal. Is this intentional or a bug? With 0.18215, the input to vae_temporal would not be normalized to unit standard deviation as closely as it would be with 0.13025. Beyond that, it doesn't seem to have any obvious impact, since an additional scale and shift are applied to the 3D latent at the end.
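To put a rough number on the mismatch, here is a small sketch. It assumes the raw SDXL VAE latents have a standard deviation of about 1/0.13025 (that is, the SDXL scaling factor brings them to roughly unit variance), and it uses Gaussian samples as a stand-in for real latents. The constant and function names are illustrative only and do not come from the Open-Sora code.

```python
import random
import statistics

# Scaling factors under discussion.
SD15_SCALING_FACTOR = 0.18215  # value hard-coded in opensora/models/vae/vae.py
SDXL_SCALING_FACTOR = 0.13025  # value expected for the SDXL VAE

# Assumption: raw SDXL VAE latents have std of roughly 1 / 0.13025,
# i.e. the SDXL scaling factor is meant to bring them to unit variance.
RAW_LATENT_STD = 1.0 / SDXL_SCALING_FACTOR


def expected_scaled_std(raw_std: float, scaling_factor: float) -> float:
    """Multiplying by a constant scales the standard deviation by that constant."""
    return raw_std * scaling_factor


def empirical_scaled_std(raw_std: float, scaling_factor: float,
                         n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check: draw Gaussian stand-ins for latents, scale them, measure std."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, raw_std) * scaling_factor for _ in range(n)]
    return statistics.pstdev(samples)


if __name__ == "__main__":
    for name, factor in [("SDXL 0.13025", SDXL_SCALING_FACTOR),
                         ("SD v1.5 0.18215", SD15_SCALING_FACTOR)]:
        expected = expected_scaled_std(RAW_LATENT_STD, factor)
        empirical = empirical_scaled_std(RAW_LATENT_STD, factor)
        print(f"{name}: expected std {expected:.4f}, empirical std {empirical:.4f}")
```

Under that assumption, the inputs to VAE_Temporal would have a standard deviation of about 0.18215 / 0.13025 ≈ 1.40 rather than 1.0. The real latent statistics may differ, so this only indicates the size of the effect, not a measured value.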

Regarding setting scaling_factor=0.18215 instead of 0.13025 in stage 1 VAE training #493