johndpope closed this issue 8 months ago
I think we can throw away this class.

In training_stage_1.py I will simply pass the reference image / motion frames through the VAE, which is already a frozen model:

self.vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

This reduces the images from 512 x 512 down to 64 x 64 latents, which are then fed through ReferenceNet: https://github.com/johndpope/Emote-hack/blob/main/train_stage_1_0.py#L144
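As a sanity check, here is a minimal sketch of that encode step (the random `image` tensor is just a stand-in for a real preprocessed frame; shapes assume a 512 x 512 input normalized to [-1, 1]):

```python
import torch
from diffusers import AutoencoderKL

# Frozen VAE, as described above.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.requires_grad_(False)
vae.eval()

# Stand-in for a reference image / motion frame, normalized to [-1, 1].
image = torch.randn(1, 3, 512, 512)

with torch.no_grad():
    # The VAE downsamples by 8x: 512 x 512 -> 64 x 64, with 4 latent channels.
    latents = vae.encode(image).latent_dist.sample()

print(latents.shape)  # torch.Size([1, 4, 64, 64])
```

These 64 x 64 latents are what would then go into ReferenceNet.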
This encode/decode pattern comes from https://github.com/huggingface/diffusers/issues/3726:
posterior = vae.encode(target).latent_dist  # DiagonalGaussianDistribution over latents
z = posterior.mode()                        # deterministic latent, (B, 4, 64, 64) for a 512 x 512 input
pred = vae.decode(z).sample                 # reconstruction, (B, 3, 512, 512)
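Note that posterior.mode() gives a deterministic latent, while posterior.sample() draws from the distribution, which is what Stable Diffusion training typically does; the sampled latents are usually also multiplied by vae.config.scaling_factor (0.18215 for this VAE) before being fed to the denoising network.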
[image: original 512 x 512 frame (BEFORE)]
[image: 64 x 64 VAE latent (REDUCED TO)]