johndpope / VASA-1-hack

Using Claude Opus to reverse engineer code from VASA white paper - WIP - (this is for La Raza 🎷)
https://www.microsoft.com/en-us/research/project/vasa-1/
MIT License

The progress status #15

Open huyduong7101 opened 3 weeks ago

huyduong7101 commented 3 weeks ago

Hi @johndpope, have you tried training it yet? If so, could you share a little about the losses and metrics during training? I am building a model that mixes VASA-1 and AniPortrait, adopting only the diffusion loss (MSELoss). During training the loss fluctuates between 0.1 and 0.3, and the frames in the final video are incoherent with one another.
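For reference, the "diffusion loss (MSELoss)" mentioned above is usually the standard DDPM noise-prediction objective: noise a clean sample via the forward process, then regress the model's output against the true noise. A minimal NumPy sketch (the `predict_fn` network and the beta schedule here are hypothetical stand-ins, not code from this repo):

```python
import numpy as np

def diffusion_mse_loss(x0, noise, t, alphas_cumprod, predict_fn):
    """DDPM noise-prediction MSE: sample x_t ~ q(x_t | x_0), then compare
    the model's predicted noise against the true noise `eps`."""
    a_bar = alphas_cumprod[t]                                  # cumulative alpha at step t
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise  # forward (noising) process
    pred = predict_fn(x_t, t)                                  # denoising network output
    return float(np.mean((pred - noise) ** 2))                 # MSE over all elements

# Toy check with a linear beta schedule (illustrative values):
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))        # pretend motion-latent batch
noise = rng.standard_normal((4, 8))
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

# An oracle that returns the true noise drives the loss to exactly zero.
loss = diffusion_mse_loss(x0, noise, t=500, alphas_cumprod=alphas_cumprod,
                          predict_fn=lambda x_t, t: noise)
print(loss)  # 0.0
```

A loss hovering in the 0.1-0.3 range with incoherent output frames could simply mean the per-frame objective is being minimized without any temporal conditioning, since plain MSE on independent frames says nothing about frame-to-frame consistency.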

johndpope commented 3 weeks ago

I created a draft PR based off work from MegaPortraits. The EMOPortraits code will drop next month - maybe without training code. I'm not sure exactly what the rendering engine is - is it using a Stable Diffusion pipeline? DDPM? I have the diffusion transformer in this code, plus 2 training stages. I'm very interested to hear your thoughts on that branch - my main is based on keypoints, which MS didn't use.