johndpope / Emote-hack

Emote Portrait Alive - using AI to reverse-engineer code from the white paper. (abandoned)

AudioLayers and AudioFeatureModel #19

Closed · johndpope closed this issue 8 months ago

johndpope commented 8 months ago

https://github.com/johndpope/Emote-hack/tree/main/junk/AudioAttention

[Screenshot from 2024-03-17 23-00-59]

Create a standalone script or notebook to test `AudioLayers` and `AudioFeatureModel`:

1. Initialize `AudioLayers` and `AudioFeatureModel` with the desired configuration.
2. Prepare a batch of audio features and latent codes as input.
3. Pass the latent codes and audio features through `AudioLayers` and verify the output shape and values.
4. Pass the output of `AudioLayers` through `AudioFeatureModel` and verify the output shape and values.
5. Compare the output with the expected audio embeddings.
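A minimal smoke-test sketch along those lines. The import path, constructor arguments, and tensor shapes here are all assumptions (not the repo's actual API) and would need to be matched against the code under `junk/AudioAttention`:

```python
import torch

# Assumed import path; adjust to wherever the repo defines these classes.
from junk.AudioAttention.model import AudioLayers, AudioFeatureModel

# Hypothetical shapes: 50 audio frames of wav2vec-style features (dim 768)
# paired with latent codes (dim 320). These numbers are assumptions.
batch, frames, audio_dim, latent_dim = 2, 50, 768, 320

audio_features = torch.randn(batch, frames, audio_dim)
latent_codes = torch.randn(batch, frames, latent_dim)

# Constructor argument names are assumptions as well.
audio_layers = AudioLayers(audio_dim=audio_dim, latent_dim=latent_dim)
audio_model = AudioFeatureModel(input_dim=latent_dim, embed_dim=latent_dim)

with torch.no_grad():
    # Step 3: latent codes + audio features through AudioLayers.
    attended = audio_layers(latent_codes, audio_features)
    assert attended.shape == latent_codes.shape, attended.shape
    assert torch.isfinite(attended).all()

    # Step 4: AudioLayers output through AudioFeatureModel.
    embeddings = audio_model(attended)
    print("audio embedding shape:", embeddings.shape)
    assert torch.isfinite(embeddings).all()
```

Step 5 would then compare `embeddings` against reference embeddings (e.g. via `torch.allclose`) once a known-good output exists to compare against.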

[Screenshot from 2024-03-17 22-58-20]

With ControlNet + SD we can create a lot of images. If the wave signal can produce anything that resembles the images above, it should be trivial to plug into the animation pipeline.

There is a way we could do this: cycle through the training data and create a corresponding frameset with a head position for every frame. Then load a wav and train on these pairs, so the sound at frame 1 maps to the head pose at frame 1.
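A sketch of that frame-level alignment, assuming a 25 fps video and using torchaudio to slice the wav into one chunk per video frame (`FPS`, the pose source, and the helper name are assumptions for illustration):

```python
import torch
import torchaudio

FPS = 25  # assumed video frame rate

def audio_chunks_for_video(wav_path: str, num_video_frames: int) -> torch.Tensor:
    """Slice a wav into one chunk per video frame so chunk t aligns with frame t."""
    waveform, sr = torchaudio.load(wav_path)  # (channels, samples)
    waveform = waveform.mean(dim=0)           # mix down to mono
    samples_per_frame = sr // FPS
    chunks = []
    for t in range(num_video_frames):
        start = t * samples_per_frame
        chunk = waveform[start:start + samples_per_frame]
        # Zero-pad the final chunk if the audio runs short of the video.
        if chunk.numel() < samples_per_frame:
            chunk = torch.nn.functional.pad(chunk, (0, samples_per_frame - chunk.numel()))
        chunks.append(chunk)
    return torch.stack(chunks)  # (num_video_frames, samples_per_frame)

# Pair chunk t with the head pose extracted for frame t (poses would come
# from any landmark/pose estimator; shape (num_frames, pose_dim) assumed):
# audio_chunks = audio_chunks_for_video("clip.wav", num_frames)
# dataset = list(zip(audio_chunks, head_poses))
```

From there, a small regressor trained on these (audio chunk, head pose) pairs would test whether the wave signal alone carries enough information to predict per-frame head position.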

https://github.com/HeliosZhao/Make-A-Protagonist/blob/main/makeaprotagonist/pipelines/pipeline_stable_unclip_controlavideo.py