johndpope / Emote-hack

Emote Portrait Alive - using ai to reverse engineer code from white paper. (abandoned)
https://github.com/johndpope/VASA-1-hack

Our approach incorporates the last n frames, termed ’motion frames’ - currently stuck on this. #27

Open johndpope opened 8 months ago

johndpope commented 8 months ago

Some methodologies employ a frame from the end of the preceding clip as the initial frame of the subsequent generation, aiming to maintain a seamless transition across concatenated segments. Inspired by that, our approach incorporates the last n frames, termed ’motion frames’ from the previously generated clip to enhance cross-clip consistency.

Screenshot 2024-03-24 at 6 34 06 pm

DRAFTED - https://github.com/johndpope/Emote-hack/blob/main/train_stage_1_0.py Tomorrow I'll check the dimensions; it should actually be 64x64 once it's passed through the VAE.
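As a quick sanity check on the 64x64 expectation, here is a minimal sketch of the shape arithmetic. It assumes a Stable-Diffusion-style VAE with 8x spatial downsampling and 4 latent channels; both numbers are assumptions, not read from the repo, and `latent_shape` is a hypothetical helper.

```python
def latent_shape(batch, height, width, latent_channels=4, downsample=8):
    """Return the (B, C, H, W) shape a VAE latent would have for an image batch.

    Assumes a Stable-Diffusion-style VAE: 8x spatial downsampling and
    4 latent channels. Both defaults are assumptions for illustration.
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be divisible by the downsample factor")
    return (batch, latent_channels, height // downsample, width // downsample)


# A single 512x512 frame would map to a 4-channel 64x64 latent.
print(latent_shape(1, 512, 512))  # (1, 4, 64, 64)
```

Under those assumptions a 512x512 input gives 512 / 8 = 64 per side, which is where the 64x64 figure comes from.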

https://github.com/johndpope/Emote-hack/blob/main/Net.py

UPDATE (is this related?) https://github.com/yrcong/flatten/blob/main/models/pipeline_flatten.py#L484

johndpope commented 8 months ago
Screenshot 2024-03-25 at 2 15 37 pm

https://arxiv.org/pdf/2301.03396.pdf

Diffused Heads prior work to the rescue. We have to concatenate along the channel dimension.


reference_latent.ndim: 4
reference_latent.batch: 1
reference_latent.channels: 4
reference_latent.h: 64
reference_latent.w: 64
motion_frame.ndim: 3
motion_frame.batch: 3
motion_frame.channels: 512
motion_frame.h: 512
motion_frame_latent.ndim: 4
motion_frame_latent.b: 1
motion_frame_latent.c: 4
motion_frame_latent.h: 64
motion_frame_latent.w: 64
motion_frame.ndim: 3
motion_frame.batch: 3
motion_frame.channels: 512
motion_frame.h: 512
motion_frame_latent.ndim: 4
motion_frame_latent.b: 1
motion_frame_latent.c: 4
motion_frame_latent.h: 64
motion_frame_latent.w: 64
input_latent.b: 1
input_latent.c: 9
input_latent.h: 64
input_latent.w: 64
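To cross-check the logged shapes, here is a small sketch that computes the shape you would get from concatenating (B, C, H, W) tensors along the channel dimension (in PyTorch this would correspond to `torch.cat(tensors, dim=1)`). `concat_channels_shape` is a hypothetical helper and only does shape bookkeeping; it is not the code in `Net.py`.

```python
def concat_channels_shape(*shapes):
    """Shape resulting from concatenating (B, C, H, W) tensors along C (dim=1)."""
    if not shapes:
        raise ValueError("need at least one shape")
    for shape in shapes:
        if len(shape) != 4:
            raise ValueError(f"expected 4 dims (B, C, H, W), got {shape}")
    batch, _, height, width = shapes[0]
    for shape in shapes[1:]:
        # Every dimension except channels must match for a channel-wise concat.
        if (shape[0], shape[2], shape[3]) != (batch, height, width):
            raise ValueError(f"batch/height/width mismatch: {shape} vs {shapes[0]}")
    return (batch, sum(shape[1] for shape in shapes), height, width)


# Shapes taken from the log above: one reference latent, two motion frame latents.
reference_latent = (1, 4, 64, 64)
motion_frame_latents = [(1, 4, 64, 64), (1, 4, 64, 64)]
print(concat_channels_shape(reference_latent, *motion_frame_latents))  # (1, 12, 64, 64)
```

Note that 4 + 4 + 4 gives 12 channels, while the log reports `input_latent.c: 9`. Nine is what three 3-channel tensors would produce, so it may be worth checking which tensors actually reach the concatenation.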

UPDATE - is it supposed to be black and white?

johndpope commented 5 months ago

I actually never fixed this. They did release training code - https://github.com/MStypulkowski/diffused-heads/tree/train

Looking at it now, I see it's more of a blur into

Screenshot from 2024-06-11 13-10-08

This needs fixing.