https://arxiv.org/pdf/2301.03396.pdf
Diffused Heads, the prior work, to the rescue: we need to concatenate the motion frames into the channel dimension. Shape dump below (see the concatenation sketch after it):
```
reference_latent.ndim: 4
reference_latent.batch: 1
reference_latent.channels: 4
reference_latent.h: 64
reference_latent.w: 64
motion_frame.ndim: 3
motion_frame.batch: 3
motion_frame.channels: 512
motion_frame.h: 512
motion_frame_latent.ndim: 4
motion_frame_latent.b: 1
motion_frame_latent.c: 4
motion_frame_latent.h: 64
motion_frame_latent.w: 64
motion_frame.ndim: 3
motion_frame.batch: 3
motion_frame.channels: 512
motion_frame.h: 512
motion_frame_latent.ndim: 4
motion_frame_latent.b: 1
motion_frame_latent.c: 4
motion_frame_latent.h: 64
motion_frame_latent.w: 64
input_latent.b: 1
input_latent.c: 9
input_latent.h: 64
input_latent.w: 64
```
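For concreteness, here is a minimal sketch of the channel-dimension concatenation. All shapes and the two-motion-frame count are assumptions taken from the dump above, not the actual pipeline:

```python
import torch

# Shapes assumed from the debug dump above: a 1x4x64x64 reference latent
# and 4-channel motion-frame latents of the same spatial size.
reference_latent = torch.randn(1, 4, 64, 64)
motion_frame_latents = [torch.randn(1, 4, 64, 64) for _ in range(2)]

# Concatenate along the channel dimension (dim=1). The UNet's conv_in
# must be widened to accept the combined channel count.
input_latent = torch.cat([reference_latent, *motion_frame_latents], dim=1)
print(input_latent.shape)  # torch.Size([1, 12, 64, 64])
```

Note the dump above ends at `input_latent.c: 9` rather than 12, so whatever is concatenated there differs from this sketch (4 + 4 + 1 would fit, e.g. one motion-frame latent plus a single mask channel); the sketch only illustrates the mechanism.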
UPDATE: is it supposed to be black and white?

I actually never fixed this. They did release training code: https://github.com/MStypulkowski/diffused-heads/tree/train
Looking at it now, I see it's more of a blur into the next clip. This needs fixing.
> Some methodologies employ a frame from the end of the preceding clip as the initial frame of the subsequent generation, aiming to maintain a seamless transition across concatenated segments. Inspired by that, our approach incorporates the last n frames, termed "motion frames", from the previously generated clip to enhance cross-clip consistency.
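A rough sketch of how that motion-frame carry-over could work across clips. `generate_clip`, `first_frames`, and the frame counts are hypothetical placeholders, not EMO's actual sampler:

```python
from collections import deque
import torch

def generate_long_video(generate_clip, first_frames, n_motion=2, num_clips=4):
    """Hypothetical driver loop: carry the last n frames of each generated
    clip into the next generation to keep transitions consistent.

    `generate_clip(motion_frames) -> Tensor[T, C, H, W]` is an assumed
    per-clip sampler that conditions on the given motion frames.
    """
    motion_frames = deque(first_frames, maxlen=n_motion)
    clips = []
    for _ in range(num_clips):
        clip = generate_clip(list(motion_frames))
        clips.append(clip)
        # The tail of this clip seeds the next generation.
        for frame in clip[-n_motion:]:
            motion_frames.append(frame)
    return torch.cat(clips, dim=0)
```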
DRAFTED: https://github.com/johndpope/Emote-hack/blob/main/train_stage_1_0.py. Tomorrow I'll check the dimensions; the latent should actually be 64x64 once the frame has passed through the VAE.
https://github.com/johndpope/Emote-hack/blob/main/Net.py
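A quick way to verify the 64x64 claim, assuming an SD-style VAE from diffusers (the `sd-vae-ft-mse` checkpoint is my assumption, not necessarily what Net.py loads): an 8x downsampling factor takes 512x512 to 64x64.

```python
import torch
from diffusers import AutoencoderKL

# Sanity check for the 64x64 claim. SD-style VAEs downsample by 8x, so a
# 512x512 RGB frame should encode to a 4x64x64 latent.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.eval()

motion_frame = torch.randn(1, 3, 512, 512)  # [B, C, H, W], roughly in [-1, 1]
with torch.no_grad():
    latent = vae.encode(motion_frame).latent_dist.sample() * vae.config.scaling_factor
print(latent.shape)  # torch.Size([1, 4, 64, 64])
```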
UPDATE (is this related?) https://github.com/yrcong/flatten/blob/main/models/pipeline_flatten.py#L484