latents_pose = poseguider(pose_condition)
# latents_pose = rearrange(latents_pose, "(b f) c h w -> b c f h w", f=video_length)
if do_classifier_free_guidance:
    latents_pose = latents_pose.repeat(2, 1, 1, 1)  # b c h w
Here, instead of repeating, would it be more appropriate to pass zeros through the poseguider and concatenate the result with the conditional features?
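To make the question concrete, here is a minimal sketch of the alternative being asked about. It is not the repository's code: the `nn.Conv2d` is a hypothetical stand-in for the real poseguider, the tensor sizes are made up, and putting the unconditional half first is an assumption that would need to match however the noise prediction is later split.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the real poseguider: any module that maps a
# pose image (b, 3, H, W) to pose features (b, c, h, w).
poseguider = nn.Conv2d(3, 4, kernel_size=3, padding=1)

pose_condition = torch.randn(2, 3, 16, 16)  # made-up sizes for illustration
do_classifier_free_guidance = True

with torch.no_grad():
    latents_pose = poseguider(pose_condition)
    if do_classifier_free_guidance:
        # Unconditional branch: encode an all-zero pose instead of
        # reusing the real pose features.
        uncond_pose = poseguider(torch.zeros_like(pose_condition))
        # Assumed ordering: unconditional half first, conditional half second.
        latents_pose = torch.cat([uncond_pose, latents_pose], dim=0)

print(latents_pose.shape)  # torch.Size([4, 4, 16, 16])
```

Two things to keep in mind with this variant. With `repeat`, both halves of the batch see the same pose, so guidance does not act on the pose signal at all; with the zero-pose branch it does. Also, the features for a zero pose are generally not zero themselves, because any bias terms in the poseguider still contribute, so whether this helps probably depends on whether the model saw zeroed poses during training.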