Closed by AmitMY 5 months ago
The command used to generate the video outputs there is:

```bash
pose_to_video --type=controlnet --model=sign/sd-controlnet-mediapipe \
  --pose=assets/testing-reduced.pose \
  --video=assets/outputs/controlnet-animatediff.mp4 \
  --processors animatediff
```
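The `--pose` input is a binary `.pose` file; a minimal sketch of inspecting one with the `pose-format` library (assuming that format, as used across the sign-language-processing projects):

```python
# Hedged sketch: inspect the .pose input passed via --pose, using the
# pose-format library (an assumption that this is the file format in use).
from pose_format import Pose

with open("assets/testing-reduced.pose", "rb") as f:
    pose = Pose.read(f.read())

print(pose.header.components[0].name)  # e.g. a MediaPipe Holistic component
print(pose.body.data.shape)            # (frames, people, points, dims)
```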
Yes, it is based on MediaPipe Holistic.
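A minimal sketch of how those holistic keypoints could be rendered into skeleton frames with the `pose-format` library (whether `pose_to_video` conditions the diffusion model on exactly these renderings is an assumption):

```python
# Hedged sketch: render MediaPipe Holistic keypoints from a .pose file into
# skeleton frames with pose-format's visualizer. That pose_to_video uses
# precisely this rendering as the ControlNet condition is an assumption.
from pose_format import Pose
from pose_format.pose_visualizer import PoseVisualizer

with open("assets/testing-reduced.pose", "rb") as f:
    pose = Pose.read(f.read())

visualizer = PoseVisualizer(pose)
# draw() yields one rendered frame per pose frame; save_video writes them out.
visualizer.save_video("skeleton.mp4", visualizer.draw())
```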
> If I may ask, which scripts in pose-to-video do you use for the diffusion models? Do they take the holistic bones as input? And how do you know what the expected output of the diffusion model should be?
Originally posted by @florianbaer in https://github.com/sign-language-processing/spoken-to-signed-translation/issues/26#issuecomment-1999039371
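For anyone following along: a minimal sketch of how a ControlNet pipeline typically takes such skeleton renderings as conditioning and produces the RGB frame, using the standard `diffusers` API (the base model, prompt, and input file here are assumptions, not necessarily what `pose_to_video` does internally):

```python
# Hedged sketch: a ControlNet diffusion pipeline consuming a rendered
# skeleton frame (standard diffusers API; the base model, prompt, and
# file names are assumptions).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "sign/sd-controlnet-mediapipe",  # model id from the command above
    torch_dtype=torch.float16,
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed base model
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The rendered holistic skeleton is the conditioning image; the expected
# output is the corresponding photorealistic RGB frame.
skeleton = Image.open("skeleton_frame.png")  # hypothetical rendered frame
frame = pipe("a person signing", image=skeleton,
             num_inference_steps=20).images[0]
frame.save("generated_frame.png")
```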