Referencenet input/output shape

sarperkilic commented 8 months ago

Hi,

Thanks for your work.

What are the input and output shapes of the ReferenceNet model?

I assumed reference_latents and motion_latents should be passed through the referencenet.

Thanks

johndpope commented 8 months ago

The paper describes the backbone and ReferenceNet as aligned with the SD UNet, sharing the same architecture, so I'd bank on it being like this:

https://github.com/kohya-ss/sd-scripts/blob/main/train_controlnet.py#L134
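
To make the shape question concrete, here is a minimal sketch of what the shapes would be if ReferenceNet really is a copy of the SD 1.5 UNet. That is an assumption, not something the paper confirms: the 4 latent channels, the 64x64 latent size and the block widths (320, 640, 1280, 1280) are the SD 1.5 values, and `unet_feature_shapes` is a hypothetical helper that only does the arithmetic.

```python
# Assumed SD 1.5 UNet configuration (not confirmed by the EMO paper).
LATENT_CHANNELS = 4
BLOCK_OUT_CHANNELS = (320, 640, 1280, 1280)


def unet_feature_shapes(batch, latent_size=64):
    """Return (input_shape, per-down-block feature shapes, output_shape)."""
    input_shape = (batch, LATENT_CHANNELS, latent_size, latent_size)
    features = []
    size = latent_size
    for i, channels in enumerate(BLOCK_OUT_CHANNELS):
        features.append((batch, channels, size, size))
        # Every down block except the last halves the spatial resolution.
        if i < len(BLOCK_OUT_CHANNELS) - 1:
            size //= 2
    # A denoising UNet predicts noise with the same shape as its latent input.
    output_shape = input_shape
    return input_shape, features, output_shape


inp, feats, out = unet_feature_shapes(batch=1)
print("input :", inp)
for shape in feats:
    print("block :", shape)
print("output:", out)
```

Under these assumptions the reference latents go in as (B, 4, 64, 64), and the per-block feature maps are what a backbone could attend to.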

I pushed this: https://github.com/johndpope/Emote-hack/commit/0a05888789694c89e92048acfb583e938d111dd4

I'm still getting my head around the denoising step; there's a lot of complex code out there. I'm attempting to use one class, EmoAnimationPipeline, which has the training code at the bottom.

To do the animating, I'm leaning more on the VideoNet class from JimmyLi: https://github.com/jimmyl02/animate/blob/main/animate-anyone/models/videonet.py

This will dump the UNet modules in the terminal: https://github.com/johndpope/Emote-hack/blob/main/videonet_animatediff.py

johndpope commented 8 months ago

I updated the code in the trainer (train_stage_1_0.py). I ditched the FramesEncoderVAE; it wasn't necessary, as the picture shows the frozen symbol, so I'm certain it's just the standard VAE:
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

(shrinking the images from 512x512 -> 64x64)
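
The 512x512 -> 64x64 step can be checked with a little arithmetic. This is only a sketch: it assumes the usual Stable Diffusion VAE layout (spatial downscale factor of 8, 4 latent channels), and `vae_latent_shape` is a hypothetical helper, not part of diffusers.

```python
def vae_latent_shape(batch, height, width, downscale=8, latent_channels=4):
    """Latent shape for an image batch of shape (batch, 3, height, width).

    downscale=8 and latent_channels=4 are the assumed SD VAE values.
    """
    if height % downscale or width % downscale:
        raise ValueError("height and width must be divisible by the downscale factor")
    return (batch, latent_channels, height // downscale, width // downscale)


latents = vae_latent_shape(batch=1, height=512, width=512)
print(latents)
```

So each 512x512 RGB frame would become a 4-channel 64x64 latent before it reaches any UNet.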

Running python train_stage_1_0.py, I now hit this problem: RuntimeError: mat1 and mat2 shapes cannot be multiplied (8192x320 and 768x320) https://github.com/johndpope/Emote-hack/commit/00206f07c7cd7a1c7eed4f360e6e26720c754641
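
One possible reading of that error, offered as a guess rather than a diagnosis: a linear layer computes x @ W.T, so a 768x320 mat2 looks like a projection that expects 768-dimensional inputs (the cross-attention context width in SD 1.5) and produces 320 features. The 8192x320 mat1 would then be 320-channel UNet features (8192 = 2 * 64 * 64) being passed where a 768-dimensional embedding is expected. The sketch below only reproduces the shape check; `linear_output_shape` is a hypothetical helper, and the batch size of 2 and the 77-token context are assumptions.

```python
def linear_output_shape(input_shape, in_features, out_features):
    """Shape check for y = x @ W.T with W of shape (out_features, in_features)."""
    rows, cols = input_shape
    if cols != in_features:
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied "
            f"({rows}x{cols} and {in_features}x{out_features})"
        )
    return (rows, out_features)


# 320-channel features fed to a projection that expects 768-d context.
message = None
try:
    linear_output_shape((8192, 320), in_features=768, out_features=320)
except ValueError as err:
    message = str(err)
print(message)

# The row count is consistent with batch 2 at 64x64 spatial positions.
rows = 2 * 64 * 64

# A 768-d context (e.g. 2 samples of 77 tokens) passes the same check.
ok = linear_output_shape((2 * 77, 768), in_features=768, out_features=320)
print(rows, ok)
```

If this reading is right, the options would be to project the 320-channel features to 768 before using them as context, or to build the attention layer with a context width of 320.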

It's not stated in the EMO paper, but I suspect they are using multiple channels as input into the UNet, going from 3 or 4 -> 9, as per this other Alibaba paper: https://github.com/johndpope/Emote-hack/issues/26
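
For reference, one familiar way to arrive at 9 input channels is the Stable Diffusion inpainting layout: 4 noisy latent channels, 4 conditioning latent channels and a 1-channel mask, concatenated along the channel axis. Whether EMO does anything like this is not known; the sketch below only shows the shape bookkeeping, with hypothetical helper names.

```python
def concat_channels(*shapes):
    """Shape after concatenating (B, C, H, W) tensors along the channel axis."""
    batch, _, height, width = shapes[0]
    for b, _, h, w in shapes:
        if (b, h, w) != (batch, height, width):
            raise ValueError("batch and spatial sizes must match to concatenate")
    return (batch, sum(s[1] for s in shapes), height, width)


def conv_in_weight_shape(in_channels, out_channels=320, kernel=3):
    """Weight shape (out, in, kH, kW) of the UNet's first convolution."""
    return (out_channels, in_channels, kernel, kernel)


noisy_latents = (1, 4, 64, 64)
cond_latents = (1, 4, 64, 64)   # e.g. a reference or previous-frame latent
mask = (1, 1, 64, 64)

unet_input = concat_channels(noisy_latents, cond_latents, mask)
print(unet_input)
print(conv_in_weight_shape(4), "->", conv_in_weight_shape(unet_input[1]))
```

Widening the input this way means the first convolution no longer matches the pretrained 4-channel weights; a common approach is to copy the pretrained weights into the first 4 input channels and zero-initialize the rest.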

Screenshot from 2024-03-24 13-52-26

Interested to hear anyone's thoughts.

If someone knows of a codebase that's adding the frame / concatenating previous frames correctly - I'm all ears.

UPDATE: one of these must have it: https://github.com/showlab/Awesome-Video-Diffusion


git clone https://github.com/Stability-AI/generative-models
git clone https://github.com/showlab/Show-1
git clone https://github.com/hotshotco/Hotshot-XL
git clone https://huggingface.co/cerspense/zeroscope_v2_576w
git clone https://huggingface.co/cerspense/zeroscope_v2_XL
git clone https://modelscope.cn/models/damo/Image-to-Video/summary
git clone https://modelscope.cn/models/damo/Video-to-Video/summary
git clone https://github.com/camenduru/text-to-video-synthesis-colab
git clone https://github.com/VideoCrafter/VideoCrafter
git clone https://modelscope.cn/models/damo/text-to-video-synthesis/summary
git clone https://huggingface.co/docs/diffusers/main/en/api/pipelines/text_to_video#texttovideo-synthesis
git clone https://github.com/showlab/T2VScore
git clone https://github.com/Vchitect/VBench?tab=readme-ov-file
git clone https://github.com/llyx97/FETV
git clone https://github.com/EvalCrafter/EvalCrafter
git clone https://github.com/openGVLab/InternVideo/tree/main/Data/InternVid
git clone https://github.com/VideoCrafter/Animate-A-Story
git clone https://github.com/guoyww/animatediff/
git clone https://github.com/Wangt-CN/DisCo
git clone https://github.com/araachie/yoda
git clone https://github.com/damo-vilab/videocomposer
git clone https://github.com/video-adapter/video-adapter/
git clone https://github.com/VideoCrafter/Make-Your-Video
git clone https://github.com/G-U-N/Gen-L-Video
git clone https://github.com/Weifeng-Chen/control-a-video
git clone https://github.com/thu-ml/controlvideo
git clone https://github.com/Make-A-Protagonist/Make-A-Protagonist
git clone https://github.com/kuai-lab/soundini-official
git clone https://github.com/baaivision/vid2vid-zero
git clone https://github.com/videodreamer23/videodreamer23.github.io
git clone https://github.com/Vchitect/SEINE
git clone https://github.com/arthur-qiu/LongerCrafter
git clone https://github.com/Doubiiu/DynamiCrafter
git clone https://github.com/RQ-Wu/LAMP
git clone https://github.com/TencentARC/MotionCtrl
git clone https://github.com/damo-vilab/i2vgen-xl
git clone https://github.com/scofield7419/Dysen
git clone https://github.com/ChenHsing/SimDA
git clone https://github.com/microsoft/Peekaboo
git clone https://github.com/TianxingWu/FreeInit
git clone https://github.com/aim-uofa/GenDeF
git clone https://github.com/voletiv/mcvd-pytorch
git clone https://github.com/SooLab/Free-Bloom
git clone https://github.com/fanfanda/M3DDM
git clone https://github.com/anonymous0x233/ReuseAndDiffuse
git clone https://github.com/showlab/Tune-A-Video
git clone https://github.com/researchmm/MM-Diffusion
git clone https://github.com/MAGVIT/magvit
git clone https://github.com/MKFMIKU/VIDM
git clone https://github.com/araachie/river
git clone https://github.com/YingqingHe/LVDM
git clone https://github.com/yanivnik/sinfusion-code
git clone https://github.com/buggyyang/RVD
git clone https://github.com/anonymous0769/DreamVideo
git clone https://github.com/XavierCHEN34/LivePhoto
git clone https://github.com/HumanAIGC/AnimateAnyone
git clone https://github.com/SPengLiang/SmoothVideo
git clone https://github.com/videoassembler/videoassembler
git clone https://github.com/aniki-ly/FlowZero
git clone https://github.com/yudianzheng/SketchVideo
git clone https://github.com/VChitect/VideoBooth
git clone https://github.com/GongyeLiu/StyleCrafter
git clone https://github.com/wangyanhui666.github.io/MicroCinema.github.io/
git clone https://github.com/WarranWeng/ART.V
git clone https://github.com/Finspire13/DiffAct
git clone https://github.com/showlab/MotionDirector
git clone https://github.com/Vchitect/LaVie
git clone https://github.com/guyyariv/TempoTokens
git clone https://github.com/ai-forever/KandinskyVideo
git clone https://github.com/alibaba/animate-anything
git clone https://github.com/jiaxilv/GPT4Motion
git clone https://github.com/Boese0601/MagicDance
git clone https://github.com/jingyunliang.github.io/MoVideo/
git clone https://github.com/makepixelsdance.github.io/
git clone https://github.com/JeffWang987/WorldDreamer
git clone https://github.com/kyfafyd.wang/projects/customvideo/
git clone https://github.com/univg-baidu.github.io/
git clone https://github.com/AILab-CVC/VideoCrafter
git clone https://github.com/akaneqwq.github.io/360DVD/
git clone https://github.com/zhang-zx/AVID
git clone https://github.com/kuai-lab/mtvg-page
git clone https://github.com/dreamoving/dreamoving-project
git clone https://github.com/zyxElsa/MotionCrafter
git clone https://github.com/joaanna.github.io/customizing_motion/
git clone https://github.com/vvictoryuki/AnimateZero
git clone https://github.com/sczhou/Upscale-A-Video
git clone https://github.com/cerspense/zeroscope_v2_576w
git clone https://github.com/Jeff-LiangF/FlowVid
git clone https://github.com/fairy-video2video.github.io/
git clone https://github.com/lixirui142/VidToMe
git clone https://github.com/STEM-Inv/stem-inv
git clone https://github.com/neuedit.github.io/
git clone https://github.com/rehg-lab/RAVE
git clone https://github.com/ldynx.github.io/SAVE/
git clone https://github.com/mayuelala/MagicStick
git clone https://github.com/showlab/VideoSwap
git clone https://github.com/RickySkywalker/DragVideo-Official
git clone https://github.com/drag-a-video.github.io/
git clone https://github.com/bivdiff.github.io/
git clone https://github.com/HyeonHo99/Video-Motion-Customization
git clone https://github.com/yrcong/flatten
git clone https://github.com/Francis-Rings/MotionEditor
git clone https://github.com/ChenyangQiQi/FateZero
git clone https://github.com/Ground-A-Video/Ground-A-Video
git clone https://github.com/BarqueroGerman/BeLFusion
git clone https://github.com/omerbt/TokenFlow
git clone https://github.com/gabriel-huang.github.io/inve/
git clone https://github.com/williamyang1991/Rerender_A_Video
git clone https://github.com/jqin4749/MindVideo
git clone https://github.com/microsoft/i-Code/tree/main/i-Code-V3
git clone https://github.com/YBYBZhang/ControlVideo
git clone https://github.com/qiuyu96/CoDeF
git clone https://github.com/showlab/DragAnything
git clone https://github.com/DanBigioi/DiffusionVideoEditing
git clone https://github.com/man805/Diffusion-Video-Autoencoders
git clone https://github.com/plai-group/flexible-video-diffusion-modeling
git clone https://github.com/mrzzy2021/styledmotionsynthesis
git clone https://github.com/ChenFengYe/motion-latent-diffusion
git clone https://github.com/kakaobrain/flame
git clone https://github.com/mingyuan-zhang/MotionDiffuse
git clone https://github.com/GuyTevet/motion-diffusion-model
git clone https://github.com/gutianpei/MID
git clone https://github.com/Tobi-r9/RaMViD
git clone https://github.com/video-diffusion.github.io/
git clone https://github.com/MCG-NJU/PDPP
git clone https://github.com/sauradip/DiffusionTAD
git clone https://github.com/jpthu17/DiffusionRet
git clone https://github.com/lzp870/RSFD
git clone https://github.com/google-research/pix2seq
git clone https://github.com/nianticlabs/diffusionerf
git clone https://github.com/Lakonik/SSDNeRF
git clone https://github.com/ayaanzhaque/instruct-nerf2nerf
git clone https://github.com/mingyuan-zhang/ReMoDiffuse
git clone https://github.com/priorMDM/priorMDM
git clone https://github.com/LinghaoChan/HumanMAC
git clone https://github.com/cotton-ahn/diffusion-motion-prediction
git clone https://github.com/SinMDM/SinMDM
git clone https://github.com/facebookresearch/AGRoL
git clone https://github.com/tr3e/InterGen
git clone https://github.com/sukun1045/video-physics-sound-diffusion
git clone https://github.com/seervideodiffusion.github.io/
git clone https://github.com/Picsart-AI-Research/Text2Video-Zero
git clone https://github.com/nihaomiao/CVPR23_LFDM
git clone https://github.com/sihyun-yu/PVDM
git clone https://github.com/yumingj/Text2Performer

johndpope / Emote-hack

Referencenet input/output shape #24