jsjtczh opened this issue 2 months ago
It's for extending the text2video models: the window size is the context window size; if you have used AnimateDiff, it's the same concept. You set the window size to what the model was trained for, then set max frames to whatever you want (and whatever you can fit into your available VRAM).
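For intuition, here is a minimal sketch (not the wrapper's actual code) of how such a sliding context window can cover more frames than the model was trained on. The parameter names mirror `frame_window_size` and `frame_window_stride`, but the exact windowing scheme in the PR may differ:

```python
# Hypothetical sketch: enumerate overlapping context windows over a longer clip.
def context_windows(max_frames: int, frame_window_size: int, frame_window_stride: int):
    """Yield lists of frame indices, each at most one context window long."""
    if max_frames <= frame_window_size:
        yield list(range(max_frames))
        return
    start = 0
    while start + frame_window_size < max_frames:
        yield list(range(start, start + frame_window_size))
        start += frame_window_stride
    # Final window is clamped so it ends exactly on the last frame.
    yield list(range(max_frames - frame_window_size, max_frames))

for w in context_windows(max_frames=24, frame_window_size=16, frame_window_stride=4):
    print(w[0], "...", w[-1])
# 0 ... 15
# 4 ... 19
# 8 ... 23
```

Each window is the length the model expects, and the windows overlap so the clips can be blended into one longer video.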
Thank you for your reply. I'm curious how this is implemented in the code. The DDIMSampler doesn't seem to have these two parameters; they appear to be passed through **kwargs. Are they handled in self.model.apply_model?
BTW, if frame_window_size=16 and frame_window_stride=4, does that mean two consecutive clips have 12 frames in common?
The code is actually in the attention code; it's the work of @painebenjamin, which he graciously shared in this PR: https://github.com/kijai/ComfyUI-DynamiCrafterWrapper/pull/20
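For readers wondering what "in the attention code" means in practice, below is a rough sketch of the general AnimateDiff-style idea: run the temporal attention once per context window and average the frames that several windows cover. This is an illustration under assumed shapes, not the code from that PR (the names `windowed_temporal_attention`, `attn`, and `windows` are made up here):

```python
import torch

def windowed_temporal_attention(x: torch.Tensor, attn, windows):
    """x: (frames, tokens, dim); attn: any callable mapping a slice to the same shape."""
    out = torch.zeros_like(x)
    counts = torch.zeros(x.shape[0], 1, 1, device=x.device, dtype=x.dtype)
    for idx in windows:
        idx = torch.tensor(idx, device=x.device)
        out[idx] += attn(x[idx])   # attention only ever sees one window of frames
        counts[idx] += 1
    return out / counts            # blend overlapping regions by averaging
```

Because the sampler itself never sees more frames per attention call than the window size, the model can be pushed past its trained length while the overlapping, averaged windows keep the motion consistent across window boundaries.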
Thank you so much! I found the code. In the case of frame_window_size=16 and frame_window_stride=4, the overlap is 12.
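As a quick sanity check of that arithmetic: consecutive windows start `frame_window_stride` frames apart, so they share `frame_window_size - frame_window_stride` frames.

```python
frame_window_size = 16
frame_window_stride = 4
# Consecutive windows are shifted by the stride, so the shared region is:
print(frame_window_size - frame_window_stride)  # 12
```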
I reviewed the code and couldn't find any implementation related to these two parameters. How do they work?