Open · ysig opened this issue 1 year ago
That's exactly what I did here https://github.com/omerbt/TokenFlow/issues/32 (in a way). It handled longer sequences, but not unlimited ones: the extended-attention data for all frames still has to be moved back to the GPU before the latent-denoising step (and I didn't manage to make that step consumable in batches).
I've managed to run `run_tokenflow_pnp.py` on a small excerpt of my video (5 s), and the result looks really cool. But when I run it on the full video (5 min), it crashes with a CUDA OOM error even when I drop the batch size down to 1. This scaling of memory with video length, presumably caused by the extended attention, seems like a major limitation of the method, and it is not highlighted in the discussion section or anywhere else in the paper (as far as I can tell).
Is it possible to offload part of the attention computation to the CPU so that the number of frames is not a bottleneck?
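A minimal sketch of that CPU-offload idea (this is not TokenFlow's actual code — the function name, shapes, and chunking scheme are illustrative assumptions): keep the cross-frame keys and values on the CPU, stream one chunk of frames at a time to the compute device, and combine chunks with a running log-sum-exp so the result matches full attention exactly while peak GPU memory depends only on the chunk size, not the number of frames.

```python
import torch

def chunked_cross_frame_attention(q, k, v, chunk=4):
    """Attention over keys/values kept on the CPU (hypothetical sketch).

    q: (tokens, dim) tensor on the compute device
    k, v: (frames, tokens, dim) tensors resident on the CPU
    chunk: number of frames moved to the device per step
    """
    device = q.device
    scale = q.shape[-1] ** -0.5
    out = torch.zeros_like(q)
    # Running log-sum-exp of attention scores seen so far, so partial
    # softmax results over chunks can be merged exactly.
    lse = torch.full((q.shape[0], 1), float("-inf"), device=device)
    for i in range(0, k.shape[0], chunk):
        # Move only this chunk of frames onto the device.
        kc = k[i:i + chunk].reshape(-1, k.shape[-1]).to(device)
        vc = v[i:i + chunk].reshape(-1, v.shape[-1]).to(device)
        scores = (q @ kc.T) * scale                    # (tokens, chunk*tokens)
        chunk_lse = scores.logsumexp(-1, keepdim=True)
        new_lse = torch.logaddexp(lse, chunk_lse)
        # Rescale the previous accumulator and add this chunk's contribution.
        out = out * (lse - new_lse).exp() + (scores - new_lse).exp() @ vc
        lse = new_lse
    return out
```

This trades GPU memory for host-to-device transfer time, so it would be slower, but the streaming-softmax merge makes the output numerically equivalent to attending over all frames at once.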