for those who (like me) want to apply this exciting technique to longer videos:
i've integrated this method into my SD repo https://github.com/eps696/SDfu and added batching for the pivots there, offloading them to the CPU. this let me process e.g. 300 frames at 960x540 resolution on a 3090 (24 GB).
since i renamed some variables for my own convenience, my code can't be copy-pasted directly into this repo, but i hope it's readable enough to adapt here. the solution is also pretty clumsy, as i had very little idea about the attention internals and was just trying to debug OOMs.
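
to illustrate the general idea of batching with CPU offload, here is a minimal sketch. it is not the code from SDfu: the function name `process_pivots_in_batches`, the `fn` callback (a stand-in for whatever computes per-pivot features) and the `batch_size` default are all assumptions made up for this example. only one batch sits on the GPU at a time, and each result is moved back to the CPU right away.

```python
import torch


def process_pivots_in_batches(pivots, fn, batch_size=8, device=None):
    """Apply fn to pivot frames batch by batch, keeping results on the CPU.

    pivots: tensor of shape [N, ...], stored on the CPU (hypothetical layout).
    fn: hypothetical callable mapping a batch tensor to an output tensor.
    """
    if device is None:
        # Fall back to the CPU so the sketch also runs without a GPU.
        device = "cuda" if torch.cuda.is_available() else "cpu"

    outputs = []
    with torch.no_grad():
        for start in range(0, pivots.shape[0], batch_size):
            # Only this slice of pivots is moved to the compute device.
            batch = pivots[start:start + batch_size].to(device)
            out = fn(batch)
            # Offload the result immediately so device memory use stays bounded.
            outputs.append(out.cpu())
            del batch, out

    return torch.cat(outputs, dim=0)


if __name__ == "__main__":
    frames = torch.randn(20, 4, 8, 8)  # 20 dummy "pivot" tensors
    feats = process_pivots_in_batches(frames, lambda x: x * 2, batch_size=6)
    print(feats.shape, feats.device)
```

the trade-off is extra host/device transfers in exchange for a peak GPU memory that depends on `batch_size` rather than on the total number of frames.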