wilson1yan / VideoGPT

MIT License
962 stars · 115 forks

Caching during sampling without effect in cross-attention? #32

Closed Natithan closed 2 years ago

Natithan commented 2 years ago

Hi,

During sampling in VideoGPT, when cross-attention is computed with the conditioning frames as keys/values and the generated frames as queries, the key/value codes are cached, since they are identical at every decode step.

However, it looks to me like the keys and values are still recomputed at every decode step, regardless of whether they were cached, which defeats the purpose of caching them to save compute.

So is this a mistake? Or does the cache still save something, e.g. memory, in some way?
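For illustration, here is a minimal sketch of the intended behavior: the conditioning frames' key/value projections are computed once on the first decode step and reused afterward. All names here (`CrossAttention`, `kv_cache`, `kv_projections`, `decode_step`) are hypothetical and not taken from VideoGPT's code; this only shows what a working cache would look like, with a counter to verify the projections happen once.

```python
import numpy as np

class CrossAttention:
    """Sketch of cross-attention with a key/value cache (hypothetical,
    not VideoGPT's implementation). The conditioning frames are the same
    at every decode step, so their k/v projections can be computed once."""

    def __init__(self, dim, rng=None):
        rng = rng or np.random.default_rng(0)
        self.w_q = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.w_k = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.w_v = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.kv_cache = None      # holds (k, v) after the first decode step
        self.kv_projections = 0   # counts how often k/v are recomputed

    def __call__(self, query_frames, cond_frames, decode_step=0):
        q = query_frames @ self.w_q
        if decode_step == 0 or self.kv_cache is None:
            # First step: project the conditioning frames and cache them.
            k = cond_frames @ self.w_k
            v = cond_frames @ self.w_v
            self.kv_cache = (k, v)
            self.kv_projections += 1
        else:
            # Later steps: reuse the cached projections. The issue is that
            # the code recomputes k/v here instead of taking this branch.
            k, v = self.kv_cache
        scores = q @ k.T / np.sqrt(q.shape[-1])
        scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
        scores /= scores.sum(axis=-1, keepdims=True)
        return scores @ v

dim = 8
attn = CrossAttention(dim)
cond = np.random.default_rng(1).standard_normal((4, dim))  # conditioning frames
for step in range(5):
    query = np.random.default_rng(2 + step).standard_normal((1, dim))
    out = attn(query, cond, decode_step=step)

# With a working cache, k/v are projected exactly once across 5 decode steps.
assert attn.kv_projections == 1
```

If the cached `(k, v)` pair is stored but the projection is unconditionally recomputed each step, the counter above would read 5 instead of 1, which is the behavior being reported.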

wilson1yan commented 2 years ago

Yeah, that seems to be a mistake in the code. Nice catch!