wilson1yan / VideoGPT

MIT License

the previous dependency is only one time step during inference? #24

Closed PanXiebit closed 3 years ago

PanXiebit commented 3 years ago

https://github.com/wilson1yan/VideoGPT/blob/d157da51b3b9766648eb1e54a1008ff965e26b65/videogpt/gpt.py#L97-L107

hi, @wilson1yan! In these lines, it seems that the iterative generation of the next code depends only on a single previous time step? The shape of embeddings_slice is always [bs, 1, 1, 1, embed_dim].

wilson1yan commented 3 years ago

The generation of the next token does depend on all previous tokens. The sampling code uses caching to make sampling faster: as each token is generated, the intermediate hidden units are stored and reused at later time steps so they don't have to be re-computed. That is why only the newest token's embedding (shape [bs, 1, 1, 1, embed_dim]) is fed in at each step. The relevant caching code can be found here
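The mechanism described above can be sketched with a minimal, single-head cached attention in NumPy. This is an illustrative toy, not VideoGPT's actual implementation; the class and function names (`CachedSelfAttention`, `full_causal_attention`) are hypothetical. At each decode step only the newest token's embedding is passed in, its key/value projections are appended to a cache, and the query attends over the full cached history — so the output depends on all previous tokens even though the input slice has sequence length 1:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class CachedSelfAttention:
    """Toy single-head self-attention with a key/value cache for decoding."""
    def __init__(self, embed_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(embed_dim)
        self.Wq = rng.standard_normal((embed_dim, embed_dim)) * scale
        self.Wk = rng.standard_normal((embed_dim, embed_dim)) * scale
        self.Wv = rng.standard_normal((embed_dim, embed_dim)) * scale
        self.cache_k, self.cache_v = [], []

    def step(self, x):
        # x: [bs, 1, embed_dim] -- embedding of only the newest token
        q = x @ self.Wq
        # Append this step's key/value to the cache instead of recomputing
        # projections for the whole prefix.
        self.cache_k.append(x @ self.Wk)
        self.cache_v.append(x @ self.Wv)
        k = np.concatenate(self.cache_k, axis=1)  # [bs, t, embed_dim]
        v = np.concatenate(self.cache_v, axis=1)  # [bs, t, embed_dim]
        # The length-1 query attends over ALL cached positions.
        att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1]))
        return att @ v  # [bs, 1, embed_dim]

def full_causal_attention(X, Wq, Wk, Wv):
    """Reference: masked attention over the whole sequence at once."""
    bs, T, d = X.shape
    q, k, v = X @ Wq, X @ Wk, X @ Wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores[:, np.triu(np.ones((T, T)), 1).astype(bool)] = -1e9
    return softmax(scores) @ v
```

Running `step` token by token produces the same outputs as the full masked pass, which is why caching is purely an efficiency optimization: it turns O(T) recomputation per step into an O(1) append plus one attention over the cache.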

PanXiebit commented 3 years ago

Thank you @wilson1yan! You are right.