karpathy / minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
MIT License

Caching for generation #95

Open murbard opened 1 year ago

murbard commented 1 year ago

Currently, generation recomputes every activation each time a token is appended to the prompt. Normally, one would cache the intermediate activations so they don't have to be recomputed at every step. Caching doesn't compose as cleanly with the existing forward function, but that's precisely why a clean and simple implementation should be part of minGPT. It's surprising that PyTorch's native TransformerEncoder module doesn't offer this either.
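
A rough sketch of what such a key/value cache could look like at the attention level, assuming a module similar to minGPT's CausalSelfAttention (the class name `CachedCausalSelfAttention` and the `cache` argument are hypothetical, not part of minGPT):

```python
import math
import torch
import torch.nn as nn
from torch.nn import functional as F

class CachedCausalSelfAttention(nn.Module):
    """Causal self-attention that can reuse cached keys/values during generation."""

    def __init__(self, n_embd, n_head):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.n_embd = n_embd
        # combined query/key/value projection, as in minGPT
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)
        self.c_proj = nn.Linear(n_embd, n_embd)

    def forward(self, x, cache=None):
        # x: (B, T, C); during incremental decoding T is typically 1
        B, T, C = x.size()
        hs = C // self.n_head
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        # reshape to (B, n_head, T, head_size)
        q = q.view(B, T, self.n_head, hs).transpose(1, 2)
        k = k.view(B, T, self.n_head, hs).transpose(1, 2)
        v = v.view(B, T, self.n_head, hs).transpose(1, 2)

        if cache is not None:
            past_k, past_v = cache             # (B, n_head, T_past, hs)
            k = torch.cat([past_k, k], dim=2)  # cached keys/values grow each step
            v = torch.cat([past_v, v], dim=2)
        new_cache = (k, v)

        # attend over all cached positions; a causal mask is only needed
        # when more than one new token is processed at once (e.g. the prompt)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(hs)
        if T > 1:
            T_total = k.size(2)
            mask = torch.tril(torch.ones(T_total, T_total, device=x.device))
            mask = mask[-T:, :]                # rows for the new tokens only
            att = att.masked_fill(mask == 0, float('-inf'))
        att = F.softmax(att, dim=-1)
        y = att @ v                            # (B, n_head, T, hs)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y), new_cache
```

In this sketch, the model would be run once on the full prompt to populate the per-layer caches, then once per generated token with T=1, feeding the returned (k, v) tensors back in; threading the caches through the Block and GPT forward paths is the part the issue is asking to have implemented cleanly.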

karpathy commented 1 year ago

agree, a good todo item