karpathy / minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
MIT License

Strange model behavior when taking the softmax in the wrong dimension #132

Open · Cloud299 opened this issue 9 months ago

Cloud299 commented 9 months ago

https://github.com/karpathy/minGPT/blob/37baab71b9abea1b76ab957409a1cc2fbfba8a26/mingpt/model.py#L64
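
For reference, the surrounding attention code in `CausalSelfAttention.forward` looks roughly like this (paraphrased; see the link above for the exact source):

```python
# inside CausalSelfAttention.forward, roughly (see linked model.py)
att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))      # (B, nh, T, T) scores
att = att.masked_fill(self.bias[:, :, :T, :T] == 0, float('-inf'))   # causal mask
att = F.softmax(att, dim=-1)   # <- this is the line I changed to dim=-2
y = att @ v
```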

I accidentally changed the softmax dimension to -2 instead of -1 and got incredibly low losses on both the training and validation sets when using the tiny_shakespeare dataset. However, when generating from the model, I get very low-quality results. What is the explanation?

My guess is that taking the softmax over the wrong dimension somehow leaks information, which would explain why the training loss is so low. However, I don't quite get why the validation loss would also be low.

[screenshot: the low training/validation losses]
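
To make the leakage hypothesis concrete, here is a minimal standalone sketch (it does not use minGPT's classes; the `attend` helper and the perturbation test are my own construction). With `dim=-2`, each column of the attention matrix is normalized over the *query* axis, so the softmax denominator at position `i` includes scores computed from queries at later positions, even though the causal mask still zeroes out future keys. The sketch perturbs the last token's query and checks whether the output at position 0 changes:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
T, d = 8, 16                                   # sequence length, head dim
q, k, v = (torch.randn(T, d) for _ in range(3))

def attend(q, k, v, dim):
    att = (q @ k.transpose(-2, -1)) / (d ** 0.5)           # (T, T) scores
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool))  # causal mask
    att = att.masked_fill(~mask, float('-inf'))
    att = F.softmax(att, dim=dim)                          # dim=-1: over keys, dim=-2: over queries
    return att @ v

q2 = q.clone()
q2[-1] += 10.0                                 # perturb only the last (future) token's query

for dim in (-1, -2):
    delta = (attend(q, k, v, dim)[0] - attend(q2, k, v, dim)[0]).abs().max().item()
    print(f"dim={dim}: max change in output at position 0 = {delta:.3e}")
```

If the hypothesis is right, `dim=-1` should print 0 (position 0 attends only to itself), while `dim=-2` should print a nonzero value, i.e. the output at position 0 depends on a token it can never see during generation.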

@karpathy Any idea why this is the case?