karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Is it possible to use MSE instead of cross-entropy as the loss function? #397

Open SujayTalanki opened 9 months ago

SujayTalanki commented 9 months ago

Currently, the model uses F.cross_entropy as the loss function. I was wondering whether we could use F.mse_loss instead, and what the arguments to that call would look like.

[screenshot: the loss computation block in model.py, where the loss is computed with F.cross_entropy on the logits and targets]

I was wondering how this block would change. Thank you!
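To make the question concrete, here is a rough sketch of what a purely mechanical swap might look like. This is only an assumption on my part, not something the repo supports: since F.mse_loss needs a regression target with the same shape as the prediction, the integer token ids would first have to be expanded to one-hot vectors (ignore_index handling is omitted here).

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: swapping cross-entropy for MSE in a GPT-style loss.
# Assumes `logits` has shape (B, T, vocab_size) and `targets` has shape (B, T)
# with integer token ids, as in nanoGPT's forward pass.
def mse_language_loss(logits, targets, vocab_size):
    # Cross-entropy consumes integer class indices directly:
    #   F.cross_entropy(logits.view(-1, vocab_size), targets.view(-1), ignore_index=-1)
    # MSE instead needs a continuous target of the same shape as the prediction,
    # so the token ids are expanded to one-hot vectors first.
    one_hot = F.one_hot(targets.view(-1), num_classes=vocab_size).float()
    preds = logits.view(-1, vocab_size)
    # The only knob here is the reduction mode ('mean' by default).
    return F.mse_loss(preds, one_hot, reduction='mean')
```

Note that this treats the raw logits as direct predictions of a one-hot distribution and discards the probabilistic interpretation that cross-entropy provides, which is presumably why it is not used for next-token prediction.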

shendiaomo commented 9 months ago

I don't think so. A language model's targets are discrete token indices, not continuous values, so there are no meaningful labels to regress against with a mean squared error; the numeric distance between two token ids carries no semantic meaning.

VatsaDev commented 8 months ago

@SujayTalanki try looking into xVal, they used MSE.
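(For context: as I understand xVal, each number in the input is encoded as a single [NUM] token whose embedding is scaled by the numeric value, and a small regression head trained with MSE predicts the value at number positions, while ordinary tokens keep the usual cross-entropy loss. The sketch below is a rough illustration of that combined loss under those assumptions, not the paper's implementation; the names NumberHead, num_mask, and num_targets are hypothetical.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Rough xVal-style sketch (hypothetical names, not the paper's code):
# `hidden`      : (B, T, n_embd) transformer outputs
# `logits`      : (B, T, vocab_size) from the usual lm_head
# `targets`     : (B, T) integer token ids ([NUM] wherever a number appears)
# `num_targets` : (B, T) float values of those numbers (0 elsewhere)
# `num_mask`    : (B, T) bool, True at positions whose target is [NUM]

class NumberHead(nn.Module):
    """Scalar regression head used only at number positions."""
    def __init__(self, n_embd):
        super().__init__()
        self.proj = nn.Linear(n_embd, 1)

    def forward(self, hidden):
        return self.proj(hidden).squeeze(-1)  # (B, T)

def combined_loss(logits, hidden, targets, num_targets, num_mask, num_head):
    # Standard next-token cross-entropy over the whole sequence.
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    # MSE only where the target token is the [NUM] placeholder.
    preds = num_head(hidden)
    mse = F.mse_loss(preds[num_mask], num_targets[num_mask]) if num_mask.any() else 0.0
    return ce + mse
```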