[Suggestion] Add a note about the training of Bengio et al. MLP

karpathy / makemore

An autoregressive character-level language model for making more things

MIT License


[Suggestion] Add a note about the training of Bengio et al. MLP #4

Open OmriKaduri opened 1 year ago

OmriKaduri commented 1 year ago

Hi @karpathy, thanks for that great repo!

It might be worth noting in your code that while you train by minimizing the cross-entropy (CE) loss, Bengio et al. actually maximized the log-likelihood. I know the two are equivalent in this case (one-hot vectors as ground truth), since minimizing the negative log-likelihood is the same as maximizing the log-likelihood, but they are not equivalent in general, so a short note could help. Thanks!
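To illustrate the point, here is a minimal sketch in plain Python (not code from the repo or from Bengio et al.). The logits, the label, and the helper names `softmax`, `cross_entropy`, and `neg_log_likelihood` are all made up for illustration. It shows that CE against a one-hot target equals the negative log-likelihood of the correct class, and that the two differ once the target is a soft distribution, which is one way the general case departs from this one.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs):
    # H(target, probs) = -sum_i target_i * log(probs_i); zero-weight terms are skipped.
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def neg_log_likelihood(label, probs):
    # Negative log-probability the model assigns to the observed class.
    return -math.log(probs[label])

# Hypothetical model output over a 3-character vocabulary.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
label = 0

# One-hot ground truth: CE reduces to the negative log-likelihood.
one_hot = [1.0, 0.0, 0.0]
ce_hard = cross_entropy(one_hot, probs)
nll = neg_log_likelihood(label, probs)

# Soft target (e.g. a smoothed label): CE no longer equals the NLL of the label.
soft = [0.9, 0.05, 0.05]
ce_soft = cross_entropy(soft, probs)

print(f"CE (one-hot) = {ce_hard:.4f}, NLL = {nll:.4f}, CE (soft) = {ce_soft:.4f}")
```

With one-hot targets, minimizing `ce_hard` is the same as maximizing the log-likelihood, which is why the two training objectives coincide here.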