Closed: fpgaminer closed this issue 4 years ago
Omg this is a bug, I'm pretty sure I meant to use `-1e10` instead of `1e-10`. Nice find, thank you!
I believe https://github.com/karpathy/minGPT/commit/8909e1b646d6fd5235ec33259fb22fdc2c91037c is the fix, ty.
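For reference, the patched helper ends up looking roughly like this (a paraphrase, not a verbatim copy of the commit): entries below the top-k cutoff are filled with negative infinity, so they receive exactly zero probability after the softmax.

```python
import torch

def top_k_logits(logits, k):
    # Take the k-th largest value per row as a threshold, then push
    # every logit below it to -inf so softmax gives those tokens zero mass.
    v, _ = torch.topk(logits, k)
    out = logits.clone()
    out[out < v[:, [-1]]] = -float('Inf')  # previously 1e-10
    return out
```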
Thanks for the quick fix! And thank you for this repo. I've been meaning to play with NLP and GPT, but have been a bit daunted by it. This repo made it easy to dive in and start tinkering.
I have a possible improvement to the `mingpt.utils.top_k_logits` function. I was using the `play_char` notebook to train against the IMDB dataset, but was getting really terrible samples out of it after training unless I set the temperature very low. Looking into the sampling code, I noticed the odd choice of `1e-10` in `top_k_logits`. It seemed odd because most logits are negative, so filling with `1e-10` can actually make many characters *more* probable rather than masking them out. Replacing it with negative infinity vastly improved sampling for me. A demonstration follows below. I'm happy to open a pull request, just let me know.

Demo Code:
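A minimal sketch along these lines reproduces the effect (the toy logits are invented for illustration, and the fill value is parameterized so both variants can be compared):

```python
import torch
import torch.nn.functional as F

def top_k_logits(logits, k, fill):
    # Same masking logic as mingpt.utils.top_k_logits, but with the
    # fill value as a parameter so both variants can be compared.
    v, _ = torch.topk(logits, k)
    out = logits.clone()
    out[out < v[:, [-1]]] = fill
    return out

# Toy logits: mostly negative, as trained-model logits tend to be.
logits = torch.tensor([[-2.0, -1.0, -3.0, -5.0, -4.0]])

# Buggy fill: the masked logits become ~0.0, which is *larger* than the
# two surviving logits (-1 and -2), so the masked tokens dominate.
buggy = F.softmax(top_k_logits(logits, k=2, fill=1e-10), dim=-1)

# Fixed fill: -inf logits get exactly zero probability after softmax.
fixed = F.softmax(top_k_logits(logits, k=2, fill=-float('Inf')), dim=-1)

print(buggy)
print(fixed)
```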
Output:
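Run as-is, the sketch prints roughly the following; under the `1e-10` fill, each of the three masked tokens ends up more likely than either token that survived the top-k cut:

```
tensor([[0.0386, 0.1050, 0.2855, 0.2855, 0.2855]])
tensor([[0.2689, 0.7311, 0.0000, 0.0000, 0.0000]])
```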
EDIT: A slight addendum: I just have to say how impressive the results of this model are with the fixed sampling, given that I only trained it for a few hours on a 2070.