softmax1 / nanoGPT_softmax1

An experiment comparing vanilla nanoGPT against nanoGPT with softmax1, to see how the change affects the perplexity score

Don't measure perplexity :-) #1

Open evanmiller opened 1 year ago

evanmiller commented 1 year ago

Just noticed in the README that you plan to measure perplexity... I don't expect much / any improvement in this metric – the hypothesis is that weight / activation kurtosis will be reduced, thus facilitating model compressibility rather than performance.
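For context, the softmax1 proposal adds 1 to the softmax denominator, so attention heads can assign near-zero total weight instead of being forced to distribute a full unit of probability. A minimal pure-Python sketch of the two functions (the numerically-stable handling of the implicit "+1" term is my own; function names are illustrative, not from this repo):

```python
import math

def softmax(xs):
    # Standard softmax: exp(x_i) / sum_j exp(x_j), with max-subtraction for stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax1(xs):
    # softmax1 adds 1 to the denominator: exp(x_i) / (1 + sum_j exp(x_j)).
    # After subtracting m for stability, the implicit "+1" becomes exp(-m),
    # so we fold a zero logit into the max when choosing m.
    m = max(max(xs), 0.0)
    exps = [math.exp(x - m) for x in xs]
    total = math.exp(-m) + sum(exps)
    return [e / total for e in exps]
```

Note that `softmax1` outputs sum to strictly less than 1; when all logits are very negative the outputs approach zero, which is the "abstain" behaviour the hypothesis relies on.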

Perplexity of heavily quantized models may be interesting though.
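For reference, perplexity is just the exponential of the mean per-token negative log-likelihood, so the same routine works whether the log-probs come from the full-precision or the quantized model. A hedged sketch (the input format here is an assumption, not this repo's evaluation code):

```python
import math

def perplexity(token_log_probs):
    # token_log_probs: natural-log probabilities the model assigned to each
    # target token in the evaluation set.
    # Perplexity = exp(mean negative log-likelihood).
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)
```

As a sanity check, a model that assigns uniform probability 1/V to every token has perplexity exactly V.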

evanmiller commented 1 year ago

See also this repo, which also uses nanoGPT: https://github.com/softmax1/quietGPT

mcapodici commented 1 year ago

Thanks, this is the feedback I was hoping for :-)

Happy to do kurtosis, and also perplexity at various levels of quantization.

I'd welcome a recommendation on how to measure kurtosis and how to interpret it. A quick search turned up this: https://github.com/pytorch/pytorch/issues/101334, which leads to https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html. It would also be good to know what % improvement is considered impressive.

Do I run this over all the weights, or just the ones most affected by the softmax change, such as Q and K? I somewhat prefer all the weights, so I can't "cheat" by making some tensors more compressible than others.
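One possible shape for the measurement, sketched in pure Python so it matches the default behaviour of `scipy.stats.kurtosis` (Fisher convention: excess kurtosis, 0 for a normal distribution; biased moment estimates). The per-tensor/overall split is my own suggestion, not something either of us has specified; `kurtosis_report` and its input format are hypothetical names:

```python
def excess_kurtosis(values):
    # Fourth standardized moment minus 3 (Fisher convention), matching
    # scipy.stats.kurtosis with its defaults (fisher=True, bias=True).
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

def kurtosis_report(named_weights):
    # named_weights: {tensor_name: flat list of floats}, e.g. built from
    # {name: p.detach().flatten().tolist() for name, p in model.named_parameters()}.
    # Returns per-tensor kurtosis plus a single figure over all weights pooled,
    # so both the "all weights" and "just Q, K" views are available.
    per_tensor = {name: excess_kurtosis(w) for name, w in named_weights.items()}
    pooled = [v for w in named_weights.values() for v in w]
    return per_tensor, excess_kurtosis(pooled)
```

Reporting both views sidesteps the "cheating" concern: the pooled number is the headline, and the per-tensor breakdown shows whether the reduction is concentrated in the attention projections.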