evanmiller opened 1 year ago
See also this repo, which also uses nanoGPT: https://github.com/softmax1/quietGPT
Thanks, this is the feedback I was hoping for :-)
Happy to measure kurtosis, and also perplexity at various levels of quantization.
I am happy to take a recommendation on how to measure the kurtosis, and how to interpret it. I did a quick search and found this: https://github.com/pytorch/pytorch/issues/101334, which leads to https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html. It would be good to know what % improvement is considered impressive.
Do I run this over all the weights, or just the ones most affected by the softmax change, such as Q and K? I'd prefer all the weights, so I can't "cheat" by making some more compressible than others.
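For concreteness, here is a minimal sketch of how this could be computed over all the weights using scipy.stats.kurtosis (with `fisher=True` it reports excess kurtosis, so a Gaussian tensor scores roughly 0). The `weight_kurtosis` helper and the pooled `"__all__"` entry are my own assumptions, not anything from either repo:

```python
import numpy as np
import torch
from scipy.stats import kurtosis

def weight_kurtosis(model: torch.nn.Module) -> dict[str, float]:
    """Excess kurtosis per parameter tensor, plus a pooled value over all weights."""
    stats, all_weights = {}, []
    for name, param in model.named_parameters():
        flat = param.detach().float().cpu().numpy().ravel()
        stats[name] = float(kurtosis(flat, fisher=True))  # ~0 for Gaussian weights
        all_weights.append(flat)
    # Pooled kurtosis over every parameter, so no single tensor can be gamed.
    stats["__all__"] = float(kurtosis(np.concatenate(all_weights), fisher=True))
    return stats
```

Reporting both per-tensor and pooled values would let us see whether any improvement is concentrated in Q/K or spread across the model.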
Just noticed in the README that you plan to measure perplexity... I don't expect much, if any, improvement in this metric: the hypothesis is that weight/activation kurtosis will be reduced, improving model compressibility rather than raw performance.
Perplexity of heavily quantized models may be interesting, though.
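A rough sketch of that experiment, assuming a nanoGPT-style model whose forward returns `(logits, loss)` and an iterable of `(x, y)` validation batches; the `fake_quantize_` helper and the symmetric per-tensor round-to-nearest scheme are the simplest possible assumptions, not a real quantization library:

```python
import math
import torch

@torch.no_grad()
def fake_quantize_(model: torch.nn.Module, bits: int = 8) -> None:
    """Symmetric per-tensor round-to-nearest quantization of weights, in place."""
    qmax = 2 ** (bits - 1) - 1
    for param in model.parameters():
        scale = param.abs().max() / qmax
        if scale > 0:
            param.copy_((param / scale).round().clamp(-qmax, qmax) * scale)

@torch.no_grad()
def perplexity(model, val_batches) -> float:
    """exp(mean token-level cross-entropy) over the validation batches."""
    model.eval()
    total_loss, n = 0.0, 0
    for x, y in val_batches:
        _, loss = model(x, y)  # nanoGPT-style forward returns (logits, loss)
        total_loss += loss.item()
        n += 1
    return math.exp(total_loss / n)
```

Running this at, say, 8, 6, and 4 bits for both the baseline and the softmax1 variant would show whether reduced kurtosis actually translates into smaller perplexity degradation under quantization.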