microsoft / TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications
MIT License

QuaRot: Add activation and KV cache quantization, GPTQ, Phi3, Groupsizes #149

Closed: nailimixaM closed this 2 months ago

nailimixaM commented 3 months ago

KV cache quantization: reproduces Tables 10 and 11 to within 0.01 PPL, i.e. working fully as expected.

Tested weight, activation, and KV cache quantization (i.e. end-to-end RTN): reproduces the full 6- and 8-bit PPL results. For full 4-bit we get:

- Llama-2 7B: 8.60 vs 8.37 in the paper (0.23 worse)
- Llama-2 13B: 6.34 vs 6.09 in the paper (0.25 worse)

Given that A16W4 was only ~0.1 PPL worse on the 7B and 13B models, I think there must be a minor bug somewhere in symmetric RTN (KV cache quantization uses asymmetric RTN). I have some ideas and will investigate.
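
For context, here is a minimal sketch of the two RTN variants being compared (this is not the repository's implementation; the per-tensor granularity, clipping choices, and function names are assumptions for illustration only):

```python
import torch

def rtn_symmetric(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Symmetric round-to-nearest: zero-point fixed at 0, scale set by max |x|.
    # Illustrative per-tensor sketch, not the repo's code.
    qmax = 2 ** (bits - 1) - 1                        # e.g. 7 for 4-bit
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale                                  # dequantized ("fake-quant") output

def rtn_asymmetric(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Asymmetric round-to-nearest: scale and zero-point from the min/max range,
    # the variant used for KV cache quantization above.
    qmax = 2 ** bits - 1                              # e.g. 15 for 4-bit
    xmin, xmax = x.min(), x.max()
    scale = (xmax - xmin).clamp(min=1e-8) / qmax
    zero = torch.round(-xmin / scale)
    q = torch.clamp(torch.round(x / scale) + zero, 0, qmax)
    return (q - zero) * scale
```

A gap that shows up only in the symmetric path would typically point at the clipping range or grid placement in the symmetric variant rather than at the shared rounding logic.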