microsoft / TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications
MIT License

James/gptq improvements #163

Closed jameshensman closed 3 months ago

jameshensman commented 3 months ago

Better args for the run script, some minor memory improvements, and improved numerical stability of the Cholesky decomposition.

Results with the improved Cholesky are disappointingly small: no real change in perplexity, less than I saw when switching to a newer CUDA driver. Nonetheless, it should be faster, especially for big models.
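The PR text doesn't show the stabilization itself. A minimal sketch of one common way to stabilize a Cholesky factorization in GPTQ-style pipelines is to damp the Hessian diagonal before factorizing, retrying with a larger damping factor if it still fails; the function and parameter names below are illustrative, not necessarily what this PR does.

```python
import numpy as np


def stable_cholesky(H, damp=0.01, max_tries=5):
    """Cholesky factor of a possibly ill-conditioned Hessian H.

    Adds a small multiple of the mean diagonal to H (a common
    GPTQ-style damping trick) and doubles the damping on each
    retry if the factorization fails. Illustrative sketch only.
    """
    diag_mean = np.mean(np.diag(H))
    eye = np.eye(H.shape[0])
    for attempt in range(max_tries):
        try:
            # Scale damping by the mean diagonal so the amount of
            # regularization is invariant to the overall scale of H.
            H_damped = H + damp * (2**attempt) * diag_mean * eye
            return np.linalg.cholesky(H_damped)
        except np.linalg.LinAlgError:
            continue
    raise np.linalg.LinAlgError("Cholesky failed even after damping")
```

For example, a rank-deficient (singular but PSD) Hessian makes plain `np.linalg.cholesky` raise, while the damped version factorizes on the first retry level.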