mit-han-lab / smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
https://arxiv.org/abs/2211.10438
MIT License

Bloom code #16

Open Toan-Do opened 1 year ago

Toan-Do commented 1 year ago

Thank you for your great work. I am very interested in int8 Bloom models. Could you please share the code and checkpoints for the int8 Bloom models?

Toan-Do commented 1 year ago

Hi @Guangxuan-Xiao, regarding the GELU in the Bloom model: did you implement a W8A8B8O8LinearGelu kernel for it, or implement a custom GELU activation function to handle the 8-bit output of W8A8B8O8Linear?

ImanHosseini commented 1 year ago

Not sure how they did it, but this change includes an implementation of a GELU unit: https://github.com/Guangxuan-Xiao/torch-int/pull/1/commits/2163a169748edff67586c2bf0158f4c7f0718fc6
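For reference, the general pattern such a fused int8-linear-plus-GELU module follows can be sketched in plain PyTorch. This is a hedged float simulation of the int8 arithmetic, not the actual CUDA kernel from torch-int; the function name `w8a8b8o8_linear_gelu_ref` and the scale parameters are hypothetical illustrations:

```python
import torch


def quant_int8(x: torch.Tensor, scale: float) -> torch.Tensor:
    """Symmetric per-tensor quantization of a float tensor to int8."""
    return torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)


def w8a8b8o8_linear_gelu_ref(x_q, w_q, b_q, s_x, s_w, s_b, s_out):
    """Hypothetical reference for a fused int8 linear + GELU.

    x_q, w_q, b_q are int8 tensors. The matmul accumulates in int32,
    the accumulator is rescaled to float, GELU is applied in float,
    and the result is requantized to int8 at the output scale s_out.
    A real kernel would fold the scales into integer multipliers
    instead of working in float.
    """
    acc = x_q.to(torch.int32) @ w_q.to(torch.int32).T        # int32 accumulator
    y = acc.float() * (s_x * s_w) + b_q.float() * s_b        # dequantize + bias
    y = torch.nn.functional.gelu(y)                          # GELU in float
    return quant_int8(y, s_out)                              # requantize to int8


# Tiny usage example with made-up scales.
x = torch.randn(4, 8)
w = torch.randn(16, 8)
b = torch.randn(16)
s_x, s_w, s_b, s_out = 0.05, 0.02, 0.001, 0.03
out = w8a8b8o8_linear_gelu_ref(
    quant_int8(x, s_x), quant_int8(w, s_w), quant_int8(b, s_b),
    s_x, s_w, s_b, s_out,
)
print(out.dtype, out.shape)
```

The key design point is that GELU runs on the dequantized (float) accumulator before the final requantization, so the int8 output scale `s_out` only has to cover the post-activation range.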