pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

int4/int4-gptq support in Mixtral 8x7B #129

Open · yanbing-j opened 3 months ago

yanbing-j commented 3 months ago

Hi maintainers @yanboliang @Chillee,

I saw that int8 weight-only quantization has been enabled for Mixtral 8x7B, and the natural next step would be supporting int4 and int4-gptq.

May I know the timeline for enabling int4/int4-gptq support in Mixtral 8x7B? Thanks!
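For context, weight-only int8 quantization as used here keeps one floating-point scale per output channel of each weight matrix. A minimal numerical sketch of that scheme (the helper names are illustrative, not gpt-fast's actual code):

```python
import torch

def quantize_int8_per_channel(w: torch.Tensor):
    # Symmetric per-output-channel int8: one fp scale per row of the weight.
    scales = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-6) / 127.0
    q = torch.clamp(torch.round(w / scales), -128, 127).to(torch.int8)
    return q, scales

w = torch.randn(8, 16)
q, scales = quantize_int8_per_channel(w)
w_hat = q.to(torch.float32) * scales  # dequantize at load/matmul time
print((w_hat - w).abs().max())  # small reconstruction error
```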

yanboliang commented 3 months ago

@yanbing-j I think we need a custom int4 kernel to support it; we don't have a near-term plan for that.
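For background on why a custom kernel comes up: Mixtral's MoE layer stores its expert weights as a single 3D tensor (one 2D weight per expert), while an int4 linear kernel operates on a single 2D weight. A naive adaptation would have to loop over routed tokens and experts, which is the pattern a batched kernel would need to avoid. A hedged sketch under assumed shapes and routing (the fp matmul stands in for a real int4 GEMM):

```python
import torch

# Hypothetical Mixtral-style expert weights: (num_experts, out, in).
num_experts, out_features, in_features = 8, 256, 512
expert_w = torch.randn(num_experts, out_features, in_features)

def int4_linear_2d(x: torch.Tensor, w2d: torch.Tensor) -> torch.Tensor:
    # Stand-in for a 2D int4 weight-only GEMM; here just an fp matmul.
    return x @ w2d.t()

tokens = torch.randn(4, in_features)     # tokens to be routed to experts
expert_ids = torch.tensor([0, 3, 3, 7])  # hypothetical routing decisions

# Naive per-token loop: one small 2D GEMM per token. A custom int4 MoE
# kernel would instead batch these grouped matmuls across experts.
out = torch.stack(
    [int4_linear_2d(t, expert_w[e]) for t, e in zip(tokens, expert_ids)]
)
```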

yanbing-j commented 3 months ago

@yanboliang Does the int4 kernel elsewhere in gpt-fast (outside the Mixtral path) fit Mixtral 8x7B? https://github.com/pytorch-labs/gpt-fast/blob/main/quantize.py#L398
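For reference, the linked path implements group-wise int4 weight-only quantization: each group of input elements within a row shares one scale and zero point. A minimal numerical sketch of that scheme (not the packed-kernel code itself; helper names are illustrative):

```python
import torch

def quantize_int4_groupwise(w: torch.Tensor, groupsize: int = 128):
    # Asymmetric group-wise int4: each group of `groupsize` input elements
    # shares one fp scale and zero point; quantized values land in [0, 15].
    out_features, in_features = w.shape
    g = w.reshape(out_features, in_features // groupsize, groupsize)
    wmin = g.amin(dim=-1, keepdim=True)
    wmax = g.amax(dim=-1, keepdim=True)
    scales = (wmax - wmin).clamp(min=1e-6) / 15.0
    q = torch.clamp(torch.round((g - wmin) / scales), 0, 15).to(torch.uint8)
    return q, scales, wmin

def dequantize_int4(q, scales, zeros):
    g = q.to(torch.float32) * scales + zeros
    return g.reshape(g.shape[0], -1)

w = torch.randn(64, 256)
q, s, z = quantize_int4_groupwise(w, groupsize=128)
print((dequantize_int4(q, s, z) - w).abs().max())  # error bounded per group
```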