Open yanbing-j opened 3 months ago
@yanbing-j I think we need a custom int4 kernel to support it; we don't have a near-term plan to do that.
@yanboliang Does the existing int4 kernel in gpt-fast fit Mixtral 8x7B? https://github.com/pytorch-labs/gpt-fast/blob/main/quantize.py#L398
Hi maintainers @yanboliang @Chillee ,
I saw that Int8 weight-only quantization has been enabled for Mixtral 8x7B, and the next step should be supporting int4 and int4-gptq.
May I know the timeline for enabling int4/int4-gptq support in Mixtral 8x7B? Thanks!
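For context, here is a rough sketch of what group-wise int4 weight-only quantization involves (per-group scales plus packing two 4-bit values per byte). This is an illustrative NumPy mock-up, not gpt-fast's actual implementation or API; all function names here are hypothetical.

```python
# Illustrative sketch of group-wise symmetric int4 weight-only quantization.
# Names and group_size are assumptions for demonstration, not gpt-fast's API.
import numpy as np

def quantize_int4(w, group_size=32):
    """Quantize a flat float weight vector to int4 with per-group scales."""
    groups = w.reshape(-1, group_size)
    # Symmetric quantization: map [-absmax, absmax] onto [-7, 7].
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def pack_int4(q):
    """Pack pairs of int4 values (stored in int8) into one uint8 each."""
    flat = q.reshape(-1)
    lo = (flat[0::2] & 0x0F).astype(np.uint8)
    hi = (flat[1::2] & 0x0F).astype(np.uint8)
    return lo | (hi << 4)

def unpack_int4(packed):
    """Inverse of pack_int4, sign-extending each 4-bit nibble."""
    lo = (packed & 0x0F).astype(np.int16)
    hi = (packed >> 4).astype(np.int16)
    lo = np.where(lo > 7, lo - 16, lo).astype(np.int8)
    hi = np.where(hi > 7, hi - 16, hi).astype(np.int8)
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2] = lo
    out[1::2] = hi
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal(128).astype(np.float32)
q, scales = quantize_int4(w)
packed = pack_int4(q)                      # 128 weights -> 64 bytes
restored = unpack_int4(packed).reshape(-1, 32).astype(np.float32) * scales
err = np.abs(restored.reshape(-1) - w).max()
print(packed.nbytes, err)
```

The custom kernel @yanboliang mentions would fuse the unpack-and-dequantize step into the matmul itself, so the packed weights never materialize in full precision; for Mixtral the extra wrinkle is that the weights live inside MoE expert layers rather than plain `nn.Linear` modules.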