Closed — yanbing-j closed this 2 months ago
Hi @yanboliang @Chillee , could you please help review this PR? Thanks!
Hi @mikekgfb , could you please help review this PR? Thanks!
We really need to add at least some basic CI before merging changes like this, as they can break things...
This PR adds CPU support for int8 weight-only quantization (woq) in mixtral-moe. To improve int8 woq performance, we use
`torch.ops.aten._weight_int8pack_mm`
as a workaround; it will be removed once https://github.com/pytorch/pytorch/pull/120985 lands in a PyTorch stable release. This PR also updates the int4 weight dimension, since https://github.com/pytorch/pytorch/pull/117475 has been merged into PyTorch.
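For reviewers unfamiliar with the technique: int8 weight-only quantization stores the weight matrix as int8 with per-output-channel scales and dequantizes during the matmul, which is what `_weight_int8pack_mm` fuses into one kernel. Below is a minimal numpy sketch of the math (not the actual kernel or this PR's code; all function names here are illustrative):

```python
import numpy as np

def quantize_weight_int8(w):
    # Symmetric per-output-channel quantization: w has shape [out, in].
    # One fp32 scale per output channel, int8 values in [-127, 127].
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float32)

def int8_woq_linear(x, q, scales):
    # Dequantize-on-the-fly matmul: y = x @ (q * scales).T
    # A fused kernel would avoid materializing the fp32 weight.
    w = q.astype(np.float32) * scales
    return x @ w.T

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
x = rng.standard_normal((2, 8)).astype(np.float32)

q, s = quantize_weight_int8(w)
y = int8_woq_linear(x, q, s)
y_ref = x @ w.T  # full-precision reference; outputs differ only by quantization error
```

The op in this PR takes the same three inputs (activation, packed int8 weight, scales) and returns the fp output in one call, avoiding the intermediate fp32 weight tensor.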