pytorch / ao

PyTorch native quantization and sparsity for training and inference
BSD 3-Clause "New" or "Revised" License

MoE example #729

Open msaroufim opened 1 month ago

msaroufim commented 1 month ago

@felipemello1 shared this PR with me https://github.com/vllm-project/vllm/pull/7415

My sense is we should already be able to support this with

from torchao.quantization.quant_api import quantize_, int8_weight_only
quantize_(m, int8_weight_only())

Would just need a good example to showcase this
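A minimal sketch of what such an example might look like: a toy MoE block (hypothetical, not the vLLM/Jamba implementation) whose expert `nn.Linear` layers get swapped for int8 weight-only quantized versions by `quantize_`. The `TinyMoE` module and its top-1 routing are illustrative assumptions; the torchao import is guarded so the snippet also runs without torchao installed.

```python
# Hypothetical minimal MoE block to illustrate applying torchao's
# quantize_ API to expert weights. Routing here is simplified top-1.
import torch
import torch.nn as nn


class TinyMoE(nn.Module):
    def __init__(self, dim=16, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(num_experts)
        )

    def forward(self, x):
        # Route each token to its highest-scoring expert.
        scores = self.gate(x).softmax(dim=-1)
        top = scores.argmax(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out


m = TinyMoE()
x = torch.randn(8, 16)

# quantize_ mutates the module in place, replacing Linear weights with
# int8 weight-only quantized tensors (requires torchao to be installed).
try:
    from torchao.quantization.quant_api import quantize_, int8_weight_only
    quantize_(m, int8_weight_only())
except ImportError:
    pass  # torchao not available; fall back to the float model

print(tuple(m(x).shape))
```

Since `quantize_` walks the module tree, the experts inside the `nn.ModuleList` are picked up without any MoE-specific handling, which is why a worked example like this may be all that's needed.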

Also cc @jcaip and @cpuhrsch who've thought a lot more about MoE than me

felipemello1 commented 1 month ago

For context, this was used in the new Jamba model: https://x.com/yampeleg/status/1826617129363239143?s=46