mit-han-lab / qserve

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Apache License 2.0

activation quantization #13

Open hanhanpp opened 3 months ago

hanhanpp commented 3 months ago

I'm confused by equation (12). What does the outer product of sw and sx mean? Is the activation quantized per token?

synxlin commented 3 months ago

Hi @hanhanpp, thank you for your interest in our work. To answer your questions: the activation uses per-token dynamic quantization, while the weight uses per-channel/per-group static quantization.
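To make the outer-product part of the question concrete, here is a minimal NumPy sketch. It assumes equation (12) describes the usual dequantization of an integer matmul, with one scale per token for the activation (sx) and one scale per output channel for the weight (sw). The helper `quantize_per_row`, the tensor shapes, and the use of symmetric 8-bit quantization for both operands are illustrative assumptions, not QServe's actual kernels or its 4-bit weight format.

```python
import numpy as np

def quantize_per_row(m, n_bits=8):
    """Hypothetical helper: symmetric quantization with one scale per row of `m`."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(m).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero rows
    q = np.clip(np.round(m / scale), -qmax, qmax).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))  # 4 tokens, 16 input channels
w = rng.standard_normal((8, 16))  # 8 output channels, 16 input channels

# Per-token scales are computed at run time from each activation row (dynamic).
qx, sx = quantize_per_row(x)  # sx has shape (4, 1)
# Per-channel scales would be computed once, offline, from the weights (static).
qw, sw = quantize_per_row(w)  # sw has shape (8, 1)

# Integer matmul: accumulate products of the quantized values only.
acc = qx @ qw.T  # shape (4, 8)

# Entry (i, j) of the output is rescaled by sx[i] * sw[j],
# so the full grid of scales is the outer product of sx and sw.
scale_grid = np.outer(sx.ravel(), sw.ravel())  # shape (4, 8)
y_hat = acc * scale_grid

y_ref = x @ w.T
max_err = np.abs(y_hat - y_ref).max()
print("max abs error vs. float matmul:", max_err)
```

Because sx is constant along each token's row and sw is constant along each output channel, both can be pulled out of the sum over input channels, which is why a single multiplication by the outer product after the integer matmul is enough. With per-group weight scales the scale changes along the reduction dimension, so this single-step factorization no longer applies directly and the scales have to be applied group by group.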