Explore expanding dynamic quantization kernels (broaden a8w4dq support)

pytorch / torchchat

Run PyTorch LLMs locally on servers, desktop and mobile

BSD 3-Clause "New" or "Revised" License


Explore expanding dynamic quantization kernels (broaden a8w4dq support) #638

Open mikekgfb opened 5 months ago

mikekgfb commented 5 months ago

Add support for:
1 - asymmetric a8w4dq. This basically requires subtracting the zero point from each value before multiplying, so it should only add a single multiply. This will help accelerate and better handle GGUF files on ExecuTorch, and allow reading Q4_0 for export to mobile.
2 - a8w4dq on desktop
3 - the asymmetric version of (1) on desktop
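To make item (1) concrete, here is a minimal pure-Python sketch of the arithmetic for one weight group, assuming int8 dynamically quantized activations and unsigned 4-bit weights with one scale and one zero point per group. The function names (`quantize_activations`, `dot_asymmetric`, `dot_asymmetric_folded`) are hypothetical and are not torchchat, ExecuTorch, or XNNPACK APIs. It only illustrates why the zero-point subtraction can plausibly be folded into one extra multiply per group rather than one subtraction per weight.

```python
def quantize_activations(x):
    """Dynamic symmetric int8 quantization: the scale is computed at runtime from x."""
    max_abs = max(abs(v) for v in x) or 1.0  # avoid a zero scale for all-zero input
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in x]
    return q, scale


def dot_asymmetric(a_q, a_scale, w_q, w_scale, w_zero):
    """Naive form: subtract the zero point from every 4-bit weight before multiplying."""
    acc = sum(a * (w - w_zero) for a, w in zip(a_q, w_q))
    return acc * a_scale * w_scale


def dot_asymmetric_folded(a_q, a_scale, w_q, w_scale, w_zero):
    """Folded form: sum(a * (w - z)) == sum(a * w) - z * sum(a).

    The correction term costs one extra multiply per group.
    """
    acc = sum(a * w for a, w in zip(a_q, w_q)) - w_zero * sum(a_q)
    return acc * a_scale * w_scale


# Illustrative values only (assumed, not taken from any real model).
a_q, a_scale = quantize_activations([0.5, -1.0, 0.25, 2.0])
w_q = [0, 15, 7, 8]           # unsigned 4-bit weights in [0, 15]
w_scale, w_zero = 0.1, 8      # per-group scale and zero point

naive = dot_asymmetric(a_q, a_scale, w_q, w_scale, w_zero)
folded = dot_asymmetric_folded(a_q, a_scale, w_q, w_scale, w_zero)
```

Both functions accumulate in integers and apply the two scales once at the end, so `naive` and `folded` agree; the symmetric case is the same code with `w_zero` fixed to a constant.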

digantdesai commented 5 months ago

Is this a new non-xnnpack route?