Closed mikekgfb closed 9 months ago
cc @mingfeima
@mikekgfb cool! We are just about to do something similar on the CPU device. We will add native support for int4 kernels on CPU, so that int4 quant will run smoothly on the CPU device; other dtypes (bf16, fp16) may just rely on MKL.
I'll remove it until we find out whether it buys us performance (I've seen additional improvements for CPU SDPA land from the Intel team since I did my experiments).
We'll also want to check what code is generated for the int8 data type; the performance drop is beyond what appears rational (5-6 QPS vs 2-ish QPS with int8).
Hi @mikekgfb, may I know your plan for landing this PR? We want to propose a follow-up PR for Intel GPU based on it. Thanks!
Waiting on a review, which is required to merge. I've addressed @Chillee's feedback. If he's not available, who else can review? @cpuhrsch @jisaacso?
Cool, this is merged :) I will wrap up the code and upstream the CPU backend optimization kernels to PyTorch soon.
Extend existing device variable to support code gen for other targets.
This PR adds a new command-line argument to generate.py to select a device: --device {'cpu', 'cuda'} (we have the option to add devices such as MPS in the future).
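A minimal sketch of how such a flag could be wired up with argparse (the `build_parser` helper and argument defaults here are illustrative, not necessarily what the PR does):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of generate.py's CLI with the new --device flag."""
    parser = argparse.ArgumentParser(description="Run text generation")
    parser.add_argument(
        "--device",
        type=str,
        default="cuda",
        choices=["cpu", "cuda"],  # targets such as "mps" could be appended later
        help="Device on which to run generation",
    )
    return parser

# Usage example: select the CPU backend explicitly.
args = build_parser().parse_args(["--device", "cpu"])
print(args.device)  # cpu
```

Using `choices` keeps validation in argparse itself, so an unsupported target fails fast with a usage message instead of surfacing later as a runtime device error.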