Closed ryusaeba closed 3 weeks ago
As the flash-attention example shows, I suspect this library cannot support flash attention, since mx_mapping.inject_pyt_ops mainly affects PyTorch ops and modules. Please correct me if I am wrong.
You're correct. The library won't quantize the flash-attention ops.
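To make the mechanism concrete, here is a generic Python sketch (hypothetical names, not the library's actual internals) of why symbol injection misses fused kernels: injection rebinds the op that eager code looks up at call time, but a fused/custom kernel like flash attention carries its own compiled routine and never performs that lookup.

```python
import types

# Stand-in for a PyTorch-style ops namespace.
ops = types.SimpleNamespace()

def matmul(a, b):
    # Stands in for the original torch.matmul.
    return a * b

ops.matmul = matmul

def eager_attention(q, k):
    # Resolves ops.matmul at call time, so it sees the injected version.
    return ops.matmul(q, k)

# A fused kernel binds the original routine when it is built,
# so later namespace injection never reaches it.
_kernel_matmul = matmul

def flash_attention(q, k):
    return _kernel_matmul(q, k)

def quantized_matmul(a, b):
    # Pretend quantization: truncate the inputs before multiplying.
    return int(a) * int(b)

# inject_pyt_ops-style replacement: rebind the namespace symbol.
ops.matmul = quantized_matmul

print(eager_attention(2.5, 2.5))   # quantized path: prints 4
print(flash_attention(2.5, 2.5))   # bypasses injection: prints 6.25
```

The eager path picks up the quantized op, while the "fused" path runs unmodified, which mirrors why the injected MX ops never see flash-attention's internal matmuls.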