Disable fp8 compilation and go_fast on GPUs that don't support _scaled_mm, which is every GPU with compute capability below 8.9.
This way folks can still pull and run the model without issue. If a user requests the fast model on an unsupported GPU, we fall back to the slower path instead of failing, for ease of use.
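
For reviewers, here is a minimal sketch of the kind of gate described above. It is not the code in this PR: the names MIN_FP8_CAPABILITY, supports_fp8, resolve_go_fast, and detect_capability are hypothetical, and the 8.9 threshold is taken from the description. The only real library call used is torch.cuda.get_device_capability(), which returns a (major, minor) tuple.

```python
from typing import Optional, Tuple

# Assumption from the description: _scaled_mm needs compute capability 8.9 or higher.
MIN_FP8_CAPABILITY = (8, 9)


def supports_fp8(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability meets the fp8 threshold."""
    return tuple(capability) >= MIN_FP8_CAPABILITY


def resolve_go_fast(requested: bool, capability: Optional[Tuple[int, int]]) -> bool:
    """Decide whether the fast (fp8) path should actually be used.

    Falls back to the slow path, rather than raising, when the GPU is
    missing or too old. Hypothetical helper, for illustration only.
    """
    if not requested:
        return False
    if capability is None or not supports_fp8(capability):
        print("go_fast requested but fp8 is unsupported on this GPU; using the slow path.")
        return False
    return True


def detect_capability() -> Optional[Tuple[int, int]]:
    """Read the current GPU's compute capability, or None if unavailable."""
    try:
        import torch  # imported lazily so the helpers above work without torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()
```

A caller would then do something like go_fast = resolve_go_fast(go_fast, detect_capability()) before choosing which model variant to load. Comparing tuples keeps the check correct for capabilities such as (9, 0), where comparing a float like 9.0 against 8.9 would also work but a string comparison would not.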