Yeah, I already pushed a fix yesterday: https://github.com/mobiusml/hqq/commit/d09b4e6f93e9c387b0caee86c5df869baaa8fb12
Just use the master branch, or use the bitblas backend instead (with `torch.float16` instead of `torch.bfloat16`; don't forget to install it first: `pip install bitblas`).
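As a rough illustration of that advice, here is a minimal sketch of quantizing a Llama model with HQQ through transformers, switching to the bitblas backend with `torch.float16`, and compiling the forward pass. The `prepare_for_inference` helper and the `"bitblas"` backend name are assumed from the hqq repo, and the model id and quantization settings are illustrative; check the current master branch for the exact API.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig
from hqq.utils.patching import prepare_for_inference  # assumed helper from the hqq repo

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # illustrative model id

# 4-bit HQQ quantization config (nbits/group_size/axis are illustrative values)
quant_config = HqqConfig(nbits=4, group_size=64, axis=1)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # bitblas expects float16, not bfloat16
    device_map="cuda",
    quantization_config=quant_config,
)

# Patch the quantized layers to use the bitblas kernels (requires `pip install bitblas`)
prepare_for_inference(model, backend="bitblas")

# Compile the forward pass for faster decoding
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)
```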
I want to use HQQ quantization and torch.compile with Llama 3.1 models. What code should I run to get the fastest inference?
I ran `examples/backends/torchao_int4_demo.py`.
Error Message: