Could you provide your BitBLAS version? In the 0.0.0.dev4 release, we inadvertently disabled some optimizations. You might want to check the matmul.fast_decoding attribute; it's possible that it was set to False, which could cause performance issues.
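A quick way to check, as a minimal sketch (the MatmulConfig fields follow the README and may differ slightly between releases; matmul.fast_decoding is the attribute mentioned above):

```python
import bitblas

# Build a W4A16 matmul matching the benchmarked shape; field names
# follow the README and may differ between releases.
config = bitblas.MatmulConfig(
    M=1,
    N=16384,
    K=16384,
    A_dtype="float16",     # activation dtype
    W_dtype="int4",        # quantized weight dtype
    accum_dtype="float16",
    out_dtype="float16",
    layout="nt",           # A non-transposed, W transposed
    with_bias=False,
)
matmul = bitblas.Matmul(config=config)

# If this prints False, the fast dequantization path was disabled,
# which would explain the slow W4A16 numbers.
print(matmul.fast_decoding)
```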
My BitBLAS version is 0.0.1.dev5.
I passed fast_decoding=True to Matmul, and the performance is now as expected:
Matmul 1-16384-16384-float16-int4-float16-float16-nt-False-128-False-False-None-int8-True 0.083 ms
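For reference, roughly what this looks like in code (a sketch: I'm assuming fast_decoding is accepted as a MatmulConfig field and that Matmul exposes a profile_latency() helper like the benchmark script uses; both may vary by version):

```python
import bitblas

config = bitblas.MatmulConfig(
    M=1,
    N=16384,
    K=16384,
    A_dtype="float16",
    W_dtype="int4",
    accum_dtype="float16",
    out_dtype="float16",
    layout="nt",
    fast_decoding=True,   # assumed field: force the fast dequantization path on
)
matmul = bitblas.Matmul(config=config)

# profile_latency() (assumed helper) reports the kernel latency in milliseconds.
print(f"{matmul.profile_latency():.3f} ms")
```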
Thanks!
It's weird because the fast_decoding flag should be set to True by default. Thanks for your report!
@ChenMnZ Thanks for your report again! There was indeed a typo that disabled fast_decoding by default. We just released 0.0.0.dev6; you no longer need to specify fast_decoding manually.
Hello,
I used https://github.com/microsoft/BitBLAS/blob/main/benchmark/operators/benchmark_bitblas_matmul.py to benchmark operator speed on an A100-80GB GPU. The obtained results show that W4A16 is about 2x faster than W16A16, less than the reported 4x.
I wonder whether I am missing something. Thank you!
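For context, a minimal head-to-head comparison would look something like this (a sketch, assuming the MatmulConfig fields from the README and a profile_latency() helper on Matmul; both may differ across versions):

```python
import bitblas

def build(w_dtype):
    # Identical shapes and dtypes except for the weight dtype, so the
    # two measured latencies are directly comparable.
    config = bitblas.MatmulConfig(
        M=1,
        N=16384,
        K=16384,
        A_dtype="float16",
        W_dtype=w_dtype,
        accum_dtype="float16",
        out_dtype="float16",
        layout="nt",
    )
    return bitblas.Matmul(config=config)

w16 = build("float16").profile_latency()  # W16A16 baseline
w4 = build("int4").profile_latency()      # W4A16 quantized weights
print(f"W16A16: {w16:.3f} ms, W4A16: {w4:.3f} ms, speedup: {w16 / w4:.2f}x")
```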