
BitBLAS

BitBLAS is a library to support mixed-precision BLAS operations on GPUs, for example, the $W_{wdtype}A_{adtype}$ mixed-precision matrix multiplication where $C_{cdtype}[M, N] = A_{adtype}[M, K] \times W_{wdtype}[N, K]$. BitBLAS aims to support efficient mixed-precision DNN model deployment, especially $W_{wdtype}A_{adtype}$ quantization in large language models (LLMs), for example, $W_{UINT4}A_{FP16}$ in GPTQ, $W_{INT2}A_{FP16}$ in BitDistiller, and $W_{INT2}A_{INT8}$ in BitNet-b1.58. BitBLAS is based on techniques from our paper "Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation", accepted at OSDI'24.
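To make the $W_{UINT4}A_{FP16}$ semantics concrete, here is a plain NumPy sketch of what the mixed-precision GEMM computes. This is a reference for the math only, not BitBLAS's optimized kernels; the symmetric per-tensor scale and zero point below are illustrative assumptions, not a specific quantization scheme from the library.

```python
import numpy as np

# Reference semantics of C[M, N] = A_fp16[M, K] x W_uint4[N, K]:
# dequantize the UINT4 weights to FP16, then contract over K
# with W transposed (W is stored [N, K]).
M, N, K = 4, 8, 16
rng = np.random.default_rng(0)

A = rng.standard_normal((M, K)).astype(np.float16)       # FP16 activations
W_q = rng.integers(0, 16, size=(N, K), dtype=np.uint8)   # UINT4 weights, stored in uint8
scale = np.float16(0.05)                                 # illustrative per-tensor scale
zero_point = 8                                           # maps [0, 15] to roughly [-8, 7]

# Dequantize, then multiply; C has shape [M, N] and dtype float16.
W_fp16 = (W_q.astype(np.float16) - np.float16(zero_point)) * scale
C = A @ W_fp16.T
```

In practice BitBLAS fuses the dequantization into the GEMM kernel rather than materializing `W_fp16`; the sketch only pins down the input/output shapes and dtypes.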

Some of the key features of BitBLAS include:

Latest News

FasterTransformer Integration

Integration example of FasterTransformer with BitBLAS (figure)

Benchmark Summary

BitBLAS achieves exceptional performance across a variety of computational patterns. Below are selected results showcasing its capabilities:

For more detailed information on benchmark sets with other formats (NF4/FP4) and other devices (RTX 3090), please refer to the benchmark.

Support Matrix

| A_dtype | W_dtype | Accum_dtype | Out_dtype | BitBLAS Support | Tested Platform |
|---------|---------|-------------|-----------|-----------------|-----------------|
| FP16 | FP16 | FP16 | FP16 | ✅ | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP16 | FP4_E2M1 | FP16 | FP16 | ✅ | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP16 | FP8_E4M3 | FP16 | FP16 | ✅ | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP16 | INT8 | FP16 | FP16 | ✅ | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP16 | UINT4/INT4 | FP16 | FP16 | ✅ | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP16 | UINT2/INT2 | FP16 | FP16 | ✅ | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP16 | UINT1 | FP16 | FP16 | ✅ | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP16 | NF4 | FP16 | FP16 | ✅ | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| INT8 | INT8 | INT32 | FP32/INT32/FP16/INT8 | ✅ | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| INT8 | UINT4/INT4 | INT32 | FP32/INT32/FP16/INT8 | ✅ | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| INT8 | UINT2/INT2 | INT32 | FP32/INT32/FP16/INT8 | ✅ | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| INT8 | UINT1 | INT32 | FP32/INT32/FP16/INT8 | ✅ | V100(SM_70)/A100(SM_80)/A6000(SM_86)/RTX 4090(SM_89) |
| FP8_E4M3 | FP8_E4M3 | FP32 | FP32/FP16 | ✅ | RTX 4090(SM_89) |
| FP8_E5M2 | FP8_E5M2 | FP32 | FP32/FP16 | ✅ | RTX 4090(SM_89) |
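The support matrix can be encoded as a simple lookup for validating a configuration before dispatch. The helper below is hypothetical (it is not part of the BitBLAS API) and only mirrors the (A_dtype, W_dtype, Accum_dtype, Out_dtype) combinations listed above:

```python
# Hypothetical helper mirroring the support matrix above; not part of
# the BitBLAS API. Maps (A_dtype, W_dtype) to (Accum_dtype, Out_dtypes).
_FP16_OUT = ("FP16", ["FP16"])
_INT8_OUT = ("INT32", ["FP32", "INT32", "FP16", "INT8"])
_FP8_OUT = ("FP32", ["FP32", "FP16"])

SUPPORT_MATRIX = {
    ("FP16", "FP16"): _FP16_OUT,
    ("FP16", "FP4_E2M1"): _FP16_OUT,
    ("FP16", "FP8_E4M3"): _FP16_OUT,
    ("FP16", "INT8"): _FP16_OUT,
    ("FP16", "UINT4"): _FP16_OUT,
    ("FP16", "INT4"): _FP16_OUT,
    ("FP16", "UINT2"): _FP16_OUT,
    ("FP16", "INT2"): _FP16_OUT,
    ("FP16", "UINT1"): _FP16_OUT,
    ("FP16", "NF4"): _FP16_OUT,
    ("INT8", "INT8"): _INT8_OUT,
    ("INT8", "UINT4"): _INT8_OUT,
    ("INT8", "INT4"): _INT8_OUT,
    ("INT8", "UINT2"): _INT8_OUT,
    ("INT8", "INT2"): _INT8_OUT,
    ("INT8", "UINT1"): _INT8_OUT,
    ("FP8_E4M3", "FP8_E4M3"): _FP8_OUT,
    ("FP8_E5M2", "FP8_E5M2"): _FP8_OUT,
}

def is_supported(a_dtype: str, w_dtype: str, out_dtype: str) -> bool:
    """True if the (A, W, Out) dtype combination appears in the matrix."""
    entry = SUPPORT_MATRIX.get((a_dtype, w_dtype))
    return entry is not None and out_dtype in entry[1]
```

For example, `is_supported("INT8", "UINT4", "INT32")` is true, while `is_supported("FP16", "NF4", "INT8")` is false, matching the rows above.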

We are continuously expanding the support matrix. If you have any specific requirements, please feel free to open an issue or PR.

Getting Started

Reference

Please cite BitBLAS/Ladder in your publications if it helps your research:

@inproceedings{ladder-osdi24,
  author    = {Lei Wang and Lingxiao Ma and Shijie Cao and Quanlu Zhang and Jilong Xue and Yining Shi and Ningxin Zheng and Ziming Miao and Fan Yang and Ting Cao and Yuqing Yang and Mao Yang},
  title     = {Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation},
  booktitle = {18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24)},
  year      = {2024},
  url       = {https://www.usenix.org/conference/osdi24/presentation/wang-lei},
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.