microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

WHLs for cuda 11.7, 11.8, and 12.0 for future Releases #62

Open Qubitium opened 2 days ago

Qubitium commented 2 days ago

Currently the bitblas whl support is limited to CUDA >= 12.1. I understand that building so many whl/python/torch combos is a headache, but I think it may be worth it.

Include support for all CUDA versions supported by Torch >= 2.0.0, which means adding 11.7, 11.8, and 12.0 to the WHL builds.

Reasons:

  1. Many GPU-poor academics are locked into institution-provided environments where drivers are often pinned to CUDA 11.7 or 11.8.
  2. Compiling bitblas pulls in heavy OS library dependencies, so a simple git clone + build is not possible even on Ubuntu without installing extra packages. On non-Ubuntu environments this becomes a serious obstacle for users who have no experience with builds.
  3. Allow third parties to fully embed BitBLAS without raising their own Torch/CUDA requirements. GPTQModel, for example, has integrated bitblas as a non-optional dependency, but it has to add CUDA checks for package compatibility and redirect users to a source compile at runtime (see the sketch after this list).
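To illustrate point 3, here is a minimal sketch (not GPTQModel's actual code) of the kind of runtime guard a downstream package currently has to carry: it checks the CUDA version that the installed PyTorch was built against before importing bitblas, and points the user at a source build otherwise. The 12.1 floor and the helper names are assumptions for illustration, mirroring today's wheel coverage.

```python
import warnings

import torch

# Lowest CUDA version covered by the published bitblas wheels today (assumed 12.1).
BITBLAS_MIN_CUDA = (12, 1)


def bitblas_wheel_compatible() -> bool:
    """Return True if the prebuilt bitblas wheel should work with this torch build."""
    cuda = torch.version.cuda  # e.g. "11.8", or None for CPU-only torch builds
    if cuda is None:
        return False
    major, minor = (int(x) for x in cuda.split(".")[:2])
    return (major, minor) >= BITBLAS_MIN_CUDA


def load_bitblas():
    """Import bitblas only when the wheel matches the local CUDA toolchain."""
    if not bitblas_wheel_compatible():
        warnings.warn(
            f"torch was built against CUDA {torch.version.cuda}; prebuilt bitblas "
            "wheels require CUDA >= 12.1. Please compile bitblas from source."
        )
        return None
    import bitblas  # safe to use the prebuilt wheel here
    return bitblas
```

Shipping wheels for 11.7, 11.8, and 12.0 would let integrators drop this kind of guard entirely.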
### Tasks
- [ ] Add CUDA 11.7, 11.8, and 12.0 WHLs for future releases
LeiWang1999 commented 2 days ago

Hi @Qubitium, thank you for your attention. Indeed, bitblas is not officially released yet. We are currently working on performance-related optimizations, and there are still many items on our roadmap, such as CI/CD integration and support for vLLM. We are committed to completing these tasks and releasing more WHL packages in our official release.

We expect to complete these tasks in approximately two weeks.