Hi @joey00072, yeah, we currently provide pre-built .whl files only for CUDA 12.0.
The flash-attn requirement originates from BitNet, not this repository. Maybe we should add a note about this in the README under the BitNet folder.
It's surprising that eval_ppl failed. Our tests on the BitNet-BitBLAS integration were conducted only with eval_correctness.py. I will investigate this issue later.
@joey00072 self.sw is assigned in https://github.com/microsoft/BitBLAS/blob/d536ddea210d5c0a97dfb55b4630d944421d13e2/integration/BitNet/utils_quant.py#L89
Since sw shouldn't be calculated at the online inference stage, we use post_process to initialize it.
Take a look at the example usage: https://github.com/microsoft/BitBLAS/blob/main/integration/BitNet/eval_correctness.py#L47-L48
Would you mind checking with post_process again?
Yes, I did this as well, but I was still getting NaN.
You should call post_process before bitblas_matmul, I guess. Roughly like this:
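This is only a sketch, not verbatim from the script; the model id and the module loop are illustrative, so check the linked lines in eval_correctness.py for the actual calls:

```python
# Sketch: run post_process on every BitLinear before inference, so that
# self.sw exists by the time bitblas_matmul is called in forward.
# (Model id and loop pattern are illustrative assumptions.)
import torch
from modeling_bitnet import BitnetForCausalLM
from utils_quant import BitLinear

model = BitnetForCausalLM.from_pretrained(
    "1bitLLM/bitnet_b1_58-3B", torch_dtype=torch.float16
).cuda()

for module in model.modules():
    if isinstance(module, BitLinear):
        module.post_process()  # precomputes the weight scale self.sw offline

with torch.no_grad():
    ids = torch.tensor([[1, 2, 3]], device="cuda")
    logits = model(ids).logits  # should now be finite rather than NaN
```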
Hi @joey00072, did this resolve your issue?
Nope, currently jumping through dependency hell.
When I install it from pip:
Installing collected packages: bitblas
Successfully installed bitblas-0.0.1.dev2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@nabki9esff:/notebooks/BitBLAS/integration/BitNet# python eval_correctness.py
Traceback (most recent call last):
File "/notebooks/BitBLAS/integration/BitNet/eval_correctness.py", line 7, in <module>
from modeling_bitnet import BitnetForCausalLM
File "/notebooks/BitBLAS/integration/BitNet/modeling_bitnet.py", line 52, in <module>
from utils_quant import BitLinear
File "/notebooks/BitBLAS/integration/BitNet/utils_quant.py", line 9, in <module>
import bitblas
File "/usr/local/lib/python3.9/dist-packages/bitblas/__init__.py", line 19, in <module>
from . import gpu # noqa: F401
File "/usr/local/lib/python3.9/dist-packages/bitblas/gpu/__init__.py", line 7, in <module>
from .fallback import Fallback # noqa: F401
File "/usr/local/lib/python3.9/dist-packages/bitblas/gpu/fallback.py", line 25, in <module>
from tvm import tir
File "/usr/local/lib/python3.9/dist-packages/bitblas/3rdparty/tvm/python/tvm/__init__.py", line 26, in <module>
from ._ffi.base import TVMError, __version__, _RUNTIME_ONLY
File "/usr/local/lib/python3.9/dist-packages/bitblas/3rdparty/tvm/python/tvm/_ffi/__init__.py", line 28, in <module>
from .base import register_error
File "/usr/local/lib/python3.9/dist-packages/bitblas/3rdparty/tvm/python/tvm/_ffi/base.py", line 78, in <module>
_LIB, _LIB_NAME = _load_lib()
File "/usr/local/lib/python3.9/dist-packages/bitblas/3rdparty/tvm/python/tvm/_ffi/base.py", line 64, in _load_lib
lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
File "/usr/lib/python3.9/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libnvrtc.so.12: cannot open shared object file: No such file or directory
root@nabki9esff:/notebooks/BitBLAS/integration/BitNet#
When I install it from source:
_LIB, _LIB_NAME = _load_lib()
File "/notebooks/BitBLAS/python/bitblas/../../3rdparty/tvm/python/tvm/_ffi/base.py", line 58, in _load_lib
lib_path = libinfo.find_lib_path()
File "/notebooks/BitBLAS/python/bitblas/../../3rdparty/tvm/python/tvm/_ffi/libinfo.py", line 166, in find_lib_path
raise RuntimeError(message)
RuntimeError: Cannot find libraries: ['libtvm.so', 'libtvm_runtime.so', '3rdparty/cutlass_fpA_intB_gemm/cutlass_kernels/libfpA_intB_gemm.so', '3rdparty/libflash_attn/src/libflash_attn.so']
List of candidates:
Would you mind providing the install logs from running pip install . in the root directory?
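Also, for the first error, a one-liner can confirm whether the CUDA 12 runtime library is visible to the loader at all (just a diagnostic, independent of bitblas):

```python
# If this raises the same OSError, libnvrtc.so.12 is simply not on the
# loader's search path; pointing LD_LIBRARY_PATH at a CUDA 12 install
# (or installing the CUDA 12 runtime) should fix it.
import ctypes
ctypes.CDLL("libnvrtc.so.12")
print("libnvrtc.so.12 loaded OK")
```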
Hey, even eval_correctness.py is giving NaNs:
[output truncated: earlier tensor pairs contain small finite float16 values; later pairs are all NaN]
(tensor([[[[ 4.8399e-04, -3.2020e-04,  4.4203e-04, ...,  3.4881e-04, -1.1134e-04, -1.5485e-04]],
          ...]], device='cuda:0', dtype=torch.float16),
 tensor([[[[ 5.9938e-04, -3.9625e-04,  5.4741e-04, ...,  4.3201e-04, -1.3793e-04, -1.9181e-04]],
          ...]], device='cuda:0', dtype=torch.float16)),
(tensor([[[[nan, nan, nan, ..., nan, nan, nan]],
          ...,
          [[nan, nan, nan, ..., nan, nan, nan]]]], device='cuda:0', dtype=torch.float16),
 tensor([[[[nan, nan, nan, ..., nan, nan, nan]],
          ...,
          [[nan, nan, nan, ..., nan, nan, nan]]]], device='cuda:0', dtype=torch.float16))
(Yeah, I'll reinstall with pip install . and post the logs.)
Might be something related to the latest CUDA stream support; let me check in the next few days.
@joey00072 I made a fix to the CUDA stream handling; would you mind trying again?
Yep.
Hey, if possible, can you push the latest version to PyPI? The build notebook (pod) crashes on RunPod while building from source, and it also costs money.
Hey, these are the build logs: https://gist.github.com/joey00072/cc931453ba48884fb8ce8dc75dfbf390. I'm stuck with this error:
raise RuntimeError(message)
RuntimeError: Cannot find libraries: ['libtvm.so', 'libtvm_runtime.so', '3rdparty/cutlass_fpA_intB_gemm/cutlass_kernels/libfpA_intB_gemm.so', '3rdparty/libflash_attn/src/libflash_attn.so']
List of candidates:
I collected these from https://lightning.ai, free tier on an L4 GPU. Apologies for being late.
Looks like you should upgrade your CMake version: CMake 3.18 or higher is required, and you are running version 3.16.3.
Yeah, just saw that after pasting. The build starts, but the whole thing crashes before it completes, lol.
If I build it locally in Docker and move the wheel over, will that work?
I have a GTX 1650 in my laptop, but I'm running the same Docker image as RunPod (runpod/pytorch:2.2.1-py3.10-cuda12.1.1-devel-ubuntu22.04).
Hey, it's not working:
https://gist.github.com/joey00072/d55f1a8d2f5137926222c785ca26808d
I tried to build manually, but make -j got stuck for 3+ hours.
@joey00072
Indeed, it is unusual for TVM compilation to take that long, as it typically only requires several minutes on my 24-core CPU.
It appears that your CUDA version is 12.1, which should allow you to directly utilize bitblas from PyPI, as I also tested the latest package under CUDA 12.1.
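Once installed, the import alone is a meaningful smoke test, since both tracebacks above failed while loading the bundled TVM libraries during import:

```python
# Smoke test for the PyPI wheel: importing bitblas is the step that
# dlopen'ed libnvrtc.so.12 and libtvm.so in the failures above.
import bitblas
print("bitblas imported OK")
```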
Hi @joey00072, I refactored some of the BitNet code; the output of eval_ppl.py is no longer NaN in my environment:
Yeah, I got it working too. It gives wrong results on the Volta series, but works fine on Ampere.
Thanks @LeiWang1999
Batched inference has negligible throughput; is this expected?
Also, is it possible to store packed weights (2*int8) and write a custom matmul for them in TVM? (I am new to TVM.)
Yeah, it's absolutely possible. We have some advanced tutorials on extending operators, but you need some basic knowledge of TVM and TensorIR:
https://github.com/microsoft/BitBLAS/blob/main/docs/ExtendOperatorsWithDSL.md
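To give a flavor of what that looks like, here is a toy TensorIR compute definition. The shapes, dtypes, and int8 weight layout are illustrative assumptions, not the BitBLAS DSL itself; the doc above covers the real entry points:

```python
# Toy TensorIR matmul with int8 weights cast to float16 on the fly.
# Purely illustrative; the BitBLAS DSL in the linked doc is the real interface.
import tvm
from tvm.script import tir as T

@T.prim_func
def int8_weight_matmul(a: T.handle, b: T.handle, c: T.handle):
    A = T.match_buffer(a, (128, 128), "float16")  # activations
    B = T.match_buffer(b, (128, 128), "int8")     # quantized weights
    C = T.match_buffer(c, (128, 128), "float16")  # output
    for i, j, k in T.grid(128, 128, 128):
        with T.block("matmul"):
            vi, vj, vk = T.axis.remap("SSR", [i, j, k])
            with T.init():
                C[vi, vj] = T.float16(0)
            C[vi, vj] = C[vi, vj] + A[vi, vk] * T.cast(B[vj, vk], "float16")
```

From there you would write a schedule (or let the tuner generate one) and build it with tvm.build.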
Time to fix a skill issue; I'm gonna try it. Thanks again!
The file eval_utils.py is missing; I added it from bitnet_b1_58-3B.
I also changed this in the forward pass, since self.sw is not set:
https://github.com/microsoft/BitBLAS/blob/d536ddea210d5c0a97dfb55b4630d944421d13e2/integration/BitNet/utils_quant.py#L144
This was painful on RunPod: with the default installation it can't find libnvrtc.so.12.0, and when I install from source in a conda env, flash-attn won't install :sob:. I wish this were simpler to set up. Anyway, I am getting NaN in the loss while calculating perplexity. Can someone help out here? (I will raise a PR if I find a fix.)
cc: @LeiWang1999