microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Bitnet is giving NaN for perplexity #26

Closed joey00072 closed 4 months ago

joey00072 commented 5 months ago

[screenshot: NaN perplexity output]

  torch==2.3.0+cu121
  torchaudio==2.3.0+cu121
  torchvision==0.18.0+cu121
  nvidia-cuda-cupti-cu12==12.1.105
  nvidia-cuda-nvrtc-cu12==12.1.105
  nvidia-cuda-runtime-cu12==12.1.105
  bitblas==0.0.1.dev2

The file eval_utils.py is missing; I added it from bitnet_b1_58-3B.

I also changed this in the forward pass, since self.sw is not set:

sw = 1 / self.weight.abs().mean().clamp(min=1e-5)

https://github.com/microsoft/BitBLAS/blob/d536ddea210d5c0a97dfb55b4630d944421d13e2/integration/BitNet/utils_quant.py#L144
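For reference, that scale as a self-contained snippet (plain torch; weight_scale is just an illustrative helper name, not an API from this repo):

    import torch

    def weight_scale(weight: torch.Tensor) -> torch.Tensor:
        # BitNet b1.58 per-tensor weight scale: reciprocal of the mean
        # absolute weight, clamped so the division can never hit zero.
        return 1 / weight.abs().mean().clamp(min=1e-5)

    w = torch.randn(1024, 1024, dtype=torch.float16)
    print(weight_scale(w))  # finite and positive, so no NaN from this step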


Setting this up was painful on RunPod: with the default installation it can't find libnvrtc.so.12.0, and when I install from source in a conda env, flash-attn won't install :sob:. I wish this were simpler to set up.

Anyway, I am getting NaN in the loss while calculating perplexity. Can someone help out here? (I will raise a PR if I find a fix.)

cc: @LeiWang1999

LeiWang1999 commented 5 months ago

Hi @joey00072. Yeah, we currently provide pre-built .whl files only for CUDA 12.0.

The requirement for flash-attn originates from BitNet, not this repository. Maybe we should add a note about this in the README under the BitNet folder.

It's surprising that eval_ppl failed; our tests of the BitNet-BitBLAS integration were conducted only with eval_correctness.py. I will investigate this issue shortly.

LeiWang1999 commented 5 months ago

@joey00072 self.sw is assigned in https://github.com/microsoft/BitBLAS/blob/d536ddea210d5c0a97dfb55b4630d944421d13e2/integration/BitNet/utils_quant.py#L89

Since sw shouldn't be calculated at the online inference stage, we use a post_process step to initialize it ahead of time.

Take a look at the example usage: https://github.com/microsoft/BitBLAS/blob/main/integration/BitNet/eval_correctness.py#L47-L48
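Schematically, the intended order looks like this (a rough sketch only; the checkpoint name and the per-module post_process scan are illustrative, see the linked lines for the exact invocation):

    import torch
    from modeling_bitnet import BitnetForCausalLM  # integration/BitNet

    model = BitnetForCausalLM.from_pretrained(
        "1bitLLM/bitnet_b1_58-3B",  # illustrative checkpoint name
        torch_dtype=torch.float16,
    ).cuda().half()

    # Run the offline quantization once, after loading weights and
    # BEFORE any forward pass, so that self.sw and the packed weights
    # exist by the time BitLinear reaches bitblas_matmul.
    for module in model.modules():
        if hasattr(module, "post_process"):
            module.post_process()

    # Only now is the model ready for eval_correctness / eval_ppl.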

Would you mind checking it again with the post_process step applied?

joey00072 commented 5 months ago

Yes, I did this as well, and I was still getting NaN. [screenshot]

LeiWang1999 commented 5 months ago

You should call post_process before bitblas_matmul, I guess.

LeiWang1999 commented 5 months ago

Hi @joey00072, did this resolve your issue?

joey00072 commented 5 months ago

Nope, currently jumping through dependency hell.

When I install it from pip:

Installing collected packages: bitblas
Successfully installed bitblas-0.0.1.dev2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

root@nabki9esff:/notebooks/BitBLAS/integration/BitNet# python eval_correctness.py 
Traceback (most recent call last):
  File "/notebooks/BitBLAS/integration/BitNet/eval_correctness.py", line 7, in <module>
    from modeling_bitnet import BitnetForCausalLM
  File "/notebooks/BitBLAS/integration/BitNet/modeling_bitnet.py", line 52, in <module>
    from utils_quant import BitLinear
  File "/notebooks/BitBLAS/integration/BitNet/utils_quant.py", line 9, in <module>
    import bitblas
  File "/usr/local/lib/python3.9/dist-packages/bitblas/__init__.py", line 19, in <module>
    from . import gpu  # noqa: F401
  File "/usr/local/lib/python3.9/dist-packages/bitblas/gpu/__init__.py", line 7, in <module>
    from .fallback import Fallback  # noqa: F401
  File "/usr/local/lib/python3.9/dist-packages/bitblas/gpu/fallback.py", line 25, in <module>
    from tvm import tir
  File "/usr/local/lib/python3.9/dist-packages/bitblas/3rdparty/tvm/python/tvm/__init__.py", line 26, in <module>
    from ._ffi.base import TVMError, __version__, _RUNTIME_ONLY
  File "/usr/local/lib/python3.9/dist-packages/bitblas/3rdparty/tvm/python/tvm/_ffi/__init__.py", line 28, in <module>
    from .base import register_error
  File "/usr/local/lib/python3.9/dist-packages/bitblas/3rdparty/tvm/python/tvm/_ffi/base.py", line 78, in <module>
    _LIB, _LIB_NAME = _load_lib()
  File "/usr/local/lib/python3.9/dist-packages/bitblas/3rdparty/tvm/python/tvm/_ffi/base.py", line 64, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.9/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvrtc.so.12: cannot open shared object file: No such file or directory
root@nabki9esff:/notebooks/BitBLAS/integration/BitNet# 
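For the libnvrtc failure above, a quick check of what the dynamic loader can actually resolve (a diagnostic sketch; ctypes.util.find_library searches roughly the same paths CDLL does):

    import ctypes.util

    # Prints e.g. 'libnvrtc.so.12' if the loader can resolve it, or a
    # hint if the CUDA libraries aren't on the search path at all.
    path = ctypes.util.find_library("nvrtc")
    print(path or "libnvrtc not found; add its directory to LD_LIBRARY_PATH")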

When I install it from source:

    _LIB, _LIB_NAME = _load_lib()
  File "/notebooks/BitBLAS/python/bitblas/../../3rdparty/tvm/python/tvm/_ffi/base.py", line 58, in _load_lib
    lib_path = libinfo.find_lib_path()
  File "/notebooks/BitBLAS/python/bitblas/../../3rdparty/tvm/python/tvm/_ffi/libinfo.py", line 166, in find_lib_path
    raise RuntimeError(message)
RuntimeError: Cannot find libraries: ['libtvm.so', 'libtvm_runtime.so', '3rdparty/cutlass_fpA_intB_gemm/cutlass_kernels/libfpA_intB_gemm.so', '3rdparty/libflash_attn/src/libflash_attn.so']
List of candidates:

LeiWang1999 commented 5 months ago

Would you mind providing the install logs from running pip install . in the root directory?

joey00072 commented 5 months ago

Hey, even eval_correctness.py is giving NaNs:

         ...,

         [[ 2.4748e-04,  4.9496e-04, -7.1383e-04,  ...,  7.1955e-04,
           -5.0783e-04, -9.5367e-04]],

         [[-3.7241e-04, -1.9264e-04, -2.5892e-04,  ..., -3.6144e-04,
            6.3705e-04,  4.5371e-04]],

         [[-1.0176e-03,  3.1352e-04,  3.9411e-04,  ..., -6.5327e-05,
           -1.7977e-03,  2.9659e-04]]]], device='cuda:0', dtype=torch.float16)), (tensor([[[[ 4.8399e-04, -3.2020e-04,  4.4203e-04,  ...,  3.4881e-04,
           -1.1134e-04, -1.5485e-04]],

         [[ 3.0470e-04, -4.4990e-04, -2.3532e-04,  ...,  1.3952e-03,
            1.4520e-04,  4.8208e-04]],

         [[ 8.8215e-05,  1.9014e-05,  4.3082e-04,  ..., -1.9407e-04,
            1.3077e-04, -3.5095e-04]],

         ...,

         [[ 1.3900e-04,  4.2200e-04, -3.9721e-04,  ...,  4.0531e-04,
           -2.6393e-04, -5.3501e-04]],

         [[-2.2686e-04, -1.1355e-04, -1.3375e-04,  ..., -2.0981e-04,
            3.3426e-04,  2.6655e-04]],

         [[-5.4932e-04,  1.7262e-04,  2.1541e-04,  ..., -3.7432e-05,
           -9.8419e-04,  1.6880e-04]]]], device='cuda:0', dtype=torch.float16), tensor([[[[ 5.9938e-04, -3.9625e-04,  5.4741e-04,  ...,  4.3201e-04,
           -1.3793e-04, -1.9181e-04]],

         [[ 3.7742e-04, -5.5695e-04, -2.9135e-04,  ...,  1.7281e-03,
            1.7965e-04,  5.9700e-04]],

         [[ 1.0926e-04,  2.3544e-05,  5.3358e-04,  ..., -2.4033e-04,
            1.6201e-04, -4.3440e-04]],

         ...,

         [[ 1.7214e-04,  5.2261e-04, -4.9210e-04,  ...,  5.0163e-04,
           -3.2663e-04, -6.6233e-04]],

         [[-2.8086e-04, -1.4067e-04, -1.6558e-04,  ..., -2.5988e-04,
            4.1366e-04,  3.3021e-04]],

         [[-6.7997e-04,  2.1374e-04,  2.6679e-04,  ..., -4.6372e-05,
           -1.2188e-03,  2.0909e-04]]]], device='cuda:0', dtype=torch.float16)), (tensor([[[[nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan]],

         ...,

         [[nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan]]]], device='cuda:0',
       dtype=torch.float16), tensor([[[[nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan]],

         ...,

         [[nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan]]]], device='cuda:0',

(Yeah, I'll reinstall with pip install . and post the logs.)

LeiWang1999 commented 5 months ago

Might be something related to the latest CUDA stream support; let me check in the next few days.

LeiWang1999 commented 5 months ago

@joey00072 I made a fix to the CUDA stream handling. Would you mind trying again?

joey00072 commented 5 months ago

Yep.

Hey, if possible can you push the latest to PyPI? The build notebook (pod) crashes on RunPod while building from source, and it also costs money.

joey00072 commented 5 months ago

Hey, here are the build logs: https://gist.github.com/joey00072/cc931453ba48884fb8ce8dc75dfbf390. I'm stuck on this error:

    raise RuntimeError(message)
RuntimeError: Cannot find libraries: ['libtvm.so', 'libtvm_runtime.so', '3rdparty/cutlass_fpA_intB_gemm/cutlass_kernels/libfpA_intB_gemm.so', '3rdparty/libflash_attn/src/libflash_attn.so']
List of candidates:

I collected these from https://lightning.ai, free tier on an L4 GPU. Apologies for being late.

LeiWang1999 commented 5 months ago

Looks like you should upgrade your CMake version: "CMake 3.18 or higher is required. You are running version 3.16.3".

joey00072 commented 5 months ago

Yeah, just saw it after pasting. The build starts now, but the whole thing crashes before it completes lol.

If I build it locally in Docker and move the wheel over, will that work? I have a GTX 1650 in my laptop, but I'm running the same Docker image as RunPod (runpod/pytorch:2.2.1-py3.10-cuda12.1.1-devel-ubuntu22.04).

joey00072 commented 4 months ago

Hey, it's not working: https://gist.github.com/joey00072/d55f1a8d2f5137926222c785ca26808d. I tried to build manually, but make -j got stuck for 3hr+.

LeiWang1999 commented 4 months ago

@joey00072

Indeed, it is unusual for TVM compilation to take that long, as it typically only requires several minutes on my 24-core CPU.

It appears that your CUDA version is 12.1, which should let you use bitblas directly from PyPI; I also tested the latest package under CUDA 12.1.

LeiWang1999 commented 4 months ago

Hi @joey00072, I refactored some code for BitNet; the output of eval_ppl.py is no longer NaN in my env:

[screenshot]

joey00072 commented 4 months ago

Yeah, I got it working too. It's giving wrong results on Volta-series GPUs, but it works fine on Ampere. [screenshots]

Thanks @LeiWang1999


Batched inference has negligible throughput. Is this expected?

[screenshot]

joey00072 commented 4 months ago

Also, is it possible to store packed weights (2 per int8) and write a custom matmul for them in TVM? (I am new to TVM.)

Similar to https://github.com/astramind-ai/BitMat/tree/main
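Ternary values need only 2 bits, so up to four fit in one int8. A minimal sketch of such a packing in plain torch (pack_ternary / unpack_ternary are hypothetical helpers, not BitBLAS's or BitMat's actual layout):

    import torch

    def pack_ternary(w: torch.Tensor) -> torch.Tensor:
        # Pack ternary weights {-1, 0, +1} four-per-byte, 2 bits each.
        q = (w + 1).to(torch.uint8).reshape(-1, 4)  # {-1,0,1} -> {0,1,2}
        return q[:, 0] | (q[:, 1] << 2) | (q[:, 2] << 4) | (q[:, 3] << 6)

    def unpack_ternary(packed: torch.Tensor) -> torch.Tensor:
        # Inverse: recover the flat {-1, 0, +1} tensor.
        q = torch.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], dim=1)
        return q.flatten().to(torch.int8) - 1

    w = torch.randint(-1, 2, (16,))  # random ternary weights
    assert torch.equal(unpack_ternary(pack_ternary(w)), w.to(torch.int8))

A real kernel would keep the weights packed in global memory and unpack inside the matmul itself.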

LeiWang1999 commented 4 months ago

Yeah, absolutely it's possible. We have some advanced tutorials for extending operators, but you'll need some basic knowledge of TVM and TensorIR.

https://github.com/microsoft/BitBLAS/blob/main/docs/ExtendOperatorsWithDSL.md

joey00072 commented 4 months ago

Time to fix a skill issue. I'm gonna try it, thanks again!