mit-han-lab / Quest

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Error on building ops, "CUDART_MAX_NORMAL_FP16" is undefined #7

Open Zhuohao-Li opened 1 month ago

Zhuohao-Li commented 1 month ago

Hi,

I tried to build on my own, but something weird happens when I compile the kernels and build the end-to-end operators with PyBind. The error occurs both with make -j and with bash setup.sh when linking the ops.

Here are the details to reproduce it:

CMD:

(1)

cd kernels
mkdir build && cd build
cmake ..
make -j

(2)

cd quest/ops
bash setup.sh

log:

for (1) when compilation

[ 71%] Building CUDA object 3rdparty/nvbench/exec/CMakeFiles/nvbench.ctl.dir/nvbench-ctl.cu.o
/home/ubuntu/zhuohao-dev-3/quest/kernels/include/decode/decode_page.cuh(425): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
    local_max.fill(-CUDART_MAX_NORMAL_FP16);
                    ^

/home/ubuntu/zhuohao-dev-3/quest/kernels/include/decode/decode_page.cuh(516): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
     local_max.fill(-CUDART_MAX_NORMAL_FP16);
                     ^

[ 73%] Linking CXX shared library ../../../lib/libgmock.so
[ 73%] Built target gmock
[ 74%] Building CXX object 3rdparty/googletest/googlemock/CMakeFiles/gmock_main.dir/src/gmock_main.cc.o
/home/ubuntu/zhuohao-dev-3/quest/kernels/include/decode/decode_page.cuh(425): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
    local_max.fill(-CUDART_MAX_NORMAL_FP16);
                    ^

for (2) when linking

/home/ubuntu/zhuohao-dev-3/quest/quest/ops/../../kernels/include/decode/decode_page.cuh(425): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
    local_max.fill(-CUDART_MAX_NORMAL_FP16);
                    ^

/home/ubuntu/zhuohao-dev-3/quest/quest/ops/../../kernels/include/decode/decode_page.cuh(516): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
     local_max.fill(-CUDART_MAX_NORMAL_FP16);

I made sure that #include <cuda_fp16.h> is present in quest/kernels/include/decode/decode_page.cuh.

Devices

OS: Ubuntu 20.04.6 LTS (focal)
NVIDIA Driver: 535.183.01
CUDA: 12.1
cmake: 3.26.4
GPU: A100-SXM4-40GB

env var:

export PATH="/usr/local/cuda/bin:$PATH"
export PATH="/home/ubuntu/.local/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib"
export CUDA_INSTALL_PATH="/usr/local/cuda"
export CUDA_HOME="/usr/local/cuda"
export CC="/usr/bin/gcc-11"
export CXX="/usr/bin/g++-11"

I did not find CUDART_MAX_NORMAL_FP16 in cuda_fp16.hpp. Could you please check that, or let me know if I am missing something? Thanks!

yangqy1 commented 4 weeks ago

I've encountered the same issue. Have you found a solution yet?

happierpig commented 3 weeks ago

Hi @Zhuohao-Li and @yangqy1 ,

Thanks for your interest in our project!!

CUDART_MAX_NORMAL_FP16 seems to have been introduced in CUDA 12.4 (the version used in our experiments). Check the docs for details. As a quick fix, it is also fine to replace this macro directly with the correct constant value.

Hope this can solve your issues.
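
The quick fix above can also be made version-safe with a guard, so toolkits from CUDA 12.4 onward keep the official definition. A sketch of a header fragment for decode_page.cuh (the 0x7BFF bit pattern is FP16's largest normal value, 65504; check your toolkit's cuda_fp16.h to confirm the macro form):

```cuda
#include <cuda_fp16.h>

// CUDART_MAX_NORMAL_FP16 first ships with CUDA 12.4's cuda_fp16.h.
// On older toolkits, fall back to the same value: the largest normal
// FP16 number, 65504, whose bit pattern is 0x7BFF.
#ifndef CUDART_MAX_NORMAL_FP16
#define CUDART_MAX_NORMAL_FP16 __ushort_as_half((unsigned short)0x7BFFU)
#endif
```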

yangqy1 commented 3 weeks ago

Hi @happierpig and @Zhuohao-Li ,

Thank you for your prompt response and helpful suggestions!

I successfully ran quest/scripts/example_textgen.py using CUDA 11.8 on an A800 GPU. Besides the missing-macro issue above, I encountered two additional problems; here are the solutions for all three:

  1. Regarding the missing CUDART_MAX_NORMAL_FP16:

    • For the setup commands:
      cd quest/ops
      bash setup.sh

      I added #define CUDART_MAX_NORMAL_FP16 __ushort_as_half((unsigned short)0x7BFFU) right after #include <cuda_fp16.h> in quest/kernels/include/decode/decode_page.cuh.

    • For the optional build commands:
      cd kernels
      mkdir build && cd build
      cmake ..
      make -j

      In quest/kernels/src/test/test_page.cu, I inserted half fill_value = __float2half(-65504.0f); and replaced CUDART_MAX_NORMAL_FP16 with fill_value.

  2. When running the tests in quest/kernels/build, I encountered the error:

    Fail: Unexpected error: parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device.

    Since my GPU is an A800 and the original code was compiled for an RTX 4090, I changed the compile-time parameters: I modified set(CMAKE_CUDA_ARCHITECTURES 89) to set(CMAKE_CUDA_ARCHITECTURES 80) in both quest/kernels/CMakeLists.txt and quest/quest/ops/CMakeLists.txt to match my GPU's compute capability (sm_80 for Ampere). This resolved the issue after recompilation.

  3. When executing quest/scripts/example_textgen.py, I faced a CUDA error:

    RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

    By adding torch.cuda.set_device("cuda:0") and passing device="cuda:0" to model.quest_init(), I resolved the issue.
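
The hard-coded architecture in item 2 can also be selected automatically at configure time. A sketch, assuming CMake 3.24+ for the native keyword (which matches whatever GPU is present at build time):

```cmake
# In quest/kernels/CMakeLists.txt and quest/quest/ops/CMakeLists.txt:
# let CMake query the local GPU instead of hard-coding 89 (RTX 4090)
# or 80 (A100/A800).
if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.24")
  set(CMAKE_CUDA_ARCHITECTURES native)
else()
  set(CMAKE_CUDA_ARCHITECTURES 80)  # sm_80 = Ampere (A100/A800)
endif()
```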

I hope this detailed explanation can help others facing similar issues!