Open Zhuohao-Li opened 1 month ago
I've encountered the same issue. Have you found a solution yet?
Hi @Zhuohao-Li and @yangqy1 ,
Thanks for your interest in our project!!
`CUDART_MAX_NORMAL_FP16` seems to have been introduced in CUDA 12.4 (the version used in our experiments). Check the doc for details. As a quick fix, it's also okay to replace this macro directly with the correct constant value.
Hope this can solve your issues.
Hi @happierpig and @Zhuohao-Li ,
Thank you for your prompt response and helpful suggestions!
I successfully ran `quest/scripts/example_textgen.py` using CUDA 11.8 with an A800 GPU. Besides the issue I mentioned, I encountered two additional problems. Here is how I solved each of them:
Regarding the missing `CUDART_MAX_NORMAL_FP16`:
cd quest/ops
bash setup.sh
I added `#define CUDART_MAX_NORMAL_FP16 __ushort_as_half((unsigned short)0x7BFFU)` right after `#include <cuda_fp16.h>` in `quest/kernels/include/decode/decode_page.cuh`.
cd kernels
mkdir build && cd build
cmake ..
make -j
In `quest/kernels/src/test/test_page.cu`, I inserted `half fill_value = __float2half(-65504.0f);` and replaced `CUDART_MAX_NORMAL_FP16` with `fill_value`.
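For anyone who wants to double-check the constants used in the two workarounds above, here is a minimal sketch in plain Python (standard library only, no CUDA needed). The helper name `fp16_from_bits` is my own and not part of Quest or CUDA. It decodes a 16-bit pattern as an IEEE 754 half-precision value, so you can see what `0x7BFF` evaluates to.

```python
import struct


def fp16_from_bits(bits: int) -> float:
    """Decode a 16-bit pattern as an IEEE 754 half-precision value.

    Hypothetical helper for illustration; it mirrors what
    __ushort_as_half does on the device, but runs on the host.
    """
    # Pack the bits as an unsigned 16-bit integer, then reinterpret
    # the same two bytes as a half-precision float ('e' format).
    return struct.unpack("<e", struct.pack("<H", bits))[0]


# Bit pattern used in the #define above.
max_normal = fp16_from_bits(0x7BFF)
# Same pattern with the sign bit set.
neg_max_normal = fp16_from_bits(0xFBFF)

print(max_normal)      # 65504.0
print(neg_max_normal)  # -65504.0
```

Note that the `#define` yields +65504 while the `fill_value` in `test_page.cu` is -65504, so it may be worth checking which sign the surrounding code expects at each place.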
When running the tests in `quest/kernels/build`, I encountered the error:
Fail: Unexpected error: parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device.
Since my GPU is an A800 and the original code was compiled for an RTX 4090, I changed the compile-time parameters: I modified `set(CMAKE_CUDA_ARCHITECTURES 89)` to `set(CMAKE_CUDA_ARCHITECTURES 80)` in both `quest/kernels/CMakeLists.txt` and `quest/quest/ops/CMakeLists.txt` to match my GPU's compute capability. This resolved the issue after recompilation.
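If you are unsure which value to put in `CMAKE_CUDA_ARCHITECTURES` for your own card, the following is a rough sketch of how to derive it. The function names `cmake_cuda_arch` and `detect_arch` are hypothetical. The detection part assumes PyTorch is installed and can see the GPU; otherwise it just returns `None`.

```python
def cmake_cuda_arch(major: int, minor: int) -> str:
    """Turn a compute capability such as (8, 0) into the CMake value "80"."""
    return f"{major}{minor}"


def detect_arch(device_index: int = 0):
    """Return the CMake architecture string for a local GPU, or None.

    Assumption: PyTorch is available. If it is not installed, or no
    CUDA device is visible, this returns None instead of failing.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(device_index)
    return cmake_cuda_arch(major, minor)


print(cmake_cuda_arch(8, 0))  # 80  (A800 / A100)
print(cmake_cuda_arch(8, 9))  # 89  (RTX 4090)
print(detect_arch())          # depends on the machine
```

The printed value is what goes into `set(CMAKE_CUDA_ARCHITECTURES ...)` in the two `CMakeLists.txt` files mentioned above.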
When executing `quest/scripts/example_textgen.py`, I faced a CUDA error:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
I resolved the issue by adding `torch.cuda.set_device("cuda:0")` and specifying `device="cuda:0"` during `model.quest_init()`.
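A minimal sketch of the device-pinning part, in case it helps. The helper `pin_cuda_device` is a name I made up, and I am not showing the `model.quest_init()` call because its full signature is not given in this thread. The sketch only covers selecting the device before anything touches cuBLAS.

```python
def pin_cuda_device(device: str = "cuda:0") -> bool:
    """Select a CUDA device up front; return True if it was selected.

    Hypothetical helper. Returns False when PyTorch is missing or no
    CUDA device is visible, so it is safe to call on any machine.
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    # Make this device the current one for subsequent CUDA calls.
    torch.cuda.set_device(torch.device(device))
    # Allocating a small tensor forces the CUDA context to be created
    # on that device before any library handle is requested.
    torch.zeros(1, device=device)
    return True


print(pin_cuda_device("cuda:0"))
```

Call this at the top of the script, then pass the same device string wherever the model setup accepts one.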
I hope this detailed explanation can help others facing similar issues!
Hi,
I tried to build on my own, but something weird happens when I compile the kernels and build the end-to-end operators with PyBind. The error occurs both with `make -j` and with `bash setup.sh`, when linking the ops. Here are the details to reproduce it:
CMD:
(1)
(2)
log:
for (1), during compilation
for (2), during linking
I made sure that `#include <cuda_fp16.h>` is present in `quest/kernels/include/decode/decode_page.cuh`.
Devices
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
NVIDIA Driver: 535.183.01
CUDA: 12.1
cmake: 3.26.4
A100-SXM4-40GB
env var:
I did not find `CUDART_MAX_NORMAL_FP16` in `cuda_fp16.hpp`. Could you please check that? Or let me know if I'm missing something. Thanks!
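To check whether a given CUDA install ships the macro at all, something like the sketch below could be used. The function names are hypothetical, and `/usr/local/cuda/include` is only an assumed default location, so adjust it to your toolkit path. It scans the `cuda_fp16*` headers for a `#define` of the given name.

```python
import re
from pathlib import Path


def defines_macro(source: str, name: str) -> bool:
    """Return True if `source` contains a #define for exactly `name`."""
    pattern = rf"^[ \t]*#[ \t]*define[ \t]+{re.escape(name)}\b"
    return re.search(pattern, source, flags=re.MULTILINE) is not None


def headers_defining(include_dir: str, name: str) -> list:
    """List cuda_fp16* headers under include_dir that define `name`.

    Returns an empty list if the directory does not exist.
    """
    hits = []
    for path in sorted(Path(include_dir).glob("cuda_fp16*")):
        if path.is_file() and defines_macro(path.read_text(errors="ignore"), name):
            hits.append(path.name)
    return hits


# Assumed default toolkit location; change this for your system.
print(headers_defining("/usr/local/cuda/include", "CUDART_MAX_NORMAL_FP16"))
```

An empty list means none of the scanned headers define the macro, in which case a manual definition is needed.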