Open AlbertoCastelo opened 2 days ago
Hi @AlbertoCastelo, thanks for your bug report! We confirmed that this happens because CUDA 12.2 does not have cudafp16.h. We will fix it soon.
@AlbertoCastelo I was facing a similar error on the same lines of code:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/limits(344): error: floating constant is out of range
_LIBCUDACXX_INLINE_VISIBILITY static _LIBCUDACXX_CONSTEXPR type denorm_min() _NOEXCEPT {return __FLT_DENORM_MIN__;}
^
/usr/local/cuda/include/cuda/std/detail/libcxx/include/limits(396): error: floating constant is out of range
_LIBCUDACXX_INLINE_VISIBILITY static _LIBCUDACXX_CONSTEXPR type denorm_min() _NOEXCEPT {return __DBL_DENORM_MIN__;}
^
2 errors detected in the compilation of "sourceCode.cu".
CUDA error code=6(b'NVRTC_ERROR_COMPILATION')
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/xgrammar/kernels/apply_token_bitmask_inplace_cuda.py", line 170, in compile
checkCudaErrors(nvrtc.nvrtcCompileProgram(prog, len(opts), opts))
File "/usr/local/lib/python3.10/dist-packages/xgrammar/kernels/apply_token_bitmask_inplace_cuda.py", line 88, in checkCudaErrors
raise RuntimeError(
RuntimeError: CUDA error code=6(b'NVRTC_ERROR_COMPILATION')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/ebnf_xgrammar.py", line 33, in <module>
generated_ids = model.generate(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2215, in generate
result = self._sample(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 3223, in _sample
next_token_scores = logits_processor(input_ids, next_token_logits)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/logits_process.py", line 104, in __call__
scores = processor(input_ids, scores)
File "/usr/local/lib/python3.10/dist-packages/xgrammar/contrib/hf.py", line 95, in __call__
xgr.apply_token_bitmask_inplace(scores, self.token_bitmask.to(scores.device))
File "/usr/local/lib/python3.10/dist-packages/xgrammar/matcher.py", line 110, in apply_token_bitmask_inplace
apply_token_bitmask_inplace_cuda(logits, bitmask, indices)
File "/usr/local/lib/python3.10/dist-packages/xgrammar/kernels/apply_token_bitmask_inplace_cuda.py", line 235, in apply_token_bitmask_inplace_cuda
kernel = KernelStore.compile(logits.device.index)
File "/usr/local/lib/python3.10/dist-packages/xgrammar/kernels/apply_token_bitmask_inplace_cuda.py", line 178, in compile
raise RuntimeError("CUDA kernel compilation failure")
RuntimeError: CUDA kernel compilation failure
The solution was to install the NVIDIA CUDA Toolkit 12.4 with the proper drivers that support it, in my specific case: 550.90.12.
I hope it helps!
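For anyone who wants to check their toolkit before running into this, a minimal sketch along these lines might help. It assumes the `nvcc --version` output contains a fragment like `release 12.2,` and uses the 12.4 threshold reported in this thread; the helper names are made up for illustration and are not part of xgrammar.

```python
import re
import shutil
import subprocess

# Threshold taken from this thread: toolkits older than 12.4 hit the NVRTC error.
MIN_CUDA = (12, 4)

def parse_cuda_release(nvcc_output: str):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def toolkit_is_new_enough(nvcc_output: str, minimum=MIN_CUDA) -> bool:
    """True if the parsed toolkit version is at least `minimum`."""
    release = parse_cuda_release(nvcc_output)
    return release is not None and release >= minimum

if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        print("toolkit:", parse_cuda_release(out), "ok:", toolkit_is_new_enough(out))
```

Note that this only looks at the toolkit that `nvcc` on the PATH belongs to; the driver version still has to be checked separately (for example with `nvidia-smi`).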
Hi @roG0d, you are right: the problem is that the cuda-python compilation workflow is incompatible with CUDA versions prior to 12.4.
We just released a new version where we switch to a triton kernel implementation and avoid using cuda-python. This should be compatible with all CUDA versions. Please try
pip install xgrammar==v0.1.5.rc1
and the problem should be fixed.
We will release v0.1.5.rc1 first and if no problem occurs, we will release v0.1.5 later today.
@AlbertoCastelo, this should fix your problem as well!
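To confirm which build actually ended up in the environment after the upgrade, something like the following sketch could be used. It relies only on the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical, and the exact version string pip reports for the release candidate may be normalized differently from what was typed on the command line.

```python
from importlib import metadata

def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Print what is installed for the packages mentioned in this thread.
for name in ("xgrammar", "cuda-python", "triton"):
    print(f"{name}: {installed_version(name)}")
```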
I got the following error with xgrammar==v0.1.5.rc1:
/tmp/tmpamzpmugc/main.c:5:10: fatal error: Python.h: No such file or directory
5 | #include <Python.h>
| ^~~~~~~~~~
compilation terminated.
I just had to install the Python header files, and everything is working now :-):
sudo apt update
sudo apt install python3.11-dev
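A quick way to see whether the headers are visible to the interpreter that is actually running the code is sketched below. It assumes the compile step looks in the interpreter's default include directory as reported by `sysconfig`, which may not hold for every toolchain; the function names are illustrative only.

```python
import os
import sysconfig

def python_header_path() -> str:
    """Path where Python.h is expected for the running interpreter."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.join(include_dir, "Python.h")

def has_python_headers() -> bool:
    """True if Python.h exists in the interpreter's include directory."""
    return os.path.isfile(python_header_path())

status = "found" if has_python_headers() else "missing"
print(f"{python_header_path()}: {status}")
```

If the file is reported missing, the `-dev` package to install should match the interpreter version in use (3.11 in my case).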
Hi @zcasanova, thanks for your feedback. The Python headers are a requirement for building XGrammar, but they are not necessary for running XGrammar on the Python side.
hey folks!
Running the example fails at
model.generate(...)
with the following stacktrace
Environment
Cuda
I also installed after some initial errors.
pip freeze