Update: I think this is caused by running in a VM on Unraid; the Ubuntu kernel in use is not a standard one.
When attempting the OPT examples, whether via Docker or running locally, I get the error: CUDA error: no kernel image is available for execution on the device. This seems pretty unusual.
Possible causes:
1. I'm only using 1 GPU (I set the flags TP=1, PP=1).
2. The collect_env.py output below reports cuDNN as "Could not collect", yet shows PyTorch was installed with cuDNN (cudnn8.3.2). These conflict.
3. I'm using CUDA 11.6, while the Energon app wants CUDA 11.3.
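A quick way to test the "no kernel image" theory (a diagnostic sketch of mine, not from the EnergonAI docs): compare the GPU's compute capability with the architecture list the installed PyTorch wheel was compiled for.

```python
import torch

# "no kernel image is available" usually means the GPU's compute
# capability (sm_61 for a GTX 1080 Ti) is missing from the list of
# architectures this PyTorch build ships kernels for.
print("torch", torch.__version__, "built with CUDA", torch.version.cuda)
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU compute capability: sm_{major}{minor}")
    print("kernels compiled for:", torch.cuda.get_arch_list())
```

If `sm_61` (or a lower capability covered by PTX) is absent from `get_arch_list()`, every kernel launch will fail exactly like the traceback below.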
INFO: Uvicorn running on http://0.0.0.0:8020 (Press CTRL+C to quit)
INFO colossalai - uvicorn.error - INFO: Uvicorn running on http://0.0.0.0:8020 (Press CTRL+C to quit)
[09/10/22 19:24:25] INFO colossalai - energon - INFO: ==> Rank 0 built layer 0-12 / total 12
INFO colossalai - energon - INFO: Rank0/0 model size = 0.327696384 GB
INFO: 127.0.0.1:36218 - "GET /docs HTTP/1.1" 200 OK
INFO: 127.0.0.1:36218 - "GET /openapi.json HTTP/1.1" 200 OK
[09/10/22 19:24:33] INFO colossalai - opt_server - INFO: 127.0.0.1:36218 - "POST /generation" - max_tokens=64 prompt='Question: Where were the 2004 Olympics held?\nAnswer: Athens,
Greece\n\nQuestion: What is the longest river on the earth?\nAnswer:' top_k=50 top_p=0.5 temperature=0.7
On WorkerInfo(id=0, name=wok0):
RuntimeError('CUDA error: no kernel image is available for execution on the device\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.')
Traceback (most recent call last):
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function
result = python_udf.func(*python_udf.args, **python_udf.kwargs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/engine/rpc_utils.py", line 8, in call_method
return method(rref.local_value(), *args, **kwargs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/engine/rpc_worker.py", line 118, in run
output, cur_key = self.model.run(key, inputs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/engine/pipeline_wrapper.py", line 72, in run
return self.run_without_pp(key, inputs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/engine/pipeline_wrapper.py", line 86, in run_without_pp
output = self.model(hidden_states=None, **sample)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/model/model_factory.py", line 114, in forward
hidden_states = block(hidden_states=hidden_states,
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/model/endecoder.py", line 56, in forward
hidden_states = residual + self.attn(hidden_states = hidden_states,
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/model/attention.py", line 84, in forward
q = self.query_(hidden_states)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/nn/layer/parallel_1d/layers.py", line 302, in forward
output_parallel = F.linear(input_parallel, self.weight, bias)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/kastan/ai/EnergonAI/examples/opt/executor.py", line 36, in _start
outputs = self.engine.run(inputs).to_here()
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/torch/distributed/rpc/internal.py", line 220, in _handle_exception
raise result.exception_type(result.msg.encode("utf-8").decode("unicode_escape"))
RuntimeError: On WorkerInfo(id=0, name=wok0):
RuntimeError('CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.')
Traceback (most recent call last):
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function
result = python_udf.func(*python_udf.args, **python_udf.kwargs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/engine/rpc_utils.py", line 8, in call_method
return method(rref.local_value(), *args, **kwargs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/engine/rpc_worker.py", line 118, in run
output, cur_key = self.model.run(key, inputs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/engine/pipeline_wrapper.py", line 72, in run
return self.run_without_pp(key, inputs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/engine/pipeline_wrapper.py", line 86, in run_without_pp
output = self.model(hidden_states=None, **sample)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/model/model_factory.py", line 114, in forward
hidden_states = block(hidden_states=hidden_states,
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/model/endecoder.py", line 56, in forward
hidden_states = residual + self.attn(hidden_states = hidden_states,
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/model/attention.py", line 84, in forward
q = self.query_(hidden_states)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/kastan/utils/miniconda3/envs/energonai/lib/python3.9/site-packages/energonai/nn/layer/parallel_1d/layers.py", line 302, in forward
output_parallel = F.linear(input_parallel, self.weight, bias)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
My system information:
❯ python collect_env.py
Collecting environment information...
PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-125-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.7.99
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 515.65.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.1
[pip3] torch==1.12.1
[pip3] torchaudio==0.12.1
[pip3] torchvision==0.13.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.1 py38hd3c417c_0
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] numpy 1.23.1 py38h6c91a56_0
[conda] numpy-base 1.23.1 py38ha15fc14_0
[conda] pytorch 1.12.1 py3.8_cuda11.3_cudnn8.3.2_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 0.12.1 py38_cu113 pytorch
[conda] torchvision 0.13.1 py38_cu113 pytorch
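On the cuDNN "conflict" above: collect_env.py looks for a system-wide libcudnn, which a conda install of PyTorch doesn't provide; the conda build string (`py3.8_cuda11.3_cudnn8.3.2_0`) shows cuDNN is bundled with the wheel instead. Querying PyTorch directly resolves the ambiguity (a small sketch):

```python
import torch

# Ask PyTorch itself whether its bundled cuDNN is usable, rather than
# relying on collect_env.py finding a system-wide libcudnn.
print("cuDNN available:", torch.backends.cudnn.is_available())
print("cuDNN version:", torch.backends.cudnn.version())  # e.g. 8302 for 8.3.2
```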
Any debugging advice? Thanks!