RuntimeError: CUDA error: no kernel image is available for execution on the device

khazamaa commented 2 years ago

🐛 Describe the bug

I am deploying a model using TorchServe. It deployed without problems on a Tesla K80 GPU, but after moving to a newer NVIDIA A30 GPU I get this error.

torch_version: '1.12.0+cu113'
cudnn_version: 8302


cc @ngimel

khazamaa commented 2 years ago

Please find attached the nvidia-smi screenshot.

ptrblck commented 2 years ago

@khazamaa This error is raised if the PyTorch build you are using does not ship kernels for your GPU architecture (in your case, Ampere). Is torch.version.cuda returning 11.3? If so, are you building custom CUDA extensions, and is it possible that you have not specified the right architectures?
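
As a rough illustration of what "shipping for the right GPU architecture" means, the sketch below compares a device's compute capability against an arch list of the kind torch.cuda.get_arch_list() returns. The helper name arch_is_covered is hypothetical, and the matching rules are a simplification: an sm_XY image is assumed to run on devices with the same major version and an equal or higher minor version, and a compute_XY (PTX) image is assumed to be JIT-compilable for any equal or newer capability. Entries with architecture-specific suffixes are skipped.

```python
import re

# Arch list reported in this thread for the 1.12.0+cu113 binary.
CU113_ARCH_LIST = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86"]


def arch_is_covered(capability, arch_list):
    """Return True if a build with `arch_list` has a usable kernel image
    for a device with the given (major, minor) compute capability.

    Hypothetical helper; simplified rules, see the text above.
    """
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"(sm|compute)_(\d+)(\d)", arch)
        if match is None:
            continue  # skip entries this sketch does not understand
        kind = match.group(1)
        a_major, a_minor = int(match.group(2)), int(match.group(3))
        # Binary (SASS) image: same major version, minor not newer than the device.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX image: can be JIT-compiled for any equal or newer capability.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


print(arch_is_covered((8, 0), CU113_ARCH_LIST))  # A30, compute capability 8.0 -> True
print(arch_is_covered((9, 0), CU113_ARCH_LIST))  # no sm_9x and no PTX entry -> False
```

On a real machine the two arguments would come from torch.cuda.get_device_capability() and torch.cuda.get_arch_list(). If the check passes for the PyTorch binary, the failing kernel most likely comes from something compiled separately, such as a custom extension.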

khazamaa commented 2 years ago

@ptrblck torch.version.cuda is returning 11.3 and I'm not using any custom CUDA extension.

khazamaa commented 2 years ago

I am using the docker image by running ./build_image.sh -g -cv cu113 from the torchserve repo

khazamaa commented 2 years ago

Collecting environment information...
PyTorch version: 1.12.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.27

Python version: 3.8.0 (default, Dec  9 2021, 17:53:27)  [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-29-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA A30
Nvidia driver version: 510.39.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.1
[pip3] torch==1.12.0+cu113
[pip3] torch-model-archiver==0.6.0
[pip3] torchserve==0.6.0
[pip3] torchtext==0.13.0
[pip3] torchvision==0.13.0+cu113
[conda] Could not collect

Extra information about the whole environment.

ptrblck commented 2 years ago

OK, that's strange. Could you post a minimal, executable code snippet you are running inside this container so I could try to reproduce it?

ngimel commented 2 years ago

Also, can you run with CUDA_LAUNCH_BLOCKING=1 to get the exact stack trace of which op is failing.
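
A minimal sketch of one way to apply this from Python, assuming the variable has to be visible in the process that actually launches the kernels and has to be set before CUDA is initialized there. Placing it at the very top of the script (or of a TorchServe handler module, which is an assumption about where the model code runs) is the conservative option. The helper launch_blocking_enabled is hypothetical and only reports what the current process sees.

```python
import os

# Set this before the first CUDA call in this process; the safest place
# (assumption) is before `import torch` at the top of the entry script.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def launch_blocking_enabled():
    """Report whether this process sees CUDA_LAUNCH_BLOCKING=1 (hypothetical helper)."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"


print("CUDA_LAUNCH_BLOCKING enabled:", launch_blocking_enabled())
```

Logging this value from inside the failing code path is a cheap way to confirm that the setting reached the process that raises the error.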

ptrblck commented 2 years ago

That's a good point.

I've tried to reproduce your build via ./build_image.sh -g -cv cu113 and see that the expected PyTorch binary is installed from here.

Inside the built container I'm also able to use the (Ampere) GPU:

model-server@e9796f70a618:~$ ls
config.properties  model-store  tmp
model-server@e9796f70a618:~$ python
Python 3.8.0 (default, Dec  9 2021, 17:53:27) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.12.0+cu113'
>>> torch.cuda.get_arch_list()
['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86']
>>> torch.randn(1).cuda()
tensor([-1.4917], device='cuda:0')

khazamaa commented 2 years ago

@ptrblck you can check by building this project in the container.

> OK, that's strange. Could you post a minimal, executable code snippet you are running inside this container so I could try to reproduce it?

khazamaa commented 2 years ago

> Also, can you run with CUDA_LAUNCH_BLOCKING=1 to get the exact stack trace of which op is failing.

@ngimel I added this environment variable in my container, but the logs were still the same. Can you tell me what I'm doing wrong?

ptrblck commented 2 years ago

@khazamaa Your instructions are unfortunately not working and I cannot execute your workload. If I stick to your instructions it seems the model cannot be found in torchserve:

# torchserve output
model-server@91fc004ce231:~$ torchserve --start --ncs --model-store model-store --ts-config config.properties --foreground
Removing orphan pid file.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-07-25T21:42:55,963 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2022-07-25T21:42:56,145 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.6.0
TS Home: /home/venv/lib/python3.8/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Number of GPUs: 1
Number of CPUs: 32
Max heap size: 16056 M
Python executable: /home/venv/bin/python
Config file: config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/model-server/model-store
Initial Models: N/A

(Note the Initial Models: N/A field)

If I run it via:

torchserve --start --ncs --model-store model-store --ts-config config.properties --foreground --models all

the model is detected but fails with missing dependencies:

2022-07-25T21:47:30,045 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG - Python runtime: 3.8.0
2022-07-25T21:47:30,045 [INFO ] W-9000-gfpgan_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2022-07-25T21:47:30,046 [INFO ] W-9000-gfpgan_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1658785650046
2022-07-25T21:47:30,046 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2022-07-25T21:47:30,054 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG - model_name: gfpgan, batchSize: 1
2022-07-25T21:47:30,495 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG - Backend worker process died.
2022-07-25T21:47:30,495 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2022-07-25T21:47:30,495 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.8/site-packages/ts/model_loader.py", line 100, in load
2022-07-25T21:47:30,495 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2022-07-25T21:47:30,496 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.8/site-packages/ts/model_loader.py", line 162, in _load_handler_file
2022-07-25T21:47:30,496 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2022-07-25T21:47:30,496 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
2022-07-25T21:47:30,496 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2022-07-25T21:47:30,496 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
2022-07-25T21:47:30,496 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 991, in _find_and_load
2022-07-25T21:47:30,496 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
2022-07-25T21:47:30,496 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
2022-07-25T21:47:30,496 [INFO ] epollEventLoopGroup-5-7 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2022-07-25T21:47:30,496 [DEBUG] W-9000-gfpgan_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2022-07-25T21:47:30,496 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap_external>", line 783, in exec_module
2022-07-25T21:47:30,496 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
2022-07-25T21:47:30,496 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/20cb6adc5b6d478180a5f35e423f844b/generative.py", line 16, in <module>
2022-07-25T21:47:30,496 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG -     from basicsr.utils import img2tensor, tensor2img
2022-07-25T21:47:30,496 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'basicsr'

After installing basicsr, I see another failure:

2022-07-25T21:51:21,156 [INFO ] W-9000-gfpgan_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'examples'

However, the resnet-18 example from torchserve works properly in the built container on an RTX 3090:

# launch the container
docker run -it --ipc=host --gpus all -p 8080:8080 -p 8081:8081 docker.io/pytorch/torchserve:latest-gpu bash

# start torchserve (make sure resnet-18.mar is in the right folder)
torchserve --start --ncs --model-store model-store --models resnet-18=resnet-18.mar --foreground

# in another terminal check that torchserve is healthy
curl http://localhost:8080/ping
{
  "status": "Healthy"
}

# send request
curl http://127.0.0.1:8080/predictions/resnet-18 -T ./examples/image_classifier/kitten.jpg
{
  "tabby": 0.4097650349140167,
  "tiger_cat": 0.34653547406196594,
  "Egyptian_cat": 0.1300436556339264,
  "lynx": 0.023934513330459595,
  "bucket": 0.01154948491603136
}

# check if GPU is used
curl http://127.0.0.1:8081/models/resnet-18
[
  {
    "modelName": "resnet-18",
    "modelVersion": "1.0",
    "modelUrl": "resnet-18.mar",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 1,
    "batchSize": 1,
    "maxBatchDelay": 100,
    "loadedAtStartup": true,
    "workers": [
      {
        "id": "9000",
        "startTime": "2022-07-25T22:27:57.946Z",
        "status": "READY",
        "memoryUsage": 0,
        "pid": 52,
        "gpu": true,
        "gpuUsage": "gpuId::0 utilization.gpu [%]::2 % utilization.memory [%]::0 % memory.used [MiB]::12887 MiB"
      }
    ]
  }
]

By the way, this issue is also tracked in https://github.com/pytorch/serve/issues/1754 by another user, in case @msaroufim is also debugging it.

khazamaa commented 2 years ago

@ptrblck you can run the above repo as follows:

Download the weight file GFPGANv1_ema.pth from https://drive.google.com/file/d/1HVYp54rAx7RIyYh5wx2Dd-xDYwysJ9Oh/view?usp=sharing

Put it at ./examples/toonme/GFPGANv1_ema.pth. In config.properties inside the docker folder, add the line install_py_dep_per_model=true so that the "basicsr not found" error goes away.

Also add a model-store folder in the root of the repo, and then run the container with:

docker run --rm --gpus all \
        -p8080:8080 \
        -p8081:8081 \
        -p8082:8082 \
        -p7070:7070 \
        -p7071:7071 \
        -v $(pwd)/model-store:/home/model-server/model-store  -v $(pwd)/examples:/home/model-server/examples -v $(pwd)/ts:/home/model-server/ts pytorch/torchserve:latest-gpu torchserve --model-store=/home/model-server/model-store

Then run this command to create the .mar file:

torch-model-archiver --model-name gfpgan --version 1.0 --model-file examples/toonme/gfpgan/gfpganv1_arch.py --serialized-file examples/toonme/GFPGANv1_ema.pth --export-path ./model-store --handler ts/torch_handler/generative.py  --extra-files examples/toonme/gfpgan/utils.py,examples/toonme/gfpgan/stylegan2_arch.py --requirements-file examples/toonme/requirements.txt -f

khazamaa commented 2 years ago

Then, to register the model, you can run this curl command from outside the container:

curl -X POST  "http://localhost:8081/models?url=gfpgan.mar&batch_size=4&max_batch_delay=500&min_worker=1" && curl -v -X PUT "http://localhost:8081/models/gfpgan?min_worker=1"

ptrblck commented 2 years ago

After adding install_py_dep_per_model=true, some dependencies seem to be installed, as the torchserve command blocks for a while before it fails with:

ModuleNotFoundError: No module named 'examples'

Note that I've also mounted the examples folder to the torchserve container.

In any case, the container seems to work using the resnet example, so at least the right PyTorch binary with the CUDA 11.3 runtime is installed. Unless you are somehow building a custom extension I don't know how your use case could fail. Let me know if you can provide instructions to reproduce the issue.

hosein-cnn commented 1 year ago

Hi, I am also facing this problem, but I don't use PyTorch; I just want to write simple CUDA C++ code in Visual Studio.

I use the following:

Windows 10, 19044 (21H2)
Visual Studio 2019 or 2022
NVIDIA GeForce GTX 960M, Maxwell, compute capability 5.0
CUDA Toolkit 11.7, 11.8, or 12.0

After installing all the C++ packages in VS2019, I installed the CUDA Toolkit. When I run the sample code, I get the following error:

No kernel image is available for execution on the device.

I have tried CUDA 11.7, 11.8, and 12.0 on both VS2019 and VS2022, but the error persists. I have been facing this error for 20 days and I am really fed up. deviceQuery.exe, nvidia-smi, and nvcc --version all run fine. I have also checked NVIDIA's and other sites, and my GPU is supported by these CUDA versions. What could be causing this error?

Please help me out. Thanks, all.
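
One possible cause, which nothing in this thread confirms, is that the project is compiled only for an architecture newer than the GPU's compute capability 5.0 (for example through the Code Generation setting of the Visual Studio CUDA project). The sketch below only illustrates the nvcc -gencode syntax for targeting a given capability; gencode_flags is a hypothetical helper that builds the flag strings and does not invoke the compiler.

```python
def gencode_flags(capabilities, ptx_for_newest=True):
    """Build nvcc -gencode flags for (major, minor) compute capabilities.

    Hypothetical helper: emits one binary (sm_XY) target per capability and,
    optionally, a PTX (compute_XY) target for the newest one so that newer
    GPUs can JIT-compile it.
    """
    caps = sorted(set(capabilities))
    flags = []
    for major, minor in caps:
        cc = f"{major}{minor}"
        flags.append(f"-gencode arch=compute_{cc},code=sm_{cc}")
    if ptx_for_newest and caps:
        major, minor = caps[-1]
        cc = f"{major}{minor}"
        flags.append(f"-gencode arch=compute_{cc},code=compute_{cc}")
    return flags


# GTX 960M from the comment above: compute capability 5.0.
print(" ".join(gencode_flags([(5, 0)])))
```

In Visual Studio the equivalent value would be entered as a compute_50,sm_50 pair in the project's CUDA code generation property, assuming the installed toolkit still supports that architecture.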

guillem93mm commented 1 year ago

Hey! Perhaps someone can kindly give me a hand. My GPU is an NVIDIA GeForce RTX 3050, with CUDA 12.0 on Ubuntu 22.04 and PyTorch version 1.12.0+cu116. I am running the rastervision pipeline in a Docker container with the following command:

sudo docker run --rm --runtime=nvidia --gpus all  -it     -v ${RV_QUICKSTART_CODE_DIR}:/opt/src/code      -v ${RV_QUICKSTART_OUT_DIR}:/opt/data/output     quay.io/azavea/raster-vision:pytorch-0.20 /bin/bash

and I get the following error:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

When I run nvidia-smi I can clearly see it is working. Would very much appreciate some light on it. Thanks a lot!

Toap777 commented 1 year ago

Hey, same issue here! I tried to get https://github.com/oobabooga/text-generation-webui running with TheBloke/wizard-vicuna-13B-GGML from Hugging Face. The model should run in CPU + GPU inference mode with 11 layers loaded to the GPU.

Strangely, the model is loaded into memory without any errors, but it crashes during text generation, printing this error:

"CUDA error 209 at C:\Users\18-17\Desktop\text_generation_gui\installer_files\pip-install-hsfo6qp0\llama-cpp-python_ba51304327c84fd6a00cff2ed1e9bb26\vendor\llama.cpp\ggml-cuda.cu:2221: no kernel image is available for execution on the device" -> Interestingly, the path doesn't exist in Windows Explorer.

What I have tried so far: I followed the guide at https://stackoverflow.com/questions/60987997/why-torch-cuda-is-available-returns-false-even-after-installing-pytorch-with to check whether CUDA is okay:

import torch
torch.cuda.get_arch_list() -> ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']
torch.cuda.is_available() -> True
torch.zeros(1).cuda() -> executed

My GPU supports CUDA compute capability 5.0 according to this list: https://developer.nvidia.com/cuda-gpus#compute, which should correspond to "sm_50" in the arch list.

I tried the fix recommended at https://github.com/imartinez/privateGPT/discussions/778, but rebuilding PyTorch that way didn't change anything.

Reproduction

Use the NVIDIA GeForce GTX 750 Ti and install the latest device drivers with CUDA.

Install the web UI using the Windows installer from https://github.com/oobabooga/text-generation-webui/releases/download/installers/oobabooga_windows.zip. Enable GPU acceleration by following: https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md#gpu-acceleration

Install TheBloke/wizard-vicuna-13B-GGML from Hugging Face using the installer integrated in the web UI.

Set the runtime parameters as described in INSTRUCTIONS.txt (in webui.py):
CMD_FLAGS = '--chat --threads 4 --n-gpu-layers 11'
model loader = llama.cpp

System info

Device name: Powerhouse
Full device name:
Processor: Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz 3.31 GHz
Installed RAM: 16.0 GB (15.9 GB usable)
Device ID: 7D6504E4-579D-4A2D-8043-3D5EA8A0280D
Product ID: 00342-50372-92326-AAOEM
System type: 64-bit operating system, x64-based processor
Pen and touch: No pen or touch input is available for this display

Graphics card: GPU 0

NVIDIA GeForce GTX 750 Ti

Driver version: 31.0.15.3623
Driver date: 08.06.2023 (DD.MM.YYYY)
DirectX version: 12 (FL 11.0)
Physical location: PCI bus 1, device 0, function 0

Utilization: 0%
Dedicated GPU memory: 1.9/2.0 GB
Shared GPU memory: 4.6/7.9 GB
GPU memory: 6.5/9.9 GB

Logs and Traces

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

>nvcc --list-gpu-arch
compute_35
compute_37
compute_50
compute_52
compute_53
compute_60
compute_61
compute_62
compute_70
compute_72
compute_75
compute_80
compute_86
compute_87

textgeneration-webui runtime console messages

bin C:\Users\18-17\Desktop\text_generation_gui\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117_nocublaslt.dll
2023-06-29 14:05:15 INFO:Loading TheBloke_wizard-vicuna-13B-GGML...
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 750 Ti
2023-06-29 14:05:16 INFO:llama.cpp weights detected: models\TheBloke_wizard-vicuna-13B-GGML\wizard-vicuna-13B.ggmlv3.q2_K.bin

2023-06-29 14:05:16 INFO:Cache capacity is 0 bytes
llama.cpp: loading model from models\TheBloke_wizard-vicuna-13B-GGML\wizard-vicuna-13B.ggmlv3.q2_K.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 10 (mostly Q2_K)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 5828.89 MB (+ 1608.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 11 repeating layers to GPU
llama_model_load_internal: offloaded 11/43 layers to GPU
llama_model_load_internal: total VRAM used: 1908 MB
llama_new_context_with_model: kv self size = 1600.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |

PIP package list (Python 3.10):

Package Version
accelerate 0.20.3
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
altair 5.0.1
antlr4-python3-runtime 4.9.3
anyio 3.7.0
asttokens 2.2.1
async-timeout 4.0.2
attrs 23.1.0
auto-gptq 0.2.2+cu117
backcall 0.2.0
beautifulsoup4 4.12.2
bitsandbytes 0.39.1
blinker 1.6.2
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 2.1.1
click 8.1.3
colorama 0.4.6
contourpy 1.1.0
cycler 0.11.0
datasets 2.13.1
decorator 5.1.1
deep-translator 1.9.2
dill 0.3.6
diskcache 5.6.1
docopt 0.6.2
einops 0.6.1
elevenlabs 0.2.18
exceptiongroup 1.1.1
executing 1.2.0
exllama 0.0.4+cu117
fastapi 0.98.0
ffmpeg 1.4
ffmpeg-python 0.2.0
ffmpy 0.3.0
filelock 3.9.0
Flask 2.3.2
flask-cloudflared 0.0.12
flexgen 0.1.7
fonttools 4.40.0
frozenlist 1.3.3
fsspec 2023.6.0
future 0.18.3
gradio 3.33.1
gradio_client 0.2.5
h11 0.14.0
httpcore 0.17.2
httpx 0.24.1
huggingface-hub 0.15.1
idna 3.4
ipython 8.14.0
itsdangerous 2.1.2
jedi 0.18.2
Jinja2 3.1.2
joblib 1.3.0
jsonschema 4.17.3
kiwisolver 1.4.4
linkify-it-py 2.0.2
llama-cpp-python 0.1.67
llvmlite 0.40.1
Markdown 3.4.3
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
mdit-py-plugins 0.3.3
mdurl 0.1.2
more-itertools 9.1.0
mpmath 1.2.1
multidict 6.0.4
multiprocess 0.70.14
networkx 3.0
ngrok 0.8.1
nltk 3.8.1
num2words 0.5.12
numba 0.57.1
numpy 1.24.4
omegaconf 2.3.0
openai-whisper 20230314
orjson 3.9.1
packaging 23.1
pandas 2.0.3
parso 0.8.3
peft 0.4.0.dev0
pickleshare 0.7.5
Pillow 9.5.0
pip 23.1.2
prompt-toolkit 3.0.38
psutil 5.9.5
PuLP 2.7.0
pure-eval 0.2.2
pyarrow 12.0.1
pycparser 2.21
pydantic 1.10.9
pydub 0.25.1
Pygments 2.15.1
pyparsing 3.1.0
pyrsistent 0.19.3
python-dateutil 2.8.2
python-multipart 0.0.6
pytz 2023.3
PyYAML 6.0
quant-cuda 0.0.0
regex 2023.6.3
requests 2.31.0
rouge 1.0.1
safetensors 0.3.1
scikit-learn 1.2.2
scipy 1.11.1
semantic-version 2.10.0
sentence-transformers 2.2.2
sentencepiece 0.1.99
setuptools 67.8.0
six 1.16.0
sniffio 1.3.0
soundfile 0.12.1
soupsieve 2.4.1
SpeechRecognition 3.10.0
stack-data 0.6.2
starlette 0.27.0
sympy 1.11.1
threadpoolctl 3.1.0
tiktoken 0.3.1
tokenizers 0.13.3
toolz 0.12.0
torch 2.0.1+cu117
torchaudio 2.0.2+cu117
torchvision 0.15.2+cu117
tqdm 4.65.0
traitlets 5.9.0
transformers 4.30.2
typing_extensions 4.7.0
tzdata 2023.3
uc-micro-py 1.0.2
urllib3 1.26.13
uvicorn 0.22.0
wcwidth 0.2.6
websockets 11.0.2
Werkzeug 2.3.6
wheel 0.38.4
xxhash 3.2.0
yarl 1.9.2

ptrblck commented 1 year ago

@Toap777 Could you install a plain PyTorch binary without any wrappers or web UIs and check whether a simple smoke test works? We've seen similar issues (described here) where users were complaining about PyTorch not being supported on their GPUs, while in fact their install instructions, scripts, etc. had installed a broken, custom third-party library.
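
A smoke test of this kind could look like the sketch below. The function name cuda_smoke_test is hypothetical; it uses only standard PyTorch calls, returns a short status string instead of raising, and is meant to be run in a clean environment with nothing but the PyTorch binary installed.

```python
def cuda_smoke_test():
    """Run one small op on the GPU and describe the outcome (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "torch imported, but CUDA is not available"
    try:
        x = torch.randn(8, 8, device="cuda")
        # .item() copies the result to the host, which forces the kernels
        # to finish and surfaces any asynchronous launch error here.
        (x @ x).sum().item()
    except RuntimeError as exc:
        return f"CUDA op failed: {exc}"
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    return f"ok on {name} (sm_{major}{minor}); binary built for {torch.cuda.get_arch_list()}"


print(cuda_smoke_test())
```

If this passes while the application still reports "no kernel image is available", the failing kernel most likely belongs to a separately compiled library rather than to PyTorch itself.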

j93hahn commented 1 year ago

Hi, I am getting this error with PyTorch 2.0.1 installed with CUDA 11.8 support on a GPU with CUDA Version 12.0

I do have a custom CUDA kernel installed - is there a way to specify which CUDA architecture to target when compiling? When I run the command, it is able to work in interactive mode, but sending a batch of them results in this error.
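
For extensions built through torch.utils.cpp_extension, the TORCH_CUDA_ARCH_LIST environment variable selects the target architectures at build time. The sketch below formats such a value; the helper torch_cuda_arch_list and the capabilities 8.0 and 8.6 are illustrative assumptions, and the real list should cover every GPU model the code may run on. If batch jobs land on a different GPU model than the interactive session, which is only a guess here, that would explain why only the batch runs fail.

```python
import os


def torch_cuda_arch_list(capabilities, add_ptx=True):
    """Format (major, minor) capabilities as a TORCH_CUDA_ARCH_LIST value.

    Hypothetical helper. With add_ptx=True the newest entry gets a "+PTX"
    suffix so that GPUs newer than the listed ones can still JIT-compile
    the kernels.
    """
    entries = [f"{major}.{minor}" for major, minor in sorted(set(capabilities))]
    if add_ptx and entries:
        entries[-1] += "+PTX"
    return ";".join(entries)


# Set this before the extension is compiled (before running setup.py or
# calling torch.utils.cpp_extension.load); the capabilities are examples only.
os.environ["TORCH_CUDA_ARCH_LIST"] = torch_cuda_arch_list([(8, 0), (8, 6)])
print(os.environ["TORCH_CUDA_ARCH_LIST"])  # 8.0;8.6+PTX
```

An extension that was already compiled has to be rebuilt after changing the variable, since it only affects compilation.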
