Yona-W opened this issue 1 year ago
I got the same thing today with an NVIDIA A100. Did you ever figure it out?
Ah, I totally forgot I had opened an issue.
For my situation, I figured out that Torch 2.0 has problems specifically with the 1080Ti, and modifying the Containerfile to use Torch < 2.0 solved my issues.
With an A100 though, I'm not sure what could be causing it. I doubt Torch would have issues with pretty much the most popular ML card. If you're using the same Containerfile, I guess make sure the correct CUDA architecture is listed in the defines?
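For context, "no kernel image is available" typically means the build's architecture list didn't cover the running GPU's compute capability (sm_80 for an A100, sm_61 for a 1080 Ti). A minimal sketch of that mismatch check — the helper name and the simplified parsing are mine, not PyTorch's (the real `TORCH_CUDA_ARCH_LIST` also accepts named architectures like `Ampere` and the `All` keyword):

```python
def arch_covered(arch_list: str, device_cc: tuple[int, int]) -> bool:
    """Return True if a numeric TORCH_CUDA_ARCH_LIST entry matches the
    device's compute capability, e.g. (8, 0) for an A100.

    Simplified sketch: handles only semicolon-separated numeric entries
    like "6.1;8.0+PTX", ignoring PTX forward-compatibility.
    """
    for entry in arch_list.split(";"):
        entry = entry.strip().removesuffix("+PTX")
        major, _, minor = entry.partition(".")
        if (int(major), int(minor)) == device_cc:
            return True
    return False

# A build targeting only the 1080 Ti (sm_61) has no kernels for an A100:
print(arch_covered("6.1", (8, 0)))       # False
print(arch_covered("6.1;8.0", (8, 0)))   # True
```

So a Containerfile baked for one card can load the model fine and still fail the moment a kernel actually launches on a different card.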
I'm also running into this with podman (on Fedora 38), and I have an issue filed @ https://github.com/oobabooga/text-generation-webui/issues/2002
Thanks for the link to RedTopper/Text-Generation-Webui-Podman, I hadn't been using that.
The version of GPTQ-for-LLaMa used in text-generation-webui is a fork of this repo @ https://github.com/oobabooga/GPTQ-for-LLaMa, but I came looking here as well.
In my case, I ran into the error here, with:
```
File "/app/repositories/GPTQ-for-LLaMa/quant.py", line 431, in forward
    y = y.to(output_dtype)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
(Full stack trace in linked issue)
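As the error message suggests, CUDA errors are reported asynchronously, so the line blamed in the trace may not be the one that failed. Forcing synchronous launches makes the trace trustworthy — a sketch, where the `server.py` entry point is an assumption about how you start the webui:

```shell
# Run with synchronous CUDA kernel launches so the Python stack trace
# points at the kernel that actually failed (entry point may differ
# for your setup):
CUDA_LAUNCH_BLOCKING=1 python server.py
```

This slows execution down, so it's only worth setting while debugging.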
My configuration is as follows (TORCH_CUDA_ARCH_LIST="All" when compiling). I get the following output when running the benchmark:
I get a similar output (obviously with a different stack trace) when trying to run inference on the model. Everything loads correctly, the error only happens when something is evaluated.