Describe the bug
~/tabbyAPI# docker run --gpus all 3fd2b9e9cf8fd03dc3601a7e1712611006e507a852e160933cda81f0f37a0036 (docker image: ghcr.io/theroyallab/tabbyapi:latest)
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
File "/app/main.py", line 11, in <module>
from common import gen_logging, sampling, model
File "/app/common/model.py", line 19, in <module>
from backends.exllamav2.model import ExllamaV2Container
File "/app/backends/exllamav2/model.py", line 12, in <module>
from exllamav2 import (
File "/usr/local/lib/python3.10/dist-packages/exllamav2/__init__.py", line 3, in <module>
from exllamav2.model import ExLlamaV2
File "/usr/local/lib/python3.10/dist-packages/exllamav2/model.py", line 41, in <module>
from exllamav2.attn import ExLlamaV2Attention, has_flash_attn, has_xformers
File "/usr/local/lib/python3.10/dist-packages/exllamav2/attn.py", line 38, in <module>
is_ampere_or_newer_gpu = any(torch.cuda.get_device_properties(i).major >= 8 for i in range(torch.cuda.device_count()))
File "/usr/local/lib/python3.10/dist-packages/exllamav2/attn.py", line 38, in <genexpr>
is_ampere_or_newer_gpu = any(torch.cuda.get_device_properties(i).major >= 8 for i in range(torch.cuda.device_count()))
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 465, in get_device_properties
_lazy_init() # will define _get_device_properties
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 314, in _lazy_init
torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found
When I run nvcc -V inside the container, I get:
/bin/sh: 1: nvcc: not found
This doesn't seem to be a problem on my end, because the NVIDIA container runtime works normally in another Docker image:
docker run -it --gpus all 0dd75116a8ce8e6dd3cf6db1eb249d14f07f4115a1d35aaeb29bedfe8bc383f0 /bin/bash (docker image: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel)
==========
== CUDA ==
==========
CUDA Version 12.1.1
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
root@58d50a5fd49e:/workspace# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
So the NVIDIA container runtime itself starts normally in another Docker image; the failure appears specific to the tabbyAPI image.
Reproduction steps
git clone https://github.com/theroyallab/tabbyAPI
docker compose -f docker/docker-compose.yml up
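As a quick diagnostic (my own check, not part of the official reproduction steps), the failure can be isolated by probing CUDA from PyTorch directly inside each image, without starting the API server:

```shell
# Hypothetical diagnostic commands, assuming both images are already pulled.
# In the tabbyAPI image, this probe triggers the same cudaGetDeviceCount()
# warning as the full startup and prints False:
docker run --rm --gpus all ghcr.io/theroyallab/tabbyapi:latest \
  python3 -c "import torch; print(torch.cuda.is_available())"

# The identical probe in the known-good PyTorch devel image prints True,
# which suggests the host driver / container runtime are working:
docker run --rm --gpus all pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel \
  python3 -c "import torch; print(torch.cuda.is_available())"
```

If the first command fails while the second succeeds, that narrows the problem to the tabbyAPI image rather than the host's GPU setup.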
Expected behavior
The container starts successfully and the model runs on the GPU.
Logs
No response
Additional context
No response
Acknowledgements
[X] I have looked for similar issues before submitting this one.
[X] I have read the disclaimer, and this issue is related to a code bug. If I have a question, I will use the Discord server.
[X] I understand that the developers have lives and my issue will be answered when possible.
[X] I understand the developers of this program are human, and I will ask my questions politely.
OS
Windows
GPU Library
CUDA 12.x
Python version
3.10