neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0

Building Docker for Cuda fails weirdly #783

Open blackcatstudiosdevelopment opened 1 month ago

blackcatstudiosdevelopment commented 1 month ago

Dockerfile:

FROM nvidia/cuda:12.2.0-base-ubuntu22.04

COPY . /app

RUN apt-get update && \
    apt-get install -y --allow-unauthenticated --no-install-recommends \
    wget \
    git \
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/*

ENV HOME="/root"
ENV CONDA_DIR="${HOME}/miniconda"
ENV PATH="$CONDA_DIR/bin":$PATH
ENV CONDA_AUTO_UPDATE_CONDA=false
ENV PIP_DOWNLOAD_CACHE="$HOME/.pip/cache"
ENV TORTOISE_MODELS_DIR="$HOME/tortoise-tts/build/lib/tortoise/models"

RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda3.sh \
    && bash /tmp/miniconda3.sh -b -p "${CONDA_DIR}" -f -u \
    && "${CONDA_DIR}/bin/conda" init bash \
    && rm -f /tmp/miniconda3.sh \
    && echo ". '${CONDA_DIR}/etc/profile.d/conda.sh'" >> "${HOME}/.profile"

# --login option used to source bashrc (thus activating conda env) at every RUN statement
SHELL ["/bin/bash", "--login", "-c"]

RUN conda create --name tortoise python=3.9 numba inflect --yes \
    && conda activate tortoise \
    && conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia --yes \
    && conda install transformers=4.31.0 --yes \
    && cd /app \
    && python setup.py install

Build command:

docker build . -t tts

Run command:

docker run --gpus all -e TORTOISE_MODELS_DIR=/models -v "J:\AI\voice\tortoise-tts\tortoise\models":/models \
-v "J:\AI\voice\tortoise-tts\tortoise\results":/results \
-v "%USERPROFILE%\.cache\huggingface":/root/.cache/huggingface \
-v "J:\AI\voice\tortoise-tts\tortoise\work":/work -it tts

(base) root@9454180e9c47:/# cd app
(base) root@9454180e9c47:/app# conda activate tortoise
(tortoise) root@9454180e9c47:/app# python -c "import torch; print(torch.cuda.is_available());torch.zeros(1).cuda()"

/root/miniconda/envs/tortoise/lib/python3.9/site-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found (Triggered internally at /opt/conda/conda-bld/pytorch_1711403380164/work/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/miniconda/envs/tortoise/lib/python3.9/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found

I'm not sure how to even troubleshoot this error.
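
One way to narrow this down might be to call the CUDA driver directly from inside the container, without going through PyTorch. The sketch below is only an illustration under a few assumptions: `probe_cuda_driver` is a hypothetical helper (not part of tortoise-tts or PyTorch), and it assumes the driver library is exposed as `libcuda.so.1` on the loader path. It uses Python's `ctypes` to call the driver API functions `cuInit` and `cuDriverGetVersion`. In the driver API, result code 500 is `CUDA_ERROR_NOT_FOUND`, which is the "named symbol not found" message in the traceback above.

```python
import ctypes

# CUDA driver API result codes relevant here (values from cuda.h).
CUDA_SUCCESS = 0
CUDA_ERROR_NOT_FOUND = 500  # reported as "named symbol not found"


def probe_cuda_driver(lib_name="libcuda.so.1"):
    """Hypothetical helper: call cuInit(0) through the driver library directly.

    Returns a dict with 'loaded' (bool), 'cu_init' (int or None),
    'driver_version' (int or None) and 'error' (str or None).
    """
    report = {"loaded": False, "cu_init": None, "driver_version": None, "error": None}
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError as exc:
        # The driver library is not visible inside this environment.
        report["error"] = str(exc)
        return report
    report["loaded"] = True

    # cuInit returns a CUresult; 0 means the driver initialized correctly.
    report["cu_init"] = int(libcuda.cuInit(0))
    if report["cu_init"] == CUDA_SUCCESS:
        version = ctypes.c_int(0)
        if libcuda.cuDriverGetVersion(ctypes.byref(version)) == CUDA_SUCCESS:
            # Encoded as 1000 * major + 10 * minor.
            report["driver_version"] = version.value
    return report


if __name__ == "__main__":
    print(probe_cuda_driver())
```

If `cu_init` also comes back as 500 here, that would suggest the failure sits below PyTorch (driver or container runtime) rather than in the conda packages. If it comes back as 0, the PyTorch install would be the next thing to look at.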

(tortoise) root@9454180e9c47:/app# nvidia-smi

Tue May 28 20:50:37 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        On  |   00000000:2A:00.0  On |                  N/A |
|  0%   44C    P5             18W /  170W |    1644MiB /  12288MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        36      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        48      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

blackcatstudiosdevelopment commented 1 month ago

I thought #760 would have been the solution.