zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://privategpt.dev
Apache License 2.0

CUDA BLAS GPU support for docker image #1405

Open jannikmi opened 9 months ago

jannikmi commented 9 months ago

When I run the docker container I see that the GPU is only being used for the embedding model (encoder), not the LLM.

I noticed that llama-cpp-python is not compiled properly (notice BLAS = 0 in the output below), as described in this issue: https://github.com/abetlen/llama-cpp-python/issues/509

AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 

I got it to work by setting additional environment variables in the llama-cpp-python install command, as mentioned in this comment: https://github.com/abetlen/llama-cpp-python/issues/509#issuecomment-1739098588. Note: it is important to point the build at the correct CUDA compiler (correct version!).


# build llama-cpp with CUDA support
# solution according to: https://github.com/abetlen/llama-cpp-python/issues/509#issuecomment-1739098588
# setting build related env vars
ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1
# Note: variable replacement won't show during docker build
ARG CUDA_LOC=/usr/local/cuda-11
ARG CUDA_LOC2=${CUDA_LOC}.8
RUN --mount=type=cache,target=/root/.cache \
    CMAKE_ARGS="-DLLAMA_CUBLAS=on \
      -DCUDA_PATH=${CUDA_LOC2} \
      -DCUDAToolkit_ROOT=${CUDA_LOC2} \
      -DCUDAToolkit_INCLUDE_DIR=${CUDA_LOC}/include \
      -DCUDAToolkit_LIBRARY_DIR=${CUDA_LOC2}/lib64 \
      -DCMAKE_CUDA_COMPILER:PATH=/usr/local/cuda/bin/nvcc" \
    FORCE_CMAKE=1 \
    .venv/bin/pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
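
To verify that the rebuild actually took effect, a quick sanity check can be added right after; a minimal sketch, assuming the .venv layout above (llama_print_system_info() is the llama.cpp binding that prints the feature line quoted earlier):

# the feature line printed here should now report BLAS = 1
RUN .venv/bin/python -c "import llama_cpp; print(llama_cpp.llama_print_system_info())"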
Mijago commented 9 months ago

Hi, can you share your entire Dockerfile, please?

jannikmi commented 9 months ago

Sure: Dockerfile.txt

Note: I changed a couple of things on top of just the compilation environment variables.

lukaboljevic commented 9 months ago

@jannikmi I also managed to get PrivateGPT running on the GPU in Docker, though my setup changes the 'original' Dockerfile as little as possible.

Starting from the current base Dockerfile, I made changes according to this pull request (which will probably be merged in the future). For me, this solved the issue of PrivateGPT not working in Docker at all - after the changes, everything was running as expected on the CPU. The command I used for building is simply docker compose up --build.

To get it to work on the GPU, I created a new Dockerfile and docker compose YAML file. The new docker compose file adds the following lines to share the GPU with the container:

services:
  private-gpt:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
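
Before touching the compose file, it's worth confirming that the host's NVIDIA Container Toolkit works at all; a common smoke test (the image tag here is just an example):

docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi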

For the new Dockerfile, I used the nvidia/cuda image, because it's much easier to work with when the drivers and toolkit are already set up. For everyone reading, please note that I used version 12.2.2 of the CUDA toolkit, because CUDA 12.2 uses NVIDIA driver version 535, which is what is installed on my host machine. CUDA 12.3 (the latest version at the time of writing) uses driver version 545, and I did not want to run into possible driver mismatch issues. Apart from the driver, I have the NVIDIA Container Toolkit and the CUDA toolkit installed on the host machine.
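
To see which driver the host is actually running before picking a base image, nvidia-smi can report it directly:

nvidia-smi --query-gpu=driver_version --format=csv,noheader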

Apart from installing Python 3.11 and gcc, and rebuilding llama-cpp-python, everything is pretty much the same as the changes from the aforementioned pull request. The command I used for building is docker compose -f new-docker-compose.yaml up --build.

FROM nvidia/cuda:12.2.2-devel-ubuntu22.04 as base

# For tzdata
ENV DEBIAN_FRONTEND="noninteractive" TZ="Europe/Ljubljana"

# Install Python 3.11 and set it as default
RUN apt-get update && \
    apt-get install -y software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt-get update && \ 
    apt-get install -y python3.11 python3.11-venv python3-pip && \
    ln -sf /usr/bin/python3.11 /usr/bin/python3 && \
    python3 --version

# Install poetry
RUN pip install pipx
RUN python3 -m pipx ensurepath
RUN pipx install poetry
ENV PATH="/root/.local/bin:$PATH"

# Set the environment variable for the file URL (can be overwritten)
ENV FILE_URL="https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf"
# Set the predefined model name (can be overwritten)
ENV NAME="mistral-7b-instruct-v0.1.Q4_K_M.gguf"

# Dependencies to build llama-cpp
RUN apt update && apt install -y \
  libopenblas-dev \
  ninja-build \
  build-essential \
  pkg-config \
  wget \
  gcc

# https://python-poetry.org/docs/configuration/#virtualenvsin-project
ENV POETRY_VIRTUALENVS_IN_PROJECT=true

############################################
FROM base as dependencies
############################################

WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./

RUN poetry install --with local
RUN poetry install --with ui
RUN poetry install --extras chroma

# Enable GPU support
RUN CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python

############################################
FROM base as app
############################################

ENV PYTHONUNBUFFERED=1
ENV PORT=8080
EXPOSE 8080

# Prepare a non-root user
RUN adduser worker
WORKDIR /home/worker/app

RUN mkdir -p local_data; chown -R worker local_data
RUN mkdir -p models; chown -R worker models
COPY --chown=worker --from=dependencies /home/worker/app/.venv/ .venv
COPY --chown=worker private_gpt/ private_gpt
COPY --chown=worker fern/ fern
COPY --chown=worker *.yaml *.md ./

# Copy the entry point script into the container and make it executable
COPY --chown=worker entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

# Set the entry point script to be executed when the container starts
ENTRYPOINT ["/entrypoint.sh", ".venv/bin/python", "-m", "private_gpt"]
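
Once the container is up, one way to confirm the LLM itself is offloaded (and not just the embedding model) is to check the llama.cpp load log; a sketch, assuming the service name from the compose snippet above:

# BLAS = 1 plus a non-zero "offloaded X/Y layers to GPU" line means the LLM is on the GPU
docker compose -f new-docker-compose.yaml logs private-gpt | grep -E "BLAS|offloaded"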
he-man86 commented 9 months ago

hi @lukaboljevic, thanks for this. I have been struggling with the Docker setup for some time! It fails on the entrypoint file though... could you provide us with that?

lukaboljevic commented 9 months ago

> hi @lukaboljevic, thanks for this. I have been struggling with the Docker setup for some time. Could you provide us with the entrypoint.sh file?

Glad to hear it helped. The entrypoint.sh file is given in the pull request I linked above (#1428).
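
For readers who don't want to open the PR: the gist of such an entrypoint is to fetch the model once and then exec the server command. A minimal sketch, assuming the FILE_URL and NAME variables from the Dockerfile above (the actual script in the PR may differ):

#!/bin/sh
set -e
# download the model on first start, using the variables baked into the image
if [ ! -f "models/${NAME}" ]; then
    wget -O "models/${NAME}" "${FILE_URL}"
fi
# hand off to the arguments from ENTRYPOINT (.venv/bin/python -m private_gpt)
exec "$@"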

jon6fingrs commented 8 months ago

I am pulling my hair out. I came across this thread after I had made my own Dockerfile. PrivateGPT will start, but I cannot, for the life of me, after many, many hours, get the GPU recognized in Docker.

I have this installed on a Razer notebook with a GTX 1060. Running PrivateGPT on bare metal works fine with GPU acceleration. Repeating the same steps in my Dockerfile, however, gives me a working PrivateGPT but no GPU acceleration, even though nvidia-smi works inside the container.

I have tried this on my own computer and on RunPod with the same results. I was not even able to build the other Dockerfiles posted here and in the repo.

Here is mine. Any additional help would be greatly appreciated. Thanks!

FROM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && apt install -y software-properties-common git
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt update && apt upgrade -y && apt install -y python3.11 python3.11-venv python3-pip && ln -sf /usr/bin/python3.11 /usr/bin/python3

RUN pip install pipx
RUN python -m pipx ensurepath
RUN pipx install poetry

ENV PATH="/root/.local/bin:$PATH"
RUN apt update && apt install -y libopenblas-dev ninja-build build-essential pkg-config wget gcc

ENV FILE_URL="https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf"

ENV NAME="mistral-7b-instruct-v0.1.Q4_K_M.gguf"

RUN mkdir /app
WORKDIR /app
RUN git clone https://github.com/imartinez/privateGPT
WORKDIR /app/privateGPT

RUN poetry install --with local
RUN poetry install --with ui
RUN poetry install --extras chroma

RUN poetry run python scripts/setup

ENV PGPT_PROFILES=local
ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1
ARG CUDA_LOC=/usr/local/cuda-11
ARG CUDA_LOC2=${CUDA_LOC}.8
RUN --mount=type=cache,target=/root/.cache \
    CMAKE_ARGS="-DLLAMA_CUBLAS=on \
      -DCUDA_PATH=${CUDA_LOC2} \
      -DCUDAToolkit_ROOT=${CUDA_LOC2} \
      -DCUDAToolkit_INCLUDE_DIR=${CUDA_LOC}/include \
      -DCUDAToolkit_LIBRARY_DIR=${CUDA_LOC2}/lib64 \
      -DCMAKE_CUDA_COMPILER:PATH=/usr/local/cuda/bin/nvcc" \
    FORCE_CMAKE=1 \
    pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

RUN sed -i "s/database: qdrant/database: chroma/g" settings.yaml
CMD make run
jon6fingrs commented 8 months ago

OK, I got it working. I changed the run command to just a wait timer, then went into a terminal in the container and manually executed 'PGPT_PROFILES=local make run', and it recognized the GPU. The image already sets PGPT_PROFILES=local as an environment variable, though, so I am not sure why that helped.
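
For anyone who wants to reproduce that debugging approach, the 'wait timer' trick amounts to keeping the container idle and exec'ing into it (the container name below is hypothetical):

# in the Dockerfile, replace "CMD make run" with an idle command:
CMD ["sleep", "infinity"]

# then, from the host:
#   docker exec -it privategpt bash
#   PGPT_PROFILES=local make run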

Apotrox commented 7 months ago

hey @lukaboljevic, concerning the new docker compose file: is your snippet all that's in there, or did you add the content of the previous file too?

lukaboljevic commented 7 months ago

> hey @lukaboljevic, concerning the new docker compose file: is your snippet all that's in there, or did you add the content of the previous file too?

I added the content of the previous file too, i.e. what I wrote in my comment is what I added to the original docker compose file to make it work. Sorry for the late reply.
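
For completeness, the merged compose file then looks roughly like this; everything outside the deploy block is a hedged reconstruction of the stock docker-compose.yaml (the Dockerfile name, port mapping, and profile may differ in your checkout):

services:
  private-gpt:
    build:
      dockerfile: Dockerfile.gpu   # hypothetical name for the new GPU Dockerfile
    ports:
      - "8001:8080"
    environment:
      PGPT_PROFILES: docker
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]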