Crazy-Information opened 1 year ago
What are the steps to add this?
I struggled with this myself, and the steps vary from system to system, so your mileage may vary. That said, you can try something like this:
1. Obtain the source code. Within your favorite repository directory:

   ```bash
   git clone https://github.com/abetlen/llama-cpp-python.git
   cd llama-cpp-python/vendor
   git clone https://github.com/ggerganov/llama.cpp.git
   ```

2. Set `LLAMA_CUBLAS` to `ON` in `llama-cpp-python/CMakeLists.txt`.

3. Run `python setup.py install` or `pip install --upgrade .` in `llama-cpp-python`. A quick sanity check for the resulting build is sketched below.
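After building, one way to confirm the wheel was actually compiled with GPU support is to load a model with offloading enabled and watch the verbose startup log. A minimal sketch (the model path here is a placeholder; point it at any small GGUF file you have locally):

```python
# Sanity check: verify the llama-cpp-python build can offload to the GPU.
from llama_cpp import Llama

MODEL_PATH = "/path/to/any-model.gguf"  # placeholder, not a real path

# With a cuBLAS build, the verbose startup log should report "BLAS = 1"
# and show layers being offloaded to the GPU.
llm = Llama(model_path=MODEL_PATH, n_ctx=512, n_gpu_layers=1, verbose=True)
print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```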
Add these files to the repo:
`Dockerfile.llamacpp`:

```dockerfile
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

WORKDIR /tmp

RUN --mount=type=cache,target=/var/cache/apt apt-get update && apt-get install -y \
    python3 \
    python-is-python3 \
    python3-pip \
    python3-dev \
    python3-venv \
    build-essential \
    wget \
    unzip \
    git \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

RUN mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd

WORKDIR /app

ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1
ENV NVIDIA_VISIBLE_DEVICES=all

# Install dependencies
RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette pydantic-settings chromadb

# Install llama-cpp-python (build with CUDA)
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

COPY requirements.txt /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -U chromadb

ENV LLAMA_MODEL_PATH="/app/llama-model/phind-codellama-34b-v2.Q4_K_M.gguf"
ENV LLM_MODEL="llama"

ENTRYPOINT ["./babyagi.py"]

# Alternative: run babycoder instead
#WORKDIR /app/babycoder
#ENTRYPOINT ["python", "./babycoder.py"]
```
`docker-compose.override.yml`:

```yaml
services:
  babyagi-llama:
    build:
      context: ./
      dockerfile: Dockerfile.llamacpp
    container_name: babyagi
    volumes:
      - "./:/app"
    stdin_open: true
    tty: true
    ulimits:
      memlock: -1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0', '1']
              capabilities: [gpu]
```
Run with `docker compose up --build babyagi-llama`.
Ah, you also need to set `n_gpu_layers` in `babyagi.py` (maybe read it from an env var like the other settings? see the sketch after the code below):
```python
print('Initialize model for evaluation')
llm = Llama(
    model_path=LLAMA_MODEL_PATH,
    n_ctx=CTX_MAX,
    n_threads=LLAMA_THREADS_NUM,
    n_batch=512,
    use_mlock=False,
    n_gpu_layers=43
)

print('\nInitialize model for embedding')
llm_embed = Llama(
    model_path=LLAMA_MODEL_PATH,
    n_ctx=CTX_MAX,
    n_threads=LLAMA_THREADS_NUM,
    n_batch=512,
    embedding=True,
    use_mlock=False,
    n_gpu_layers=43
)
```
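To avoid hardcoding, `n_gpu_layers` could be read from the environment like the other settings. A sketch of that change inside `babyagi.py`, where `LLAMA_MODEL_PATH`, `CTX_MAX`, and `LLAMA_THREADS_NUM` are already defined (`LLAMA_N_GPU_LAYERS` is a name I made up by analogy, not an existing babyagi variable):

```python
import os

# Hypothetical env var; defaults to 0 (CPU-only) so the change
# is harmless on machines without a GPU.
LLAMA_N_GPU_LAYERS = int(os.getenv("LLAMA_N_GPU_LAYERS", "0"))

print('Initialize model for evaluation')
llm = Llama(
    model_path=LLAMA_MODEL_PATH,
    n_ctx=CTX_MAX,
    n_threads=LLAMA_THREADS_NUM,
    n_batch=512,
    use_mlock=False,
    n_gpu_layers=LLAMA_N_GPU_LAYERS
)
```

`LLAMA_N_GPU_LAYERS=43` could then be set via `ENV` in the Dockerfile or under `environment:` in the compose file.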
Offloading only some layers is perhaps not as fast as performing all model computation on the GPU, but you can get a substantial boost in generation rate if you compile llama.cpp with cuBLAS enabled and use it with the existing script.
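If you want to quantify that boost on your own hardware, one rough approach is to time the same completion with and without offloading. A sketch reusing `LLAMA_MODEL_PATH` from above (the prompt and layer counts are arbitrary; numbers will vary by model and GPU):

```python
import time
from llama_cpp import Llama

# Compare CPU-only against offloading 43 layers (the value used above).
for layers in (0, 43):
    llm = Llama(model_path=LLAMA_MODEL_PATH, n_ctx=2048,
                n_gpu_layers=layers, use_mlock=False, verbose=False)
    start = time.time()
    out = llm("Write a short poem about GPUs.", max_tokens=64)
    tokens = out["usage"]["completion_tokens"]
    print(f"n_gpu_layers={layers}: {tokens / (time.time() - start):.1f} tokens/s")
    del llm  # release the model before loading the next configuration
```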