Crazy-Information opened 1 year ago
What are the steps to add this?
I struggled with this myself, and the steps vary from system to system, so your mileage may vary. That said, you can try something like this:
1. Obtain the source code. Within your favorite repository directory:

   ```bash
   git clone https://github.com/abetlen/llama-cpp-python.git
   cd llama-cpp-python/vendor
   git clone https://github.com/ggerganov/llama.cpp.git
   ```

2. Set `LLAMA_CUBLAS` to `ON` in `llama-cpp-python/CMakeLists.txt`.

3. Run `python setup.py install` or `pip install --upgrade .` in `llama-cpp-python`. A quick sanity check for the resulting build is sketched below.
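After building, one way to confirm the wheel was actually compiled with GPU support is to load a model with offloading enabled and watch the verbose startup log. A minimal sketch (the model path here is a placeholder; point it at any small GGUF file you have locally):

```python
# Sanity check: verify the llama-cpp-python build can offload to the GPU.
from llama_cpp import Llama

MODEL_PATH = "/path/to/any-model.gguf"  # placeholder, not a real path

# With a cuBLAS build, the verbose startup log should report "BLAS = 1"
# and show layers being offloaded to the GPU.
llm = Llama(model_path=MODEL_PATH, n_ctx=512, n_gpu_layers=1, verbose=True)
print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```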
Add these files to the repo:
`Dockerfile.llamacpp`:

```dockerfile
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

WORKDIR /tmp

RUN --mount=type=cache,target=/var/cache/apt apt-get update && apt-get install -y \
    python3 \
    python-is-python3 \
    python3-pip \
    python3-dev \
    python3-venv \
    build-essential \
    wget \
    unzip \
    git \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

RUN mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd

WORKDIR /app

ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1
ENV NVIDIA_VISIBLE_DEVICES=all

# Install dependencies
RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette pydantic-settings chromadb

# Install llama-cpp-python (build with CUDA)
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

COPY requirements.txt /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -U chromadb

ENV LLAMA_MODEL_PATH="/app/llama-model/phind-codellama-34b-v2.Q4_K_M.gguf"
ENV LLM_MODEL="llama"

ENTRYPOINT ["./babyagi.py"]

# Alternative: run babycoder instead
#WORKDIR /app/babycoder
#ENTRYPOINT ["python", "./babycoder.py"]
```
`docker-compose.override.yml`:

```yaml
services:
  babyagi-llama:
    build:
      context: ./
      dockerfile: Dockerfile.llamacpp
    container_name: babyagi
    volumes:
      - "./:/app"
    stdin_open: true
    tty: true
    ulimits:
      memlock: -1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0', '1']
              capabilities: [gpu]
```
Run with `docker compose up --build babyagi-llama`.
Ah, you also need to set `n_gpu_layers` in `babyagi.py` (maybe read it from an env var like the other settings? see the sketch after the code below):
```python
print('Initialize model for evaluation')
llm = Llama(
    model_path=LLAMA_MODEL_PATH,
    n_ctx=CTX_MAX,
    n_threads=LLAMA_THREADS_NUM,
    n_batch=512,
    use_mlock=False,
    n_gpu_layers=43
)

print('\nInitialize model for embedding')
llm_embed = Llama(
    model_path=LLAMA_MODEL_PATH,
    n_ctx=CTX_MAX,
    n_threads=LLAMA_THREADS_NUM,
    n_batch=512,
    embedding=True,
    use_mlock=False,
    n_gpu_layers=43
)
```
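To avoid hardcoding, `n_gpu_layers` could be read from the environment like the other settings. A sketch of that change inside `babyagi.py`, where `LLAMA_MODEL_PATH`, `CTX_MAX`, and `LLAMA_THREADS_NUM` are already defined (`LLAMA_N_GPU_LAYERS` is a name I made up by analogy, not an existing babyagi variable):

```python
import os

# Hypothetical env var; defaults to 0 (CPU-only) so the change
# is harmless on machines without a GPU.
LLAMA_N_GPU_LAYERS = int(os.getenv("LLAMA_N_GPU_LAYERS", "0"))

print('Initialize model for evaluation')
llm = Llama(
    model_path=LLAMA_MODEL_PATH,
    n_ctx=CTX_MAX,
    n_threads=LLAMA_THREADS_NUM,
    n_batch=512,
    use_mlock=False,
    n_gpu_layers=LLAMA_N_GPU_LAYERS
)
```

`LLAMA_N_GPU_LAYERS=43` could then be set via `ENV` in the Dockerfile or under `environment:` in the compose file.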
Offloading only some layers is perhaps not as fast as performing all model computation on the GPU, but you can get a substantial boost in generation rate if you compile llama.cpp with cuBLAS enabled and use it with the existing script.
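If you want to quantify that boost on your own hardware, one rough approach is to time the same completion with and without offloading. A sketch reusing `LLAMA_MODEL_PATH` from above (the prompt and layer counts are arbitrary; numbers will vary by model and GPU):

```python
import time
from llama_cpp import Llama

# Compare CPU-only against offloading 43 layers (the value used above).
for layers in (0, 43):
    llm = Llama(model_path=LLAMA_MODEL_PATH, n_ctx=2048,
                n_gpu_layers=layers, use_mlock=False, verbose=False)
    start = time.time()
    out = llm("Write a short poem about GPUs.", max_tokens=64)
    tokens = out["usage"]["completion_tokens"]
    print(f"n_gpu_layers={layers}: {tokens / (time.time() - start):.1f} tokens/s")
    del llm  # release the model before loading the next configuration
```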