vargonis opened this issue 7 months ago
You'll need a GPU with compute capability 8.0 or later. I have honestly only tried an A100.
You can try llama.cpp on CUDA (./server.sh --cuda ... in rllm-llamacpp).
We definitely need a better error message.
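As a minimal sketch (not from the thread, and assuming an NVIDIA driver recent enough to expose the compute_cap query field), you can confirm whether the card meets that requirement before starting the CUDA server:

# Print GPU name and compute capability; rllm-cuda expects 8.0 or later (e.g. A100)
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader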
I didn't try with an A100, but the --cuda option for the llama.cpp server works, thanks!
I tried to run the CUDA server from within a container, but a thread panics:
This is running within a GCP VM with the following configuration:
Steps to reproduce:
cd .devcontainer
sudo docker build . -f Dockerfile-cuda --tag aici
sudo docker run -it --rm -p 4242:4242 -v /path/to/aici/:/workspace/aici --gpus all aici /bin/bash
cd aici/rllm/rllm-cuda/
./server.sh phi2 --host 0.0.0.0
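For completeness, a couple of sanity checks worth running inside the container before ./server.sh, not part of the original steps, to rule out the container simply not seeing the GPU (nvcc is only available if the image ships the CUDA toolkit):

# Inside the running container: confirm the driver and GPU are visible through --gpus all
nvidia-smi
# Confirm the CUDA toolkit version the build will compile against, if the image includes it
nvcc --version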