superchargez opened this issue 7 months ago
This requires upgrading the version of llama.cpp used. I should get to this sometime this week or next.
Isn't it possible to spin up a llama.cpp server and reference it in aici.sh? Would that work?
I wonder if there are any updates on this, or on the rllm-cuda backend, that would make it possible to run AICI with newer models (e.g., Phi-3.5, Llama-3.2)?
I already downloaded the Phi-3 instruct GGUF from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf
and placed it at ~/models/:

```
jawad@desktoper:~/models$ ls
Phi-3-mini-4k-instruct-q4.gguf
```
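(For reference, the download itself can be reproduced with plain wget, since the resolve/main URL is a direct file link; the target path just mirrors my setup:)

```bash
# Fetch the quantized Phi-3 GGUF into ~/models (resolve/main is a direct-download URL).
mkdir -p ~/models
wget -O ~/models/Phi-3-mini-4k-instruct-q4.gguf \
  https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf
```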
Yet, when I run the command with -m or --model to choose the model, I get an error (it returns the same output as `./server.sh --help`). Here is the complete output:

```
jawad@desktoper:~/gits/aici/rllm/rllm-llamacpp$ ./server.sh -m /home/jawad/models/Phi-3-mini-4k-instruct-q4.gguf
usage: server.sh [--loop] [--cuda] [--debug] [model_name] [rllm_args...]

model_name can be a HuggingFace URL pointing to a .gguf file, or one of the following:

  phi2      https://huggingface.co/TheBloke/phi-2-GGUF/blob/main/phi-2.Q8_0.gguf
  orca      https://huggingface.co/TheBloke/Orca-2-13B-GGUF/blob/main/orca-2-13b.Q8_0.gguf
  mistral   https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf
  mixtral   https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/blob/main/mixtral-8x7b-instruct-v0.1.Q6_K.gguf
  code70    https://huggingface.co/TheBloke/CodeLlama-70B-Instruct-GGUF/blob/main/codellama-70b-instruct.Q5_K_M.gguf

Additionally, "server.sh build" will just build the server, and not run a model.

  --cuda    try to build llama.cpp against installed CUDA
  --loop    restart server when it crashes and store logs in ./logs
  --debug   don't build in --release mode

Try server.sh phi2 --help to see available rllm_args
```
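Reading the usage text, it looks like server.sh expects the model as a positional argument (a preset name or a HuggingFace URL), not via -m or --model. If that reading is right, an invocation along these lines might at least get past the usage message; note it would re-download the file, and Phi-3 support may still depend on the llama.cpp upgrade mentioned above:

```bash
# Sketch based on the usage text: the .gguf URL is passed positionally,
# mirroring the preset entries (untested).
./server.sh https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/blob/main/Phi-3-mini-4k-instruct-q4.gguf
```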
Though if I choose phi2 instead of the downloaded model, it works fine. Does AICI not support Phi-3, or is this a bug, and how can I fix it? Would adding a line for phi3 after this section of server.sh (rllm-cuda/server.sh), with just the URL replaced, solve the problem?

```
phi2 )
    ARGS="-m https://huggingface.co/TheBloke/phi-2-GGUF/blob/main/phi-2.Q8_0.gguf -t phi -w $EXPECTED/phi-2/cats.safetensors -s test_maxtol=0.8 -s test_avgtol=0.3"
    ;;
```

But I don't want to download the model again, so how can I use the local model, which is not in the list of models (phi2, mistral, mixtral, etc.)?
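If editing the script is an option, one hypothetical way to do it, modeled on the existing phi2 entry, might be to add a case whose -m value is the already-downloaded local file instead of a URL. To be clear, the phi3 case name, the local path, and the reuse of "-t phi" are all guesses copied from the phi2 entry, and Phi-3 may still need the llama.cpp upgrade mentioned at the top of this thread:

```bash
# Hypothetical addition next to the phi2 case in rllm-llamacpp/server.sh (untested).
# -m points at the local .gguf instead of a HuggingFace URL; whether "-t phi"
# is the right tokenizer setting for Phi-3 is unverified.
phi3 )
    ARGS="-m /home/jawad/models/Phi-3-mini-4k-instruct-q4.gguf -t phi"
    ;;
```

If something like that works, `./server.sh phi3` would then pick up the local file without re-downloading anything.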