After deploying a small model locally in a CPU-only environment, how do I expose it through a RESTful API?

gotoeasy commented 10 months ago

First, thanks to the author for making it possible to try this out so quickly.

# Download Chinese-Llama-2-7b-ggml-q4.bin yourself and put it in `pwd`/soulteary; then this gets it running
docker run --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -v `pwd`/soulteary:/app/soulteary soulteary/llama2:runtime bash

# Now you can start chatting
./main -m /app/soulteary/Chinese-Llama-2-7b-ggml-q4.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt

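Personally, I think these "little sparrows" (small models that are small but fully functional) will actually see more real-world deployment, since they are already good enough for many use cases. The first thing needed next is API access: there has to be a RESTful API so they can be integrated easily with other systems and applications.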

How would I go about this? Any advice would be much appreciated.
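A minimal sketch of one dependency-free way to do this: wrap the `./main` binary from the commands above in a small HTTP service built on Python's standard library. Assumptions: the script runs inside the same container, so `./main` and the model path match the commands above; the `/completion` route, the `prompt`/`n_predict` JSON fields and the `serve()` helper are hypothetical names chosen for illustration; only the `-m`, `-n`, `--repeat_penalty` and `-p` flags of `./main` are used.

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed locations, taken from the docker run / ./main commands above.
MAIN_BIN = "./main"
MODEL_PATH = "/app/soulteary/Chinese-Llama-2-7b-ggml-q4.bin"


def build_command(prompt, n_predict=256):
    """Build a one-shot (non-interactive) ./main invocation for a single prompt."""
    return [MAIN_BIN, "-m", MODEL_PATH, "-n", str(n_predict),
            "--repeat_penalty", "1.0", "-p", prompt]


def run_main(argv):
    """Run ./main to completion and return everything it printed to stdout."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def handle_completion(body, runner=run_main):
    """Map a raw JSON request body to (HTTP status, response dict).

    `runner` is injectable so the request logic can be exercised without the model.
    """
    try:
        payload = json.loads(body)
        prompt = payload["prompt"]
        n_predict = int(payload.get("n_predict", 256))
        if not isinstance(prompt, str):
            raise TypeError("prompt must be a string")
    except (ValueError, KeyError, TypeError, AttributeError):
        return 400, {"error": 'expected JSON like {"prompt": "...", "n_predict": 128}'}
    try:
        output = runner(build_command(prompt, n_predict))
    except (OSError, subprocess.CalledProcessError) as exc:
        return 500, {"error": str(exc)}
    # ./main usually echoes the prompt (sometimes after a leading space) before
    # the generated text; drop that echo when it is present.
    stripped = output.lstrip()
    if stripped.startswith(prompt):
        output = stripped[len(prompt):]
    return 200, {"content": output}


class CompletionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/completion":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, data = handle_completion(self.rfile.read(length))
        body = json.dumps(data, ensure_ascii=False).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="0.0.0.0", port=8080):
    """Handle requests one at a time; each request starts ./main and reloads the model."""
    HTTPServer((host, port), CompletionHandler).serve_forever()
```

Saved as, say, `llama_rest.py` (a hypothetical name), it could be started with `python3 -c 'from llama_rest import serve; serve()'` and called with `curl -X POST http://localhost:8080/completion -d '{"prompt": "User: hello"}'`; the container also needs the port published, e.g. by adding `-p 8080:8080` to the `docker run` command. The design trades speed for simplicity: every request spawns `./main` and reloads the model, so it only suits low traffic. A server that keeps the model loaded in memory avoids that cost.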

gotoeasy commented 10 months ago

https://github.com/abetlen/llama-cpp-python works for this, but adding Python into the mix slows things down considerably: in my tests, performance dropped by roughly 3x.
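For reference, a minimal client sketch for the llama-cpp-python route. Assumptions: the package was installed with its server extra (`pip install 'llama-cpp-python[server]'`) and started with `python3 -m llama_cpp.server --model <path to model>`, which serves an OpenAI-compatible API on port 8000 by default. Recent releases expect GGUF model files, so the ggml `.bin` above may need an older release or a conversion. The helper names are hypothetical; the request uses only the standard `prompt`/`max_tokens` fields of `/v1/completions`.

```python
import json
import urllib.request

# Assumption: llama-cpp-python's bundled server, started with
#   python3 -m llama_cpp.server --model /path/to/model
# and listening on its default port 8000.
BASE_URL = "http://localhost:8000"


def build_completion_request(prompt, max_tokens=128, base_url=BASE_URL):
    """Build a POST to the OpenAI-compatible /v1/completions endpoint."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_json):
    """Pull the generated text out of an OpenAI-style completion response."""
    return response_json["choices"][0]["text"]


def complete(prompt, max_tokens=128):
    """Send one completion request and return the generated text."""
    req = build_completion_request(prompt, max_tokens)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return extract_text(json.load(resp))
```

With the server running, `complete("User: 你好\nBob:")` would return the generated text. Because the endpoint follows the OpenAI completions format, existing OpenAI-style client code can usually be pointed at it by changing only the base URL.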

gotoeasy commented 10 months ago

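After looking around: llama.cpp already has everything needed on its own. If compiling it is a hassle, just find a Docker image for it.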
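A minimal client sketch for that route, assuming a llama.cpp build that includes its example HTTP server (in builds from this period the binary is `server`; newer builds call it `llama-server`), started inside the container with `./server -m /app/soulteary/Chinese-Llama-2-7b-ggml-q4.bin --host 0.0.0.0 --port 8080` and with the port published via `docker run -p 8080:8080 ...`. Whether the `soulteary/llama2:runtime` image ships that binary is an assumption. The server's native `/completion` endpoint takes `prompt` and `n_predict` and returns the generated text in `content`; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: llama.cpp's example HTTP server is running in the container, e.g.
#   ./server -m /app/soulteary/Chinese-Llama-2-7b-ggml-q4.bin --host 0.0.0.0 --port 8080
# and the port is published with `docker run -p 8080:8080 ...`.
SERVER_URL = "http://localhost:8080"


def completion_request(prompt, n_predict=128, base_url=SERVER_URL):
    """Build a POST for llama.cpp's native /completion endpoint."""
    payload = {"prompt": prompt, "n_predict": n_predict}
    return urllib.request.Request(
        base_url + "/completion",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, n_predict=128):
    """Send the request and return the generated text from the "content" field."""
    with urllib.request.urlopen(completion_request(prompt, n_predict), timeout=300) as resp:
        return json.load(resp)["content"]
```

The same request from the shell: `curl -X POST http://localhost:8080/completion -H 'Content-Type: application/json' -d '{"prompt": "User: hello\nBob:", "n_predict": 64}'`. This server keeps the model loaded between requests without adding a Python layer in front of it.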

soulteary / docker-llama2-chat

Issue #21