Open racerxdl opened 1 year ago
Did you try with:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "TheBloke/orca_mini_v2_13b-GPTQ",
"messages": [{"role": "user", "content": "### System:\nYou are an AI assistant that follows instruction extremely well. Help as much as you can.\n \n### User: \ntell me about AI \n### Response:"}],
"backend": "autogptq", "model_base_name": "orca_mini_v2_13b-GPTQ-4bit-128g.no-act.order"
}'
?
Ah, I just saw that you tried, my bad. It looks like a downloading issue to me. For local files, only exllama currently works with local folders. What error do you get there? Also, did you try with another model?
For exllama, the error seems to be caused by an incompatible CUDA version in the container:
ImportError: /usr/local/lib/python3.9/dist-packages/exllama_ext.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
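To make the missing symbol easier to read, here is a minimal sketch of a demangler for this one simple pattern of Itanium C++ ABI names (`_ZN`, length-prefixed name parts, `E`, then parameter codes). The helper name `demangle_simple` and the small type table are illustrative assumptions, not part of LocalAI or exllama; a full tool such as `c++filt` handles the general case. The decoded name lives in the `c10` namespace, which is PyTorch's core library, so the error suggests (but does not prove) that the extension was built against a different PyTorch/CUDA build than the one installed in the image.

```python
import re

# Only a handful of builtin parameter codes are covered; this is a sketch,
# not a complete Itanium ABI demangler.
BUILTIN_TYPES = {
    "v": "void", "b": "bool", "c": "char", "i": "int",
    "l": "long", "f": "float", "d": "double",
}


def demangle_simple(symbol: str) -> str:
    """Decode names of the form _ZN<len><part>...E<builtin params>."""
    if not symbol.startswith("_ZN"):
        raise ValueError("only _ZN... nested names are supported")
    pos = 3
    parts = []
    # Read length-prefixed name components until the closing 'E'.
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError(f"unsupported construct at offset {pos}")
        length = int(match.group())
        pos += len(match.group())
        parts.append(symbol[pos:pos + length])
        pos += length
    if pos >= len(symbol):
        raise ValueError("missing 'E' terminator")
    pos += 1  # skip the 'E' that closes the nested name
    params = []
    for code in symbol[pos:]:
        if code not in BUILTIN_TYPES:
            raise ValueError(f"unsupported parameter code {code!r}")
        params.append(BUILTIN_TYPES[code])
    if params == ["void"]:
        params = []  # a lone 'v' means an empty parameter list
    return "::".join(parts) + "(" + ", ".join(params) + ")"


if __name__ == "__main__":
    print(demangle_simple("_ZN3c104cuda9SetDeviceEi"))
```

Run against the symbol from the traceback, this prints `c10::cuda::SetDevice(int)`.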
I also tried downloading Vicuna and TheBloke's models directly; they give the same not-found errors. With standard llama-cpp, though, the download works fine (I tested the same models in their GGML versions with llama-cpp and they work fine).
LocalAI version:
Docker Image:
quay.io/go-skynet/local-ai:master-cublas-cuda12-ffmpeg
Environment, CPU architecture, OS, and Version: Running in TrueNAS Scale Kubernetes (k3s) with an NVIDIA Tesla P40 in the container.
Describe the bug
AutoGPTQ, added by #871, doesn't work in the upstream container. I also tried exllama, which gives a linker error for CudaSetDevice.
To Reproduce
Also tried with a local model:
Expected behavior
Logs