I don't think the CUDA error is triggered by TaskWeaver, because TaskWeaver only calls the endpoint via its API. Could you share your LLM configuration for TaskWeaver?
@liqul here is the config that I use:
docker run --gpus=all -it -e LLM_API_BASE="http://<IP>:11434" -e LLM_API_KEY="ARBITRARY_STRING" -e LLM_API_TYPE="ollama" -e LLM_MODEL="phi3:medium" -p 48000:8000 --entrypoint bash taskweavercontainers/taskweaver-all-in-one:0.2-ws
/app/entrypoint_chainlit.sh
# or define env vars directly in container
export LLM_API_BASE="http://<IP>:11434"
export LLM_API_KEY="ARBITRARY_STRING"
export LLM_API_TYPE="ollama"
export LLM_MODEL="llama3:8b"
/app/entrypoint_chainlit.sh
# or
# python -m taskweaver -p ./project/
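Equivalently, the same settings could go into the project config file instead of env vars. A sketch, with the key names assumed to mirror the LLM_* variables (check the TaskWeaver docs for your version):

```bash
# Write the same LLM settings into the project config instead of exporting env vars.
# Assumption: the dotted lowercase keys mirror the LLM_* environment variables.
cat > ./project/taskweaver_config.json <<'EOF'
{
  "llm.api_base": "http://<IP>:11434",
  "llm.api_key": "ARBITRARY_STRING",
  "llm.api_type": "ollama",
  "llm.model": "llama3:8b"
}
EOF
```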
The Ollama models are accessible at "http://<IP>:11434" under the name given in LLM_MODEL, both inside and outside the all-in-one container.
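For reference, this can be sanity-checked outside TaskWeaver with plain curl calls against Ollama's API (a minimal sketch; <IP> and the model name are placeholders and should match LLM_MODEL):

```bash
# List the models Ollama is serving; the name must match LLM_MODEL exactly
curl http://<IP>:11434/api/tags

# Send a trivial chat request to the same model TaskWeaver will use
curl http://<IP>:11434/api/chat -d '{
  "model": "llama3:8b",
  "stream": false,
  "messages": [{"role": "user", "content": "Say hello."}]
}'
```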
I don't have a local environment to reproduce this issue, so it is hard for me to debug.
The only thing I can think of is the request payload sent by TaskWeaver, though it is hard to see the correlation. As you can see, the error message comes from the server side:
File "/app/taskweaver/llm/ollama.py", line 116, in _chat_completion
raise Exception(
Exception: Failed to get completion with error: an unknown error was encountered while running the model CUDA error: unspecified launch failure
current device: 0, in function ggml_cuda_op_mul_mat at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:1606
cudaGetLastError()
The prompt of the planner can be found at project/workspace/sessions/<session_id>/planner_prompt_xxxx.json.
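One way to check whether the error is really server-side is to replay that saved prompt directly against Ollama's /api/chat endpoint, outside TaskWeaver. A rough sketch, assuming the file holds an OpenAI-style list of role/content messages (the path, <IP>, and model name are placeholders):

```bash
# Replay the saved planner prompt against Ollama directly.
# Assumption: the JSON file is a list of {"role": ..., "content": ...} messages.
PROMPT_FILE="project/workspace/sessions/<session_id>/planner_prompt_xxxx.json"
curl http://<IP>:11434/api/chat -d "{
  \"model\": \"llama3:8b\",
  \"stream\": false,
  \"messages\": $(cat "$PROMPT_FILE")
}"
```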
@liqul It was an issue with Ollama on V100 GPUs.
I had to use Ollama 0.2.4, which has fixes for V100 GPUs.
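For anyone hitting the same thing, the Ollama version can be checked and pinned roughly like this (the Docker image tag is an assumption; use whichever release carries the V100 fix):

```bash
# Check which Ollama version the server is running
ollama --version
# or query the server directly
curl http://<IP>:11434/api/version

# If Ollama runs in Docker, pin a specific release tag instead of :latest
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:0.2.4
```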
Thanks. Closing the issue.
Describe the bug: When I run any query with Ollama and the all-in-one Docker image of TaskWeaver, I get CUDA and ggml errors that I don't understand.
To Reproduce: Steps to reproduce the behavior: