Describe the issue
Installing the context chat backend on a CPU-only Nextcloud AIO 8.1.0V system gets stuck at 50%. It then times out with the following error:
[app_api] Error: ExApp context_chat_backend initialization failed. Error: ExApp context_chat_backend initialization timed out (2400m)
from ? by -- at Apr 8, 2024, 12:59:42 AM
Logs for context chat backend
```
App config:
{
"debug": true,
"disable_aaa": false,
"httpx_verify_ssl": true,
"use_colors": true,
"uvicorn_workers": 1,
"disable_custom_model_download": false,
"model_download_uri": "https://download.nextcloud.com/server/apps/context_chat_backend",
"vectordb": [
"chroma",
{
"is_persistent": true
}
],
"embedding": [
"instructor",
{
"model_name": "hkunlp/instructor-base",
"model_kwargs": {
"device": "cuda"
}
}
],
"llm": [
"llama",
{
"model_path": "dolphin-2.2.1-mistral-7b.Q5_K_M.gguf",
"n_batch": 10,
"n_ctx": 4096,
"n_gpu_layers": -1,
"template": "<|im_start|> system \nYou're an AI assistant good at finding relevant context from documents to answer questions provided by the user. <|im_end|>\n<|im_start|> user\nUse the following documents as context to answer the question at the end. REMEMBER to excersice source critisicm as the documents are returned by a search provider that can return unrelated documents.\n\nSTART OF CONTEXT: \n{context} \n\nEND OF CONTEXT!\n\nIf you don't know the answer or are unsure, just say that you don't know, don't try to make up an answer. Don't mention the context in your answer but rather just answer the question directly. \nQuestion: {question} Let's think this step-by-step. \n<|im_end|>\n<|im_start|> assistant\n",
"end_separator": "<|im_end|>",
"model_kwargs": {
"device": "cuda"
}
}
]
}
App disabled at startup
INFO: Started server process [1]
INFO: Waiting for application startup.
TRACE: ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}, 'state': {}}
TRACE: ASGI [1] Receive {'type': 'lifespan.startup'}
TRACE: ASGI [1] Send {'type': 'lifespan.startup.complete'}
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:23000 (Press CTRL+C to quit)
TRACE: 172.22.0.10:57670 - HTTP connection made
TRACE: 172.22.0.10:57670 - ASGI [2] Started scope={'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('172.22.0.13', 23000), 'client': ('172.22.0.10', 57670), 'scheme': 'http', 'method': 'GET', 'root_path': '', 'path': '/heartbeat', 'raw_path': b'/heartbeat', 'query_string': b'', 'headers': '<...>', 'state': {}}
TRACE: 172.22.0.10:57670 - ASGI [2] Send {'type': 'http.response.start', 'status': 200, 'headers': '<...>'}
heartbeat_handler: result=ok
INFO: 172.22.0.10:57670 - "GET /heartbeat HTTP/1.1" 200 OK
TRACE: 172.22.0.10:57670 - ASGI [2] Send {'type': 'http.response.body', 'body': '<15 bytes>'}
TRACE: 172.22.0.10:57670 - ASGI [2] Completed
TRACE: 172.22.0.10:57670 - ASGI [3] Started scope={'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('172.22.0.13', 23000), 'client': ('172.22.0.10', 57670), 'scheme': 'http', 'method': 'POST', 'root_path': '', 'path': '/init', 'raw_path': b'/init', 'query_string': b'', 'headers': '<...>', 'state': {}}
TRACE: 172.22.0.10:57670 - ASGI [3] Send {'type': 'http.response.start', 'status': 200, 'headers': '<...>'}
INFO: 172.22.0.10:57670 - "POST /init HTTP/1.1" 200 OK
TRACE: 172.22.0.10:57670 - ASGI [3] Send {'type': 'http.response.body', 'body': '<2 bytes>'}
TRACE: 172.22.0.10:57670 - HTTP connection lost
```
Docker inspect for ghcr.io/nextcloud/context_chat_backend:2.0.1
The config.yml of chat context backend
Again, it's a CPU-only system, so I don't know why it's using CUDA.
```
debug: true
disable_aaa: false
httpx_verify_ssl: true
use_colors: true
uvicorn_workers: 1
# model files download configuration
disable_custom_model_download: false
model_download_uri: https://download.nextcloud.com/server/apps/context_chat_backend
vectordb:
  chroma:
    is_persistent: true
    # chroma_server_host:
    # chroma_server_http_port:
    # chroma_server_ssl_enabled:
    # chroma_server_api_default_path:
  weaviate:
    # auth_client_secret:
    # url: http://localhost:8080
embedding:
  instructor:
    model_name: hkunlp/instructor-base
    model_kwargs:
      device: cuda
  llama:
    model_path: dolphin-2.2.1-mistral-7b.Q5_K_M.gguf
    n_batch: 16
    n_ctx: 2048
  hugging_face:
    # model_name: all-MiniLM-L6-v2
    model_name: sentence-transformers/all-mpnet-base-v2
    model_kwargs:
      device: cuda
llm:
  llama:
    model_path: dolphin-2.2.1-mistral-7b.Q5_K_M.gguf
    n_batch: 10
    n_ctx: 4096
    n_gpu_layers: -1
    template: "<|im_start|> system \nYou're an AI assistant good at finding relevant context from documents to answer questions provided by the user. <|im_end|>\n<|im_start|> user\nUse the following documents as context to answer the question at the end. REMEMBER to excersice source critisicm as the documents are returned by a search provider that can return unrelated documents.\n\nSTART OF CONTEXT: \n{context} \n\nEND OF CONTEXT!\n\nIf you don't know the answer or are unsure, just say that you don't know, don't try to make up an answer. Don't mention the context in your answer but rather just answer the question directly. \nQuestion: {question} Let's think this step-by-step. \n<|im_end|>\n<|im_start|> assistant\n"
    end_separator: <|im_end|>
    model_kwargs:
      device: cuda
  ctransformer:
    model: dolphin-2.2.1-mistral-7b.Q5_K_M.gguf
    template: "<|im_start|> system \nYou're an AI assistant good at finding relevant context from documents to answer questions provided by the user. <|im_end|>\n<|im_start|> user\nUse the following documents as context to answer the question at the end. REMEMBER to excersice source critisicm as the documents are returned by a search provider that can return unrelated documents.\n\nSTART OF CONTEXT: \n{context} \n\nEND OF CONTEXT!\n\nIf you don't know the answer or are unsure, just say that you don't know, don't try to make up an answer. Don't mention the context in your answer but rather just answer the question directly. \nQuestion: {question} Let's think this step-by-step. \n<|im_end|>\n<|im_start|> assistant\n"
    end_separator: <|im_end|>
    config:
      gpu_layers: -1
    model_kwargs:
      device: cuda
  hugging_face:
    model_id: gpt2
    task: text-generation
    pipeline_kwargs:
      config:
        max_length: 200
    template: ""
```
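To make the GPU-oriented settings easier to spot, here is a minimal sketch that walks a parsed config and lists every entry requesting a GPU. It is only an illustration: `find_gpu_settings` and `GPU_KEYS` are hypothetical names rather than part of context_chat_backend, and the sample data is a trimmed copy of the "App config" JSON from the log above. It also assumes that `device: cuda` and a non-zero `n_gpu_layers`/`gpu_layers` are the only GPU-related keys.

```python
import json

# Assumption: these are the only layer-offload keys that matter here.
GPU_KEYS = {"n_gpu_layers", "gpu_layers"}


def find_gpu_settings(node, path=""):
    """Return (path, value) pairs for settings that request a GPU."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == "device" and value == "cuda":
                hits.append((child, value))
            elif key in GPU_KEYS and value != 0:
                hits.append((child, value))
            else:
                hits.extend(find_gpu_settings(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_gpu_settings(item, f"{path}[{index}]"))
    return hits


# Trimmed copy of the "App config" JSON printed in the log above.
app_config = json.loads("""
{
  "embedding": ["instructor", {"model_name": "hkunlp/instructor-base",
                               "model_kwargs": {"device": "cuda"}}],
  "llm": ["llama", {"n_gpu_layers": -1,
                    "model_kwargs": {"device": "cuda"}}]
}
""")

for setting, value in find_gpu_settings(app_config):
    print(f"{setting} = {value!r}")
```

On the trimmed config this reports the embedding device, the llama `n_gpu_layers` value, and the llama device. Whether switching those values to CPU equivalents lets initialization finish is not confirmed in this thread.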
Setup Details (please complete the following information):
Hello, yeah, this is a known issue. A fix will be available very soon. I see you've already found the issue where it is tracked, so I will be closing this one to keep the discussion in one place.