nextcloud / context_chat_backend


[bug]: CPU-only system crashes with error: Cannot Open Shared Object File #30

Closed: ericmail84 closed this issue 3 months ago

ericmail84 commented 3 months ago

Describe the bug

To Reproduce
Steps to reproduce the behavior:

  1. Install AppAPI, Assistant, Nextcloud Context Chat
  2. Install ExApp Context Chat Backend.
  3. Wait for installation to complete. Attempt to use.
  4. Check NC Logs: See curl cannot connect.
  5. Restart nc_app_context_chat_backend.
  6. See backend logs, loop at "cannot open shared object file"

Expected behavior
Upon installation of the dependencies and the backend, it should function as expected.

Server logs (if applicable)

```
[context_chat] Error: Error during request to ExApp (context_chat_backend): cURL error 7: Failed to connect to context_chat_backend port 23000 after 0 ms: Couldn't connect to server (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for http://context_chat_backend:23000/query from ? by -- at Mar 28, 2024, 10:45:12 AM
```

Context Chat Backend logs

```
App config:
{ "debug": true, "disable_aaa": false, "httpx_verify_ssl": true, "use_colors": true, "uvicorn_workers": 1, "disable_custom_model_download": false, "model_download_uri": "https://download.nextcloud.com/server/apps/context_chat_backend", "vectordb": [ "chroma", { "is_persistent": true } ], "embedding": [ "instructor", { "model_name": "hkunlp/instructor-base", "model_kwargs": { "device": "cpu" } } ], "llm": [ "llama", { "model_path": "dolphin-2.2.1-mistral-7b.Q5_K_M.gguf", "n_batch": 10, "n_ctx": 4096, "n_gpu_layers": -1, "template": "<|im_start|> system \nYou're an AI assistant good at finding relevant context from documents to answer questions provided by the user. <|im_end|>\n<|im_start|> user\nUse the following documents as context to answer the question at the end. REMEMBER to excersice source critisicm as the documents are returned by a search provider that can return unrelated documents.\n\nSTART OF CONTEXT: \n{context} \n\nEND OF CONTEXT!\n\nIf you don't know the answer or are unsure, just say that you don't know, don't try to make up an answer. Don't mention the context in your answer but rather just answer the question directly. \nQuestion: {question} Let's think this step-by-step. \n<|im_end|>\n<|im_start|> assistant\n", "end_separator": "<|im_end|>", "model_kwargs": { "device": "cpu" } } ] }
load INSTRUCTOR_Transformer max_seq_length 512
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/llama_cpp/llama_cpp.py", line 70, in _load_shared_library
    return ctypes.CDLL(str(_lib_path), **cdll_args)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libcuda.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/main.py", line 6, in <module>
    from context_chat_backend import app_config
  File "/app/context_chat_backend/__init__.py", line 53, in <module>
    app.extra['ENABLED'] = model_init(app)
                           ^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/download.py", line 269, in model_init
    _set_app_config(app, config)
  File "/app/context_chat_backend/download.py", line 100, in _set_app_config
    model = init_model('llm', (llm_name, llm_config))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/models/__init__.py", line 25, in init_model
    model = load_model(model_type, model_info)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/models/load_model.py", line 23, in load_model
    return get_model_for(model_type, model_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/models/llama.py", line 21, in get_model_for
    return LlamaCpp(**{ **model_config, 'model_path': model_path })
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/langchain/load/serializable.py", line 97, in __init__
    super().__init__(**kwargs)
  File "/usr/local/lib/python3.11/dist-packages/pydantic/v1/main.py", line 339, in __init__
    values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pydantic/v1/main.py", line 1102, in validate_model
    values = validator(cls_, values)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/langchain/llms/llamacpp.py", line 143, in validate_environment
    from llama_cpp import Llama, LlamaGrammar
  File "/usr/local/lib/python3.11/dist-packages/llama_cpp/__init__.py", line 1, in <module>
    from .llama_cpp import *
  File "/usr/local/lib/python3.11/dist-packages/llama_cpp/llama_cpp.py", line 83, in <module>
    _lib = _load_shared_library(_lib_base_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/llama_cpp/llama_cpp.py", line 72, in _load_shared_library
    raise RuntimeError(f"Failed to load shared library '{_lib_path}': {e}")
RuntimeError: Failed to load shared library '/usr/local/lib/python3.11/dist-packages/llama_cpp/libllama.so': libcuda.so.1: cannot open shared object file: No such file or directory
```
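The failing step is llama_cpp trying to dlopen libcuda.so.1 for its CUDA build of libllama.so. A quick way to confirm the library really is missing inside the ExApp container is something like the following sketch (the container name is an assumption, and it assumes ldconfig/ldd exist in the image):

```sh
# is the CUDA driver library visible to the dynamic linker inside the container?
docker exec nc_app_context_chat_backend ldconfig -p | grep libcuda

# which shared libraries does the llama.cpp build that fails to load actually need?
docker exec nc_app_context_chat_backend ldd /usr/local/lib/python3.11/dist-packages/llama_cpp/libllama.so
```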

Setup Details:

socialize-IT commented 3 months ago

I think I have a similar problem. On a server without CUDA, the backend container (installed from the app store) keeps restarting. Is there any chance of releasing a CPU-only variant?

ericmail84 commented 3 months ago

That's fine if it is presently CUDA-only (CPU-only would be nice though); however, unless I missed it, I did not see that indicated anywhere. My assumption was that the installation would detect the hardware and adjust accordingly.

kyteinsky commented 3 months ago

Hi,

> It was doing this before I attempted a fix. Went into the _data folder of the nc_app_context_chat_backend and replaced references to CUDA with CPU. This server does not have a cuda gpu to utilize.

Yes, this is one of the workarounds. You may copy the `config.cpu.yaml` file to `config.yaml` inside the container (or volume) at `/nc_app_context_chat_backend_data` using:

```
docker cp config.cpu.yaml <container_id>:/nc_app_context_chat_backend_data/config.yaml
```
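and then restart the container so the new config is picked up, for example (the container name here is an assumption and may differ on your setup):

```sh
# find the backend container, then restart it so it re-reads config.yaml
docker ps --filter "name=context_chat_backend"
docker restart nc_app_context_chat_backend
```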

The other, manual way would be to build the CPU image from `Dockerfile.cpu` and register it manually.
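For reference, the build step of that manual route would look roughly like this (the image tag is made up; registering the resulting image still goes through AppAPI and is not shown here):

```sh
# build the CPU-only image from the repository root (hypothetical tag)
docker build -f Dockerfile.cpu -t context_chat_backend:cpu .
```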

I would suggest waiting a bit, since the support is just around the corner. You may try the latest image if you wish: `ghcr.io/kyteinsky/context_chat_backend:latest`. It is supposed to support CUDA, ROCm and CPU, but is untested as of now.

ericmail84 commented 3 months ago

I think I will wait until the fixes arrive. I did try it with the different config file and then also, later, with the latest image.

The issue seems to be in the hardware detection, as I get the following:

```
Detecting hardware...
Detected hardware: cuda
Config file already exists in the persistent storage ("/nc_app_context_chat_backend_data/config.yaml").
App config:
{ "debug": true, "disable_aaa": false, "httpx_verify_ssl": true, "use_colors": true, "uvicorn_workers": 1, "disable_custom_model_download": false, "model_download_uri": "https://download.nextcloud.com/server/apps/context_chat_backend", "vectordb": [ "chroma", { "is_persistent": true } ], "embedding": [ "instructor", { "model_name": "hkunlp/instructor-base", "model_kwargs": { "device": "cpu" } } ], "llm": [ "llama", { "model_path": "dolphin-2.2.1-mistral-7b.Q5_K_M.gguf", "n_batch": 10, "n_ctx": 4096, "template": "<|im_start|> system \nYou're an AI assistant good at finding relevant context from documents to answer questions provided by the user. <|im_end|>\n<|im_start|> user\nUse the following documents as context to answer the question at the end. REMEMBER to excersice source critisicm as the documents are returned by a search provider that can return unrelated documents.\n\nSTART OF CONTEXT: \n{context} \n\nEND OF CONTEXT!\n\nIf you don't know the answer or are unsure, just say that you don't know, don't try to make up an answer. Don't mention the context in your answer but rather just answer the question directly. \nQuestion: {question} Let's think this step-by-step. \n<|im_end|>\n<|im_start|> assistant\n", "end_separator": "<|im_end|>" } ] }
App disabled at startup
INFO:     Started server process [1]
INFO:     Waiting for application startup.
TRACE:    ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}, 'state': {}}
TRACE:    ASGI [1] Receive {'type': 'lifespan.startup'}
TRACE:    ASGI [1] Send {'type': 'lifespan.startup.complete'}
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:23000 (Press CTRL+C to quit)
```

So it is still detecting CUDA. I did attempt to recreate the container after removing the references to CUDA and NVIDIA, but to no avail.

As noted, though, I am going to be patient and wait. I just wanted to post this in case the information about the hardware detection is useful.

kyteinsky commented 3 months ago

Thanks for trying it out! I appreciate your patience but this will help us find bugs quicker.

> Config file already exists in the persistent storage ("/nc_app_context_chat_backend_data/config.yaml").

Hmm, the config file needs to be nuked, i.e. the config in the volume (or the whole volume) should be deleted as well. A cleanup/repair step might be required for this, since users won't know when to do a clean install. I'll add that before the release.
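Until that exists, removing the stale file by hand would look something like this (the container and volume names below are assumptions and may differ on your install):

```sh
# delete the stale config so it gets regenerated on the next start
docker exec nc_app_context_chat_backend rm /nc_app_context_chat_backend_data/config.yaml

# or, more drastically, remove the container and drop the whole data volume
docker volume rm nc_app_context_chat_backend_data
```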

> Detected hardware: cuda

The detection is just this check: `lspci | grep -q "VGA.*NVIDIA"`. What does this output on your host machine?
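In other words, the detection amounts to roughly the following (a simplified sketch of the logic described above, not the actual script):

```sh
#!/bin/sh
# pick cuda if an NVIDIA VGA device shows up in lspci, otherwise fall back to cpu
if lspci | grep -q "VGA.*NVIDIA"; then
    accel="cuda"
else
    accel="cpu"
fi
echo "Detected hardware: $accel"
```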

ericmail84 commented 3 months ago

Thank you, I will start fresh and report back.

As to the command, I got nothing when I ran it as indicated; however, running `lspci | grep "VGA.*NVIDIA"` returned `02:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)`. The system undoubtedly has an old NVIDIA card, which I may remove since it is not presently serving any purpose. I believe it only supports CUDA 3.5. Therein may lie the problem.
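(Side note: getting no output from the original command is expected, since `-q` makes grep quiet; the result is only visible in the exit status, e.g.:)

```sh
# grep -q prints nothing and only sets the exit status: 0 means an NVIDIA VGA device was found
lspci | grep -q "VGA.*NVIDIA"; echo $?
```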

kyteinsky commented 3 months ago

> I believe it only supports CUDA 3.5. Therein may lie the problem.

Indeed. Removing it should make the detection select CPU.

ericmail84 commented 3 months ago

I will do so and report back. Sorry if that is the case. I was previously under the impression that the device lacked CUDA support, but obviously that is untrue.

kyteinsky commented 3 months ago

No need to apologise. Any NVIDIA device present leads the detection script to assume it is fully set up (with drivers supporting CUDA 11.8 installed), and it will try to use it. This covers most setups. Unfortunately, there is no simple way to detect a working NVIDIA/CUDA setup. There is a PyTorch way, but we're installing a suitable variant of PyTorch in the script, so that's a no-go. I have attached the script in case you wish to have a look (change the extension to .sh): hwdetect.md

KyTDK commented 3 months ago

Having the same issue; would appreciate a fix.

kyteinsky commented 3 months ago

Closing this since the fix was released in v2.1.0.