Closed ericmail84 closed 3 months ago
I think I have a similar problem. On a server without CUDA, the backend container (installed from the app store) keeps restarting. Is there any chance of releasing a CPU-only variant?
That's fine if it is presently CUDA-only (CPU-only would be nice, though); however, unless I missed it, I did not see that indicated anywhere. My assumption was that the installation would detect the hardware and adjust accordingly.
Hi,
It was doing this before I attempted a fix. I went into the _data folder of nc_app_context_chat_backend and replaced references to CUDA with CPU. This server does not have a CUDA GPU to utilize.
Yes, this is one of the workarounds. You may copy the config.cpu.yaml file to config.yaml inside the container (or volume) at /nc_app_context_chat_backend_data using:

```shell
docker cp config.cpu.yaml <container_id>:/nc_app_context_chat_backend_data/config.yaml
```
The other, manual way would be to build the CPU image from Dockerfile.cpu and register it manually.
I would suggest waiting a bit since the support is just around the corner. You may try the latest image if you wish: ghcr.io/kyteinsky/context_chat_backend:latest. It is supposed to support CUDA, ROCm, and CPU, but it is untested as of now.
I think I will wait until fixes arrive. I did attempt it with the different config file and then later with the latest image.
The issue seems to be in hardware detection, as I get the following:
So it is still detecting CUDA. I did attempt to recreate the container, removing the references to CUDA and nvidia, but to no avail.
As noted, though, I am going to be patient and wait. I just wanted to post this in case the information regarding hardware detection might be useful.
Thanks for trying it out! I appreciate your patience; this will help us find bugs quicker.
> Config file already exists in the persistent storage ("/nc_app_context_chat_backend_data/config.yaml").
Hmm, the config file needs to be nuked, i.e. the config in the volume (or the whole volume) should be deleted as well. A cleanup/repair step might be required for this, since users won't know when to do a clean install. I'll add that before release.
> Detected hardware: cuda
The detection is just this check: `lspci | grep -q "VGA.*NVIDIA"`. What does this output on your host machine?
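In essence, the check described above amounts to a small function like the following sketch (the function name and stdin-based form are illustrative, not the actual script):

```shell
# Sketch of the detection described above: report "cuda" when the
# lspci-style input names an NVIDIA VGA device, otherwise "cpu".
# Reads from stdin so it can be exercised without real hardware.
detect_hw() {
    if grep -q "VGA.*NVIDIA"; then
        echo cuda
    else
        echo cpu
    fi
}

# On a host with a GeForce card, the lspci output matches:
echo "02:00.0 VGA compatible controller: NVIDIA Corporation GK208B" | detect_hw   # prints "cuda"
```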
Thank you, I will start fresh and report back.
As to the command, I got nothing when I ran it as indicated; however, running `lspci | grep "VGA.*NVIDIA"` returned `02:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)`. The system undoubtedly has an old NVIDIA card, which I may remove since it is not presently serving any purpose. I believe it only supports CUDA 3.5. Therein may lie the problem.
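For what it's worth, getting no output from the quiet form is expected: `-q` suppresses the matched line and communicates only through the exit status, for example:

```shell
# grep -q prints nothing; the result is carried in the exit status.
echo "02:00.0 VGA compatible controller: NVIDIA Corporation GK208B" | grep -q "VGA.*NVIDIA"
echo $?   # 0 means a match was found
```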
Indeed. Removing it should select cpu.
I will do so and report back. Sorry if that is the case; I was previously under the impression that the device lacked CUDA support, but obviously that is untrue.
No need to apologise. Any NVIDIA device present leads the detection script to assume it is fully set up (with drivers supporting CUDA 11.8 installed), and it will try to use it. This covers most setups. Unfortunately, there is no simple way to detect a working NVIDIA/CUDA setup. There is a pytorch way, but we're installing a suitable variant of pytorch in the script, so that's a no-go. I have attached the script in case you wish to have a look (change the extension to .sh): hwdetect.md
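As an aside, one stricter (still imperfect) heuristic would be to treat a successful `nvidia-smi` run as the signal, since a card can be plugged in without a working driver stack. This is only a sketch under that assumption, not the shipped detection script:

```shell
# Hedged sketch: lspci only proves a card is present; nvidia-smi
# succeeding is stronger evidence of a working driver/CUDA stack.
detect_hw_strict() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo cuda
    else
        echo cpu
    fi
}

detect_hw_strict   # "cpu" on hosts without a working NVIDIA driver
```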
Having the same issue; would appreciate a fix.
Closing this since the fix was released in v2.1.0.
Describe the bug
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Upon installation of dependencies and the backend, it should function as expected.
Setup Details:
Context Chat Backend deployment method: Simple