toverainc / willow-inference-server

Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS
Apache License 2.0

reinstall issues with CUDA #130

Closed: bert269 closed this issue 11 months ago

bert269 commented 11 months ago

I had my WIS and WAS working, but I changed the API key in HA. I could not get WAS to re-generate the Willow images, and WIS was stopping and starting all by itself, so I gave up. I decided to re-install everything on my WIS/WAS server.

Well, I finally got WIS up and running after following the instructions (it has been about 4 months, and I did sleep in the meantime). But it does not pick up the CUDA (1080 Ti) device. Instead it defaults to CPU with this message:


willow-inference-server-wis-1    | [2023-10-20 16:38:51 +0000] [60] [INFO] Starting gunicorn 20.1.0
willow-inference-server-wis-1    | [2023-10-20 16:38:51 +0000] [60] [DEBUG] Arbiter booted
willow-inference-server-wis-1    | [2023-10-20 16:38:51 +0000] [60] [INFO] Listening at: http://0.0.0.0:19000 (60)
willow-inference-server-wis-1    | [2023-10-20 16:38:51 +0000] [60] [INFO] Using worker: uvicorn.workers.UvicornWorker
willow-inference-server-wis-1    | [2023-10-20 16:38:51 +0000] [62] [INFO] Booting worker with pid: 62
willow-inference-server-wis-1    | [2023-10-20 16:38:51 +0000] [60] [DEBUG] 1 workers
willow-inference-server-wis-1    | [2023-10-20 16:38:58 +0000] [62] [INFO] Willow Inference Server is starting... Please wait.
**willow-inference-server-wis-1    | [2023-10-20 16:38:58 +0000] [62] [INFO] CUDA: Not found - using CPU with 4 cores**
willow-inference-server-wis-1    | [2023-10-20 16:38:58 +0000] [62] [INFO] Started server process [62]
willow-inference-server-wis-1    | [2023-10-20 16:38:58 +0000] [62] [INFO] Waiting for application startup.
willow-inference-server-wis-1    | [2023-10-20 16:38:58 +0000] [62] [INFO] CTRANSLATE: Supported compute types for device cpu are {'int8_float32', 'int8', 'float32'}- using configured int8
willow-inference-server-wis-1    | [2023-10-20 16:38:58 +0000] [62] [INFO] Loading Whisper models...
willow-inference-server-wis-1    | [2023-10-20 16:39:28 +0000] [62] [INFO] Loading TTS models...
**willow-inference-server-wis-1    | [2023-10-20 16:39:33 +0000] [62] [INFO] Skipping warm_models for CPU**
willow-inference-server-wis-1    | [2023-10-20 16:39:33 +0000] [62] [INFO] Willow Inference Server is ready for requests!
willow-inference-server-wis-1    | [2023-10-20 16:39:33 +0000] [62] [INFO] Application startup complete.
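
For what it's worth, here is the sanity check I have been running to see whether the container can reach the GPU at all (assuming the compose service is named wis, as the container name suggests):

```bash
# On the host: is the NVIDIA driver itself alive?
nvidia-smi

# Inside the running WIS container: has the GPU been passed through?
docker compose exec wis nvidia-smi
```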

Can anyone please tell me the complete instructions to install everything needed for the CUDA/NVIDIA drivers?
I followed these instructions: https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts
But the post-install steps all failed, as there is no "/usr/local/cuda" directory.
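
In case it helps narrow things down, this is how I have been checking what actually got installed. As far as I understand, nvidia-smi only needs the driver, while /usr/local/cuda is created by the CUDA toolkit:

```bash
# Driver check (works without /usr/local/cuda)
nvidia-smi

# Is any CUDA toolkit present on the host?
ls -d /usr/local/cuda* 2>/dev/null
dpkg -l | grep -i cuda-toolkit
```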

Please help. I am totally willing to redo the server from scratch again. I have written down the steps/instructions I did so far, but the NVIDIA/CUDA part is a broken mystery. Everything that I can find on the web is either old or does not work.

Does anyone have a list of the instructions, and the order in which they need to be installed, for this to work?

Thanks
bert269 commented 11 months ago

Here is the solution. This works: https://gist.github.com/denguir/b21aa66ae7fb1089655dd9de8351a202

I will redo the server from scratch later, but for now it picked up the NVIDIA drivers.
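
For anyone else hitting this: since WIS runs in Docker, I believe the host only needs the NVIDIA driver plus the NVIDIA Container Toolkit, not the full CUDA toolkit. Condensed from my notes, the order that worked for me was roughly the following (the driver version and CUDA image tag are just the ones I used; substitute whatever is current):

```bash
# 1. NVIDIA driver from the Ubuntu repos
sudo apt update
sudo apt install -y nvidia-driver-535
sudo reboot

# 2. After the reboot, confirm the driver sees the card
nvidia-smi

# 3. NVIDIA Container Toolkit so Docker can hand the GPU to containers
#    (after adding NVIDIA's apt repository per their container toolkit docs)
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# 4. Confirm a container can see the GPU before starting WIS
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```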