linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

CUDA or is it me? Windows. #96

Closed pinballelectronica closed 11 months ago

pinballelectronica commented 1 year ago

I use CUDA all day long with a 4090. This issue seems to be an outlier.

File "C:\Users\dave\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda__init__.py", line 211, in _lazy_init raise AssertionError("Torch not compiled with CUDA enabled") AssertionError: Torch not compiled with CUDA enabled

I have a very fast CPU, so inference was good enough that I didn't notice at first (under 30 minutes with large-v2 for a 1.5-hour video). Alas, my 4090 wants a piece of the action. I cannot for the life of me figure this out.

Python 3.10.10, CUDA 12.1 in PATH, installed everything from requirements.txt (already had most of it).

using --device "cuda:0" (literally, with the quotes)
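For reference, the same device selection works from Python; a sketch based on the usage shown in the project README (the input file name is a placeholder):

```python
import whisper_timestamped as whisper

audio = whisper.load_audio("input.wav")                  # placeholder file
model = whisper.load_model("large-v2", device="cuda:0")  # same as --device "cuda:0"
result = whisper.transcribe(model, audio)
```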

Thanks

pinballelectronica commented 1 year ago

I gave up on making this work on Windows with CUDA. It was hours of struggle for no real benefit, considering I can run a Docker container or WSL 2. I have no idea why I tried so hard with Windows. Even in a conda environment it was a nightmare: one thing fixed, another thing broke.

So I ran it with WSL 2 and I got it working in like 2 minutes. A few notes:

- Both Windows and Ubuntu 20.04 (WSL 2) needed onnxruntime, which was not in requirements.txt.

- FWIW w/r/t CUDA, I'm running 12.1, passed through to WSL via Docker Desktop. I am unable to run this on anything smaller than a 4090 (for my use case): memory usage is right around 22 GB, at least for my input, a 675 MB WAV file with large-v2. The speedup over CPU (i9-12900K) is dramatic, about 10x: 16 CPU threads give me ~100 FPS, whereas on the GPU it was blazing fast, as shown below:

(screenshot: wts-vad)
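Since the GPU is reached through Docker Desktop's WSL 2 backend here, it's worth confirming the passthrough independently of the model; a quick sketch (the CUDA image tag is an assumption, any recent one works):

```bash
# should list the 4090 if GPU passthrough into the container works
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu20.04 nvidia-smi
```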

phineas-pta commented 1 year ago

Downgrade to CUDA 11.8.

PyTorch support for CUDA 12 is still poor.
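At the time, PyTorch published prebuilt wheels for CUDA 11.8 but not yet for 12.x, so the usual fix was reinstalling torch from the cu118 index (command follows the PyTorch install matrix; exact package versions will vary):

```bash
pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/cu118
```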