kaixxx / noScribe

Cutting-edge AI technology for automated audio transcription. A nice GUI for OpenAI's Whisper and pyannote (speaker identification)
GNU General Public License v3.0

Using the GPU for whisper in noScribe #6

Closed: Telebohrer closed this issue 2 months ago

Telebohrer commented 1 year ago

Dear Kai,

I am using Whisper in this version: https://github.com/ProjectEGU/whisper-for-low-vram for transcription (because I have an 8GB NVIDIA card), which runs much faster than on the CPU.

I would like to try your noScribe because of the integrated speaker identification. Is there any way to let it run on the GPU instead of the CPU? The whisper.cpp from ggerganov should also support that (https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas), but I am not sure how to include that or set that option in noScribe.

Otherwise great work!

kaixxx commented 1 year ago

Hi Telebohrer, there is not much I can do since I don't own a graphics card that supports CUDA. But noScribe uses the compiled "main.exe" from whisper.cpp. If you compile this with cuBLAS support enabled, as described under the link you provided, you could simply swap in this file and it should work. You will find the main app folder of noScribe (which also contains "main.exe") in a somewhat unusual place: C:\Users\USERNAME\AppData\Roaming\noScribe
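
For reference, the cuBLAS build roughly follows the steps from the whisper.cpp README section linked above. This is only a sketch of those commands as they stood at the time (flag names such as WHISPER_CUBLAS may have changed in newer versions), assuming the NVIDIA CUDA toolkit is already installed:

```sh
# Sketch of a whisper.cpp build with cuBLAS enabled (per the linked README section).
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp

# Linux / MSYS2-style build:
WHISPER_CUBLAS=1 make -j

# Windows via CMake (roughly):
#   cmake -B build -DWHISPER_CUBLAS=ON
#   cmake --build build --config Release
# The resulting main.exe can then replace the one in
# C:\Users\USERNAME\AppData\Roaming\noScribe
```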

Please keep me updated, I am curious to know if this works!

GitNitneroc commented 1 year ago

Could I find this cuBLAS main.exe somewhere? Running on the CPU when you have a GPU available seems like a waste of time. I just can't compile it myself since I'm on Linux. BTW, noScribe seems to be working fine through Proton (I guess Wine would be fine too)!

Telebohrer commented 1 year ago

Hi, I tried to compile the main.exe with cuBLAS on. noScribe still worked, but nothing changed. I don't know if it simply doesn't make a difference or if I compiled it wrongly... I'm also new to make, so the problem might be there. I would also be interested in getting a main.exe (and probably also a whisper.dll?) from somebody who can confirm that it worked.

kaixxx commented 1 year ago

Nice try. As I said, I cannot help here. But keep us updated!

kaixxx commented 10 months ago

I have just released a new version (0.4b) which might work with CUDA. You have to install some additional libraries as described here: https://github.com/guillaumekln/faster-whisper#gpu This would accelerate the transcription process. To also use CUDA for speaker identification, you have to go into the advanced options (see the Readme) and change "pyannote_xpu" to "cuda". I'm not sure if this works since I cannot test it myself. But I would be curious to know...
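
As a rough guide, the faster-whisper README linked above lists the NVIDIA libraries needed for GPU execution (cuBLAS and cuDNN for CUDA 11 at that time). One option it describes is installing them through pip; the package names below come from that README and should be treated as an unverified sketch:

```sh
# Sketch, assuming a CUDA-11-era setup as described in the faster-whisper README:
pip install nvidia-cublas-cu11 nvidia-cudnn-cu11
```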

kaixxx commented 10 months ago

I had to disable GPU support for now (at least for transcription; it may still work for speaker detection). It was causing problems. I will come back to this issue later.

kaixxx commented 5 months ago

Hi @Telebohrer, a new version with CUDA is out. Could you help test it? See https://github.com/kaixxx/noScribe/discussions/50

Kuiriel commented 2 months ago

> Hi @Telebohrer, a new version with CUDA is out. Could you help test it? See #50

Hullo! I'm testing your beautiful little program by transcribing interview answers for my wife to help her with her practice. It's a great time-saving app, and the accuracy seems excellent.

Oh, and I'm using the latest CUDA version that you posted on the description page. It's working without a hitch so far on the 3-minute test video I tried. On a 4090 it's faster than the 3 minutes of the video, I'm pretty sure. Trying it out on a 19-minute video now, then the next test is a 2h audio clip. It looks like the longest time it takes is just loading pyannote and discrete_diarization (whatever those are!)

kaixxx commented 2 months ago

Closing this since CUDA has officially been integrated with version 0.5 (and seems to work well).

Kuiriel commented 2 months ago

On further testing, while using the CUDA version you provided in the description, it looks like it's using the CPU while transcribing the video, with very little GPU utilization. Maximum GPU memory used over a 1h video recording was 4GB. There were occasional bursts in GPU utilization, which could just be Windows. It's not used consistently, at least.

It's running on a 13700K. I'm happy with how it's running; I just assumed the GPU had to be involved, given how fast it was compared to the times listed in the description.

I'm also happy to test noScribe out with some bigger language models that put the 24GB GPU through its paces. Any suggestions for which model to use? I've been running it on Precise, and it looks like it transcribes videos in only 25% to 50% of the video's duration.

The larger the video, the less efficient it seems to be time-wise. A 2h video took 1h to transcribe.

I have now tried WhisperX, which is a lot clunkier to use without a neat GUI. The 2h video took perhaps 5 minutes, and it was consistently using around 12GB of GPU memory. It's a lot slower now that I'm trying diarization.

kaixxx commented 2 months ago

If you are getting transcription times of 15% to 50% of realtime (including speaker detection), you are most likely using the GPU for at least some steps of the process. One thing to check is that in the file config.yml in C:\Users\<USERNAME>\AppData\Local\noScribe\noScribe both "pyannote_xpu" and "whisper_xpu" are set to "cuda" (omit the quotes). Important: Close noScribe before making any changes to "config.yml".
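
Put differently, the relevant entries in config.yml should look roughly like this (only these two keys need changing; everything else in the file stays as it is):

```yaml
# C:\Users\<USERNAME>\AppData\Local\noScribe\noScribe\config.yml
# Close noScribe before editing, then save and restart.
pyannote_xpu: cuda
whisper_xpu: cuda
```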

"precise" is alread using the largest whisper-model from OpenAI.

This thread contains a lot of discussion on how to improve CUDA performance: https://github.com/kaixxx/noScribe/discussions/50 Since I don't have an NVIDIA GPU, I can provide only limited help.