jhj0517 / Whisper-WebUI

A Web UI for easy subtitle generation using the Whisper model.
Apache License 2.0

Should the module be unloaded from VRAM after its use? #325

Open martindellavecchia opened 1 week ago

martindellavecchia commented 1 week ago

Which OS are you using?

I've noticed that after running a transcription the model remains in VRAM, making it impossible to do another transcription with a different model because there isn't enough VRAM left. Is there any way to offload the model after a certain period of inactivity?

Thanks.
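
For context, the usual way to hand VRAM back in a Python process is to drop every reference to the model and clear the framework's allocator cache. A minimal sketch, assuming faster-whisper (the project's default backend) and PyTorch are installed; the audio path is a placeholder and this is not the web UI's actual code:

```python
import gc
import torch
from faster_whisper import WhisperModel

# Load the model, run one transcription, then explicitly drop it.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3")
_ = list(segments)  # transcription is lazy; consume the generator to actually run it

# Remove the only reference so the weights can be garbage-collected,
# then ask PyTorch to return its cached blocks to the driver.
# (CTranslate2, which faster-whisper uses, frees its allocations when the model object is collected.)
del model
gc.collect()
torch.cuda.empty_cache()
```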

jhj0517 commented 1 week ago

Hi. If you're able to run large models, you should be able to use the other Whisper models as you like in the web UI.

The expected behavior when changing the Whisper model is to replace the current model with the new one, not to load it in addition.

But if you try to run the music removal model together with a transcription, you might get CUDA errors if you have less than 12GB of VRAM.
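
As a rough illustration of that "replace, don't load additionally" behavior, a holder along these lines would release the previous model before loading the requested one. The class and method names are made up for this sketch and are not the project's code:

```python
import gc
import torch
from faster_whisper import WhisperModel

class WhisperModelHolder:
    """Keeps at most one Whisper model resident in VRAM at a time (illustrative only)."""

    def __init__(self):
        self.model = None
        self.model_size = None

    def update_model(self, model_size: str) -> WhisperModel:
        # Reuse the resident model if the requested size is already loaded.
        if self.model is not None and self.model_size == model_size:
            return self.model
        # Drop the old model before loading the new one so both never coexist in VRAM.
        self.model = None
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        self.model = WhisperModel(model_size, device="cuda", compute_type="float16")
        self.model_size = model_size
        return self.model
```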

martindellavecchia commented 1 week ago

VRAM-wise I should be OK; I have 12GB (RTX 3060), and other AI workloads run on a different GPU.

I've noticed that other model managers such as Ollama offload models after a certain period of not being used, or unload them when the user selects a different model.

For example, if I try a transcription with large-v2, don't like the result, and want to try large-v3, I need to shut down the web UI to offload the large-v2 model, as it stays in memory.
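
The Ollama-style idle unloading mentioned above could be approximated on the application side with a timer that drops the model reference after a quiet period. A sketch under the assumption that the server keeps its model on a holder object like the one above; the names and the five-minute timeout are arbitrary:

```python
import gc
import threading
import torch

IDLE_TIMEOUT_SECONDS = 300  # assumption: unload after 5 minutes without a transcription

class IdleUnloader:
    """Drops the holder's model reference after a period of inactivity (illustrative only)."""

    def __init__(self, holder):
        self.holder = holder      # any object exposing a .model attribute
        self._timer = None
        self._lock = threading.Lock()

    def touch(self):
        # Call after every transcription to restart the countdown.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(IDLE_TIMEOUT_SECONDS, self._unload)
            self._timer.daemon = True
            self._timer.start()

    def _unload(self):
        with self._lock:
            self.holder.model = None
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```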

jhj0517 commented 1 week ago

> For example, if I try a transcription with large-v2, don't like the result, and want to try large-v3, I need to shut down the web UI to offload the large-v2 model, as it stays in memory.

This is weird and not expected behavior. If you're able to run large-v2, you should be able to run large-v3 by simply changing the model.

If each model runs fully on a different GPU, this should not happen. Something is probably wrong with the setup, but I don't have multiple GPUs, so I can't reproduce or test it.
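
On the multi-GPU point: one way to keep the web UI and the other workloads on separate cards is to restrict which devices the process can see before CUDA is initialized, for example:

```python
import os

# Make only the second physical GPU visible to this process, so every model the
# web UI loads lands on that card. This must run before CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
```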

martindellavecchia commented 1 week ago

Not exactly sure what it is. After the transcription finishes, using large-v3 or any other model, there's a remaining process on the GPU:

```
Wed Oct  9 11:30:47 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   40C    P8             13W /  170W |    6394MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     79020      C   python3.10                                   6384MiB |
```

This is the python3.10 process used to run the web UI.

It's like it never offloads the model completely from VRAM.
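
To check whether the process ever hands memory back, one option is to poll the driver with NVML rather than relying on framework counters, since faster-whisper's allocations aren't tracked by PyTorch. A sketch assuming the nvidia-ml-py (pynvml) package is installed:

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)      # GPU 0, the RTX 3060 in the output above
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"used: {info.used / 1024**2:.0f} MiB / {info.total / 1024**2:.0f} MiB")
pynvml.nvmlShutdown()
```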