linuxserver / docker-faster-whisper

GNU General Public License v3.0

[BUG] faster-whisper:gpu-version-1.0.1 runs out of memory after ~ 1h #14

Closed jerome83136 closed 7 months ago

jerome83136 commented 7 months ago

Is there an existing issue for this?

Current Behavior

Hello all,

I'm running faster-whisper this way:

docker run -d --gpus all --runtime=nvidia --name=faster-whisper --privileged=true -e WHISPER_BEAM=10 -e WHISPER_LANG=fr -e WHISPER_MODEL=medium-int8 -e NVIDIA_DRIVER_CAPABILITIES=all -e NVIDIA_VISIBLE_DEVICES=all -p 10300:10300/tcp -v /mnt/docker/data/faster-whisper/:/config:rw ghcr.io/linuxserver/lspipepr-faster-whisper:gpu-version-1.0.1

Hardware:

OS: GNU Linux Debian 12.5 (kernel: 6.6.13+bpo-amd64)
GPU: NVIDIA TU106M [GeForce RTX 2060 Mobile] (6GB memory)
CPU: AMD Ryzen 7 4800H with Radeon Graphics (8 cores / 1.4GHz)
Host memory: 32GB
Storage: SSD NVMe 1TB

It works fine, and the container PID is assigned to the GPU (visible in nvidia-smi).

But after ~1h of inactivity I get the following Out Of Memory error: https://pastebin.com/raw/c3s4wYAm If I check nvidia-smi, I still see the container PID using ~1.3GB of memory (so: well below the 6GB available on the GPU).

Could someone be so kind as to point me to a fix? While searching, I spotted the max_split_size_mb setting, but I don't know whether it could help, and I really don't know how to apply it. Am I using too big a model for my GPU? Or should I reduce the beam count?
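For context, max_split_size_mb is an option of PyTorch's CUDA caching allocator and is set through the PYTORCH_CUDA_ALLOC_CONF environment variable. It is not certain it applies here, since faster-whisper runs on CTranslate2 rather than PyTorch, but passing it into the container is harmless to try (the value 128 below is illustrative, not a recommendation):

```shell
# Same command as above, with the allocator hint added as an env var.
# Whether the image honors PYTORCH_CUDA_ALLOC_CONF is an assumption.
docker run -d --gpus all --runtime=nvidia --name=faster-whisper --privileged=true \
  -e PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 \
  -e WHISPER_BEAM=10 -e WHISPER_LANG=fr -e WHISPER_MODEL=medium-int8 \
  -e NVIDIA_DRIVER_CAPABILITIES=all -e NVIDIA_VISIBLE_DEVICES=all \
  -p 10300:10300/tcp \
  -v /mnt/docker/data/faster-whisper/:/config:rw \
  ghcr.io/linuxserver/lspipepr-faster-whisper:gpu-version-1.0.1
```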

Thank you very much for your help. Best regards

Expected Behavior

faster-whisper stays available over the long term, without OOM errors.

NB: when the problem occurs, nvidia-smi shows the container PID using 1.3GB of GPU memory (so there is still ~4.7GB of free memory).

Steps To Reproduce

  1. In this environment and with this config:
     - OS: GNU Linux Debian 12.5 (kernel: 6.6.13+bpo-amd64)
     - GPU: NVIDIA TU106M [GeForce RTX 2060 Mobile] (6GB memory)
     - CPU: AMD Ryzen 7 4800H with Radeon Graphics (8 cores / 1.4GHz)
     - Host memory: 32GB
     - Storage: SSD NVMe 1TB
     - Docker version: 20.10.24+dfsg1
  2. Run: docker run -d --gpus all --runtime=nvidia --name=faster-whisper --privileged=true -e WHISPER_BEAM=10 -e WHISPER_LANG=fr -e WHISPER_MODEL=medium-int8 -e NVIDIA_DRIVER_CAPABILITIES=all -e NVIDIA_VISIBLE_DEVICES=all -p 10300:10300/tcp -v /mnt/docker/data/faster-whisper/:/config:rw ghcr.io/linuxserver/lspipepr-faster-whisper:gpu-version-1.0.1
  3. Wait ~1h and notice "exception=RuntimeError('CUDA failed with error out of memory')>" in the container's logs
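Two low-risk experiments, suggested by the questions above: both a smaller model and a narrower beam reduce GPU memory pressure. Assuming the image accepts small-int8 as a WHISPER_MODEL value (the model name here is an assumption; only medium-int8 from the report is confirmed to work), the same command with a lighter configuration would be:

```shell
# Lighter configuration: smaller model, half the beam width.
# WHISPER_MODEL=small-int8 and WHISPER_BEAM=5 are illustrative values.
docker run -d --gpus all --runtime=nvidia --name=faster-whisper --privileged=true \
  -e WHISPER_BEAM=5 -e WHISPER_LANG=fr -e WHISPER_MODEL=small-int8 \
  -e NVIDIA_DRIVER_CAPABILITIES=all -e NVIDIA_VISIBLE_DEVICES=all \
  -p 10300:10300/tcp \
  -v /mnt/docker/data/faster-whisper/:/config:rw \
  ghcr.io/linuxserver/lspipepr-faster-whisper:gpu-version-1.0.1
```

If the OOM stops with the smaller settings, that narrows the cause to memory pressure rather than a leak during idle time.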

Environment

- OS: GNU Linux Debian 12.5 (kernel: 6.6.13+bpo-amd64)
- How docker service was installed: apt update + apt install docker.io + nvidia-ctk runtime configure --runtime=docker

CPU architecture

x86-64

Docker creation

docker run -d --gpus all --runtime=nvidia --name=faster-whisper --privileged=true -e WHISPER_BEAM=10 -e WHISPER_LANG=fr -e WHISPER_MODEL=medium-int8 -e NVIDIA_DRIVER_CAPABILITIES=all -e NVIDIA_VISIBLE_DEVICES=all -p 10300:10300/tcp -v /mnt/docker/data/faster-whisper/:/config:rw ghcr.io/linuxserver/lspipepr-faster-whisper:gpu-version-1.0.1

Container logs

[custom-init] No custom files found, skipping...
INFO:__main__:Ready
[ls.io-init] done.
INFO:wyoming_faster_whisper.handler: Allume la cuisine.
INFO:wyoming_faster_whisper.handler: Éteins la cuisine !
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-14' coro=<AsyncEventHandler.run() done, defined at /lsiopy/lib/python3.10/site-packages/wyoming/server.py:28> exception=RuntimeError('CUDA failed with error out of memory')>
Traceback (most recent call last):
  File "/lsiopy/lib/python3.10/site-packages/wyoming/server.py", line 35, in run
    if not (await self.handle_event(event)):
  File "/lsiopy/lib/python3.10/site-packages/wyoming_faster_whisper/handler.py", line 75, in handle_event
    text = " ".join(segment.text for segment in segments)
  File "/lsiopy/lib/python3.10/site-packages/wyoming_faster_whisper/handler.py", line 75, in <genexpr>
    text = " ".join(segment.text for segment in segments)
  File "/lsiopy/lib/python3.10/site-packages/wyoming_faster_whisper/faster_whisper/transcribe.py", line 162, in generate_segments
    for start, end, tokens in tokenized_segments:
  File "/lsiopy/lib/python3.10/site-packages/wyoming_faster_whisper/faster_whisper/transcribe.py", line 186, in generate_tokenized_segments
    result, temperature = self.generate_with_fallback(segment, prompt, options)
  File "/lsiopy/lib/python3.10/site-packages/wyoming_faster_whisper/faster_whisper/transcribe.py", line 279, in generate_with_fallback
    result = self.model.generate(
RuntimeError: CUDA failed with error out of memory
github-actions[bot] commented 7 months ago

Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.

aptalca commented 7 months ago

The following is the upstream repo I mentioned on Discord: https://github.com/SYSTRAN/faster-whisper

jerome83136 commented 7 months ago

> The following is the upstream repo I mentioned on Discord: https://github.com/SYSTRAN/faster-whisper

Oh, sorry. I will move my issue to that other repo, then. Thank you