mudler / LocalAI

:robot: The free, Open Source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many other model architectures. Generates text, audio, video and images, with voice-cloning capabilities.
https://localai.io
MIT License

Watchdog does not kill an idle sentencetransformers backend #2277

Open joseluisll opened 2 months ago

joseluisll commented 2 months ago

**Environment, Container Image, Hardware**

LocalAI version: 2.14.0

Container Image: localai/localai:v2.14.0-cublas-cuda12-ffmpeg

NVIDIA CUDA 12, Intel x86_64, Ubuntu 22.04 LTS.

```
Linux makinota08 6.5.0-28-generic 29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
```

16G RAM, Nvidia 1660Ti 6G, Acer Nitro Laptop

**Describe the bug**

I have LocalAI deployed in a Docker container. I load a sentence-transformer embeddings model and test it successfully using curl. Then I wait for 5 minutes and check in the log that the watchdog tries to kill the process. The log shows that the process was successfully killed and that the watchdog no longer detects the idle connection.
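For reference, the curl test corresponds to an OpenAI-style embeddings request. A minimal sketch (port 8080 and the request body are assumptions, not taken from the logs; it prints a fallback message when no LocalAI instance is reachable):

```shell
# Sketch: OpenAI-style embeddings request against a local LocalAI server.
# The port and model name are assumptions for illustration.
OUT=$(curl -s --max-time 5 http://localhost:8080/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "e5-large", "input": "The quick brown fox"}' \
  || echo "LocalAI not reachable on localhost:8080")
echo "$OUT"
```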

However, if I run `nvidia-smi`, I can see the Python process still resident in GPU memory:

```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1660 Ti    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8               3W /  80W |   2268MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      5574      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A     25660      C   python                                     2260MiB |
+---------------------------------------------------------------------------------------+
```

Also, with `ps -fe` the process is still visible:

```
root 25660 24649 0 11:33 pts/0 00:00:09 python /build/backend/python/sentencetransformers/sentencetransformers.py --addr 127.0.0.1:39807
```
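A quick way to confirm whether a backend survived the watchdog is to count matching processes after the idle timeout. A minimal sketch (the script name is taken from the `ps` output above):

```shell
# Count leftover sentencetransformers backend processes; the bracketed
# grep pattern keeps the grep command itself out of the match.
LEFTOVER=$(ps -fe | grep -c "[s]entencetransformers.py" || true)
echo "leftover backend processes: $LEFTOVER"
```

A non-zero count after the watchdog has reported the model as killed indicates the orphaned process seen here.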

**To Reproduce**

Activate the watchdog using ENV variables:

| Variable | Value |
| --- | --- |
| WATCHDOG_BUSY | true |
| WATCHDOG_BUSY_TIMEOUT | 30m |
| WATCHDOG_IDLE | true |
| WATCHDOG_IDLE_TIMEOUT | 5m |
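For completeness, a `docker run` invocation with these settings might look like the following sketch (the image tag is from the report; the port mapping and `--gpus` flag are assumptions). The command is printed rather than executed so it can be reviewed first:

```shell
# Sketch: launching the reported image with the watchdog ENV variables.
# Port mapping and GPU flag are assumptions, not from the report.
CMD="docker run -d --gpus all -p 8080:8080 \
  -e WATCHDOG_BUSY=true -e WATCHDOG_BUSY_TIMEOUT=30m \
  -e WATCHDOG_IDLE=true -e WATCHDOG_IDLE_TIMEOUT=5m \
  localai/localai:v2.14.0-cublas-cuda12-ffmpeg"
echo "$CMD"
```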

Test the sentencetransformers model with the following YAML config:

```yaml
name: e5-large
backend: sentencetransformers
embeddings: true
parameters:
  model: intfloat/multilingual-e5-large
context_size: 1024
```

Wait for 5 minutes.

**Expected behavior**

The sentencetransformers backend should have been killed by the watchdog after 5 idle minutes, so that the GPU memory is freed.

**Logs**

Attachment: test of embeddings model.txt

localai-bot commented 2 months ago

It seems like there might be an issue with the watchdog not effectively killing the sentence transformers backend process. To investigate further, we will need to check if any other part of the code or configuration is keeping the process alive or if there are any other threads associated with the process that are preventing the GPU memory from being cleared. We'll also review the logs and related code to pinpoint the root cause of the problem.

joseluisll commented 2 months ago

I have also tested the watchdog with a chat model ('mistral-openorca') and it works fine: the process is cleared after 5 idle minutes.

xjm1285 commented 2 months ago

Same issue when using the vllm backend.