mudler / LocalAI

:robot: The free, Open Source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many other model architectures. Generates text, audio, video and images, with voice-cloning capabilities.
https://localai.io
MIT License

Watchdog does not kill an idle sentencetransformers backend #2277

Open joseluisll opened 2 months ago

joseluisll commented 2 months ago

**Environment, Container Image, Hardware**

LocalAI version: 2.14.0

Container Image: localai/localai:v2.14.0-cublas-cuda12-ffmpeg

NVIDIA CUDA 12, Intel x86_64, Ubuntu 22.04 LTS.

```
Linux makinota08 6.5.0-28-generic 29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
```

16G RAM, Nvidia 1660Ti 6G, Acer Nitro Laptop

**Describe the bug**

I have LocalAI deployed in a Docker container. I load a sentence-transformer embeddings model and test it successfully using curl. Then I wait for 5 minutes and check in the log that the watchdog tries to kill the process. The log shows that the process was successfully killed and that the watchdog no longer detects the idle connection.
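For reference, the curl test corresponds to an OpenAI-style embeddings request. A minimal sketch (port 8080 and the request body are assumptions, not taken from the logs; it prints a fallback message when no LocalAI instance is reachable):

```shell
# Sketch: OpenAI-style embeddings request against a local LocalAI server.
# The port and model name are assumptions for illustration.
OUT=$(curl -s --max-time 5 http://localhost:8080/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "e5-large", "input": "The quick brown fox"}' \
  || echo "LocalAI not reachable on localhost:8080")
echo "$OUT"
```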

However, if I run `nvidia-smi`, I can see the Python process still resident in GPU memory:

```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1660 Ti    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8               3W /  80W |   2268MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      5574      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A     25660      C   python                                     2260MiB |
+---------------------------------------------------------------------------------------+
```

Also, with `ps -fe` the process is still visible:

```
root 25660 24649 0 11:33 pts/0 00:00:09 python /build/backend/python/sentencetransformers/sentencetransformers.py --addr 127.0.0.1:39807
```
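A quick way to confirm whether a backend survived the watchdog is to count matching processes after the idle timeout. A minimal sketch (the script name is taken from the `ps` output above):

```shell
# Count leftover sentencetransformers backend processes; the bracketed
# grep pattern keeps the grep command itself out of the match.
LEFTOVER=$(ps -fe | grep -c "[s]entencetransformers.py" || true)
echo "leftover backend processes: $LEFTOVER"
```

A non-zero count after the watchdog has reported the model as killed indicates the orphaned process seen here.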

**To Reproduce**

Activate the watchdog using ENV variables:

| Variable | Value |
| --- | --- |
| WATCHDOG_BUSY | true |
| WATCHDOG_BUSY_TIMEOUT | 30m |
| WATCHDOG_IDLE | true |
| WATCHDOG_IDLE_TIMEOUT | 5m |
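For completeness, a `docker run` invocation with these settings might look like the following sketch (the image tag is from the report; the port mapping and `--gpus` flag are assumptions). The command is printed rather than executed so it can be reviewed first:

```shell
# Sketch: launching the reported image with the watchdog ENV variables.
# Port mapping and GPU flag are assumptions, not from the report.
CMD="docker run -d --gpus all -p 8080:8080 \
  -e WATCHDOG_BUSY=true -e WATCHDOG_BUSY_TIMEOUT=30m \
  -e WATCHDOG_IDLE=true -e WATCHDOG_IDLE_TIMEOUT=5m \
  localai/localai:v2.14.0-cublas-cuda12-ffmpeg"
echo "$CMD"
```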

Test the sentencetransformers model with the following YAML config:

```yaml
name: e5-large
backend: sentencetransformers
embeddings: true
parameters:
  model: intfloat/multilingual-e5-large
context_size: 1024
```

Wait for 5 minutes.

**Expected behavior**

The sentencetransformers backend should have been killed by the watchdog after 5 idle minutes, so that the GPU memory is freed.

**Logs**

Attachment: test of embeddings model.txt

localai-bot commented 2 months ago

It seems like there might be an issue with the watchdog not effectively killing the sentence transformers backend process. To investigate further, we will need to check if any other part of the code or configuration is keeping the process alive or if there are any other threads associated with the process that are preventing the GPU memory from being cleared. We'll also review the logs and related code to pinpoint the root cause of the problem.

joseluisll commented 2 months ago

I have also tested the watchdog with a chat model ('mistral-openorca') and it works fine: the process is cleared after 5 idle minutes.

xjm1285 commented 2 months ago

Same issue when using the vllm backend.