pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Memory buildup over time #2453

Open arnavmehta7 opened 12 months ago

arnavmehta7 commented 12 months ago

🐛 Describe the bug

I first deployed the same model with the same requirements on another TorchServe alternative on k8s. It works perfectly and the memory graph is fully stable. However, that alternative didn't provide batching and worker support like TorchServe does, so I moved to TorchServe.

However, with the same code and the same requirements, there is a small memory buildup over time. Each inference produces a peak followed by a small valley, but the valley never drops back to the previous baseline, so memory usage slowly increases.

I set number_of_netty_threads and the vmargs, which reduced the leak, but memory is still building up over time.

My model and its supporting code show zero leakage, so the issue seems to come from Java or TorchServe itself.

NOTE: I pass big dictionaries between preprocess -> inference -> postprocess
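
For reference, the handler is shaped roughly like the sketch below (a minimal illustration with placeholder model and field names, not the real handler). It passes dicts between the three phases and logs the worker's RSS via psutil, so Python-side growth can be separated from anything on the Java side:

```python
# custom_handler.py -- minimal sketch; model loading (initialize) is inherited
# from BaseHandler, and the payload layout here is a placeholder.
import logging
import os

import psutil
import torch
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)


class CustomHandler(BaseHandler):
    def preprocess(self, data):
        # Build one (potentially large) dict per request in the batch.
        return [{"raw": row.get("body") or row.get("data")} for row in data]

    def inference(self, data, *args, **kwargs):
        results = []
        with torch.inference_mode():
            for item in data:
                # Placeholder: the real handler runs the model here.
                item["output"] = {"dummy": 0}
                results.append(item)
        return results

    def postprocess(self, data):
        # Log the worker's resident memory after every request.
        rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1e6
        logger.info("worker RSS after request: %.1f MB", rss_mb)
        # Return only what the client needs; dropping the large intermediate
        # dicts here keeps them from being held alive by the response path.
        return [item["output"] for item in data]
```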

Error logs

Memory Example: [memory usage graph]

Installation instructions

Using the TorchServe GPU Docker image from Docker Hub.

Model Packaging

I am using a custom model (which I cannot share) with a custom handler. Nothing too fancy; I followed the examples to build it.

config.properties

inference_address=http://0.0.0.0:8888
metrics_address=http://0.0.0.0:8889
async_logging=true
number_of_netty_threads=4
enable_envvars_config=true
prefer_direct_buffer=true
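# JVM tuning: 128 MB max heap, no large pages, G1 GC, 32 MB metaspace cap, 50 MB direct-memory cap, exit on OOM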
vmargs=-Xmx128m -XX:-UseLargePages -XX:+UseG1GC -XX:MaxMetaspaceSize=32M -XX:MaxDirectMemorySize=50m -XX:+ExitOnOutOfMemoryError
enable_metrics_api=false

models={\
  "<retracted>": {\
    "1.0": {\
        "defaultVersion": true,\
        "marName": "<retracted>.mar",\
        "minWorkers": 1,\
        "maxWorkers": 1,\
        "batchSize": 1,\
        "maxBatchDelay": 1,\
        "responseTimeout": 2700000\
    }\
  }\
}
# response timeout is 2700000 ms, i.e., 45 minutes

Versions

------------------------------------------------------------------------------------------
Environment headers
------------------------------------------------------------------------------------------
Torchserve branch:

torchserve==0.8.1
torch-model-archiver==0.8.1

Python version: 3.10 (64-bit runtime)
Python executable: /home/venv/bin/python

Versions of relevant python libraries:
captum==0.6.0
numpy==1.24.4
nvgpu==0.10.0
psutil==5.9.5
pytorch-lightning==1.6.5
pytorch-metric-learning==1.7.3
requests==2.31.0
requests-oauthlib==1.3.1
sentencepiece==0.1.99
torch==2.0.1+cu118
torch-audiomentations==0.11.0
torch-model-archiver==0.8.1
torch-pitch-shift==1.2.4
torch-workflow-archiver==0.2.9
torchaudio==2.0.2+cu118
torchdata==0.6.1
torchmetrics==0.11.4
torchserve==0.8.1
torchvision==0.15.2+cu118
transformers==4.30.2
wheel==0.40.0
torch==2.0.1+cu118
Warning: torchtext not present
torchvision==0.15.2+cu118
torchaudio==2.0.2+cu118

Java Version:

OS: Ubuntu 20.04.6 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: version 3.26.4

Is CUDA available: Yes
CUDA runtime version: N/A
GPU models and configuration:
GPU 0: Tesla T4
Nvidia driver version: 470.182.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5

Repro instructions

Most likely, any example deployed on k8s will reproduce this.

Possible Solution

No response

arnavmehta7 commented 12 months ago

Update: There seems to be a small bump over days; could it be due to accumulation of logs?
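
A quick sanity check for that hypothesis, assuming the default ./logs directory relative to where TorchServe was started, is to see whether it is the log directory rather than the worker process that is growing:

```python
# sum the size of TorchServe's log directory (location is an assumption;
# by default it is ./logs relative to the working directory)
import pathlib

log_dir = pathlib.Path("logs")
total_mb = sum(f.stat().st_size for f in log_dir.rglob("*") if f.is_file()) / 1e6
print(f"log directory size: {total_mb:.1f} MB")
```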

lxning commented 11 months ago

@arnavmehta7 Here is a report on JVM memory usage in TorchServe. The picture below shows the memory usage of a vgg16 scripted-model soak test (a long-running job), where the JVM is set as

vmargs=-Xmx4g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError
[soak test memory usage graph]

arnavmehta7 commented 11 months ago

Okay @lxning, so you think the leak cannot be due to the JVM? It seems like I will have to deploy a hello-world model on k8s to check.

arnavmehta7 commented 11 months ago

Am I right to conclude that it reclaims memory after all requests are done?

lxning commented 11 months ago

@arnavmehta7 here is the code link about removing a job from jobQ once a worker is ready to process it.

arnavmehta7 commented 11 months ago

Thank you @lxning, I am doing more testing and will post soon. Most probably my fault.

Btw, do you know if there's any way to restart the pod on k8s itself rather than letting TorchServe get stuck in a restarting-dying loop?