triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Encountering a memory leak when using the HTTP /load API to load a new model version #4528

Closed. AmazingSkyLine closed this issue 2 years ago.

AmazingSkyLine commented 2 years ago

Description: There is a memory leak when I use the HTTP /load API to load a new version of a model.

Oddly, when I first use the /unload API to unload the model and then use the /load API to load the new version, there is no memory leak.

When I only use the /load API and let Triton automatically load the new version and unload the old one, the problem happens. Likewise, using POLL mode and copying a new version into the model directory to trigger the model load/unload also leaks memory.

Using /unload + /load:

Steps:

Unload the old version first, then load the new version:

curl -s -X POST http://localhost:8000/v2/repository/models/${MODEL_NAME}/unload
sleep 10
curl -s -X POST http://localhost:8000/v2/repository/models/${MODEL_NAME}/load

First several cycles:

              total        used        free      shared  buff/cache   available
Mem:          32768         423       31689           0         654       32344
Swap:             0           0           0

              total        used        free      shared  buff/cache   available
Mem:          32768         190       31922           0         654       32577
Swap:             0           0           0

              total        used        free      shared  buff/cache   available
Mem:          32768         553       31560           0         654       32214
Swap:             0           0           0

After 1000+ cycles:

              total        used        free      shared  buff/cache   available
Mem:          32768         550       31560           0         656       32217
Swap:             0           0           0

              total        used        free      shared  buff/cache   available
Mem:          32768         551       31560           0         656       32216
Swap:             0           0           0

              total        used        free      shared  buff/cache   available
Mem:          32768         550       31560           0         656       32217
Swap:             0           0           0
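The unload + load steps above can be repeated in a loop to reproduce these numbers. A minimal sketch, assuming the HTTP endpoint on localhost:8000 used in this report and a caller-supplied MODEL_NAME; TRITON_URL, repo_url and unload_load_cycles are hypothetical names, not part of Triton:

```shell
#!/bin/sh
# Sketch of the /unload + /load cycle described above.
# Assumption: TRITON_URL defaults to the HTTP endpoint used in this report.
TRITON_URL="${TRITON_URL:-http://localhost:8000}"

# Build the model-repository endpoint for model $1 and action $2 (load/unload).
repo_url() {
  printf '%s/v2/repository/models/%s/%s\n' "$TRITON_URL" "$1" "$2"
}

# Run $2 unload-then-load cycles for model $1, printing `free -m` after each.
unload_load_cycles() {
  model="$1"
  n="$2"
  i=1
  while [ "$i" -le "$n" ]; do
    curl -s -X POST "$(repo_url "$model" unload)"
    sleep 10
    curl -s -X POST "$(repo_url "$model" load)"
    free -m
    i=$((i + 1))
  done
}
```

For example, unload_load_cycles "$MODEL_NAME" 1000 runs the 1000+ cycles reported above; the cycle count is up to the caller.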

Only using /load:

Steps:

Just use /load to automatically load the new version and unload the old version:

curl -s -X POST http://localhost:8000/v2/repository/models/${MODEL_NAME}/load

First several loads:

              total        used        free      shared  buff/cache   available
Mem:          32768         422       31620           0         725       32345
Swap:             0           0           0

              total        used        free      shared  buff/cache   available
Mem:          32768         493       31312           0         962       32274

After 500+ loads:

              total        used        free      shared  buff/cache   available
Mem:          32768        1451       30116           0        1199       31316
Swap:             0           0           0

              total        used        free      shared  buff/cache   available
Mem:          32768        1523       30044           0        1199       31244
Swap:             0           0           0

              total        used        free      shared  buff/cache   available
Mem:          32768        1451       30116           0        1199       31316
Swap:             0           0           0
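Comparing runs by eye across many free -m dumps is error-prone. A small sketch that pulls out the used column (in MiB, since free -m prints mebibytes) so runs can be compared numerically; used_mb and used_growth are hypothetical helper names:

```shell
#!/bin/sh
# Print the "used" column (MiB) of the Mem: row from `free -m` output on stdin.
used_mb() {
  awk '$1 == "Mem:" { print $3; exit }'
}

# Print the growth in used memory (MiB) from baseline $1 to later sample $2.
used_growth() {
  echo $(( $2 - $1 ))
}
```

In a loop, free -m | used_mb gives one number per iteration. For the /load-only run above, used_growth 422 1451 prints 1029, i.e. roughly 1 GiB of growth between the first sample and the samples after 500+ loads.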

Triton Information: Triton version 22.05, using the official NGC container tritonserver:22.05-py3.

To Reproduce

Model: yolo-cppe5

You can download the model from this link: https://storage.googleapis.com/tfhub-modules/rishit-dagli/yolo-cppe5/1.tar.gz

Here is the config file:

name: "yolo-cppe5"
platform: "tensorflow_savedmodel"
input {
  name: "input"
  data_type: TYPE_FP32
  dims: 1
  dims: 3
  dims: 800
  dims: 1216
}
output {
  name: "labels"
  data_type: TYPE_INT64
  dims: 1
  dims: 100
}
output {
  name: "dets"
  data_type: TYPE_FP32
  dims: 1
  dims: 100
  dims: 5
}
instance_group {
  count: 1
  kind: KIND_CPU
}
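The report doesn't show how the repository was laid out. For the tensorflow_savedmodel platform, Triton looks for a model.savedmodel directory inside each numbered version directory, with the config file saved as config.pbtxt in the model directory. A sketch of preparing version 1 under that layout, assuming /models as the repository (as in the startup arguments below) and assuming the archive unpacks to saved_model.pb plus variables/; MODEL_REPO, savedmodel_dir and setup_yolo_cppe5 are hypothetical names:

```shell
#!/bin/sh
# Hypothetical setup helper. Assumes MODEL_REPO=/models (matching
# --model-repository=/models) and that the archive unpacks directly to
# saved_model.pb plus variables/.
MODEL_REPO="${MODEL_REPO:-/models}"
MODEL_URL="https://storage.googleapis.com/tfhub-modules/rishit-dagli/yolo-cppe5/1.tar.gz"

# Path of the SavedModel directory for model $1, version $2.
savedmodel_dir() {
  printf '%s/%s/%s/model.savedmodel\n' "$MODEL_REPO" "$1" "$2"
}

# Download the archive and unpack it as version 1 of yolo-cppe5.
# The config above is expected at $MODEL_REPO/yolo-cppe5/config.pbtxt.
setup_yolo_cppe5() {
  dest="$(savedmodel_dir yolo-cppe5 1)"
  mkdir -p "$dest"
  curl -sL "$MODEL_URL" | tar -xz -C "$dest"
}
```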

Triton startup arguments:

tritonserver --model-repository=/models --model-control-mode=explicit --strict-model-config=true --log-verbose=0 --grpc-port=8005 --metrics-port=6000 --backend-config=tensorflow,version=2

The test script:

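#!/bin/sh
# MODEL_NAME and model_path must be set by the caller; model_path is the
# model's directory in the repository (e.g. /models/yolo-cppe5).
: "${MODEL_NAME:?set MODEL_NAME}"
: "${model_path:?set model_path}"

# execute 9999 times
for i in $(seq 2 10000)
do
  # first copy new version from version 1
  echo 'copy model version' "$i"
  mkdir "${model_path}/${i}" && cp -r "${model_path}/1/"* "${model_path}/${i}/"
  if [ "$i" -gt 5 ]
  then
    # delete too old versions, keeping version 1 and the 4 most recent ones
    rm -rf "${model_path}/$((i-4))"
  fi
  sleep 1
  # show memory
  free -m
  # invoke /load so Triton loads the new version and unloads the old one
  curl -s -X POST "http://localhost:8000/v2/repository/models/${MODEL_NAME}/load"
  sleep 20
done

echo 'ok'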
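free -m reports system-wide usage, so other processes on the host also show up in the used column. A sketch that instead reads the resident set size of a single process, assuming Linux (it relies on the VmRSS line of /proc/PID/status); rss_mb is a hypothetical helper name:

```shell
#!/bin/sh
# Print the resident set size (MiB) of process $1.
# Linux-only assumption: reads the VmRSS line (in kB) from /proc/<pid>/status.
rss_mb() {
  awk '/^VmRSS:/ { print int($2 / 1024) }' "/proc/$1/status"
}
```

Inside the test loop, rss_mb "$(pidof -s tritonserver)" could be printed next to free -m to track only the server process.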

Expected behavior: I expected /load to load the new version and unload the old version automatically, with no memory leak.

Tabrizian commented 2 years ago

Hi @AmazingSkyLine, this doesn't look like a memory leak. The first few inferences can increase memory usage because the frameworks' internal memory pools aren't released when the model is unloaded. You can refer to the document below for more info:

https://github.com/triton-inference-server/tensorflow_backend#how-does-the-tensorflow-backend-manage-gpu-memory

It would be a memory leak if memory grew without bound as you load/unload the model many times.
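The criterion above can be checked mechanically: bounded growth flattens out, while a leak keeps climbing. A rough heuristic sketch, assuming one used value (MiB) per line in time order; mem_trend is a hypothetical name, and the window size and tolerance are arbitrary illustrative defaults, not values from this thread:

```shell
#!/bin/sh
# Heuristic plateau check. Reads one memory sample (MiB) per line on stdin,
# oldest first. Prints "plateau" if the last N samples span at most TOL MiB,
# "growing" otherwise, or "insufficient" if there are fewer than N samples.
# N (default 3) and TOL (default 50) are illustrative, not from the thread.
mem_trend() {
  awk -v n="${1:-3}" -v tol="${2:-50}" '
    { v[NR] = $1 + 0 }
    END {
      if (NR < n) { print "insufficient"; exit }
      lo = v[NR]; hi = v[NR]
      for (i = NR - n + 1; i <= NR; i++) {
        if (v[i] < lo) lo = v[i]
        if (v[i] > hi) hi = v[i]
      }
      if (hi - lo <= tol) { print "plateau" } else { print "growing" }
    }'
}
```

A three-sample window is very sensitive: the post-500 samples of the /load-only run in the original report (1451, 1523, 1451) come out as growing at a 50 MiB tolerance because of the 72 MiB swing, so longer runs with more samples give a more meaningful answer.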

AmazingSkyLine commented 2 years ago

@Tabrizian For this model I'm not actually running inference; I just use /load to load a new model version repeatedly, yet memory usage increases. For another of our models the growth is more obvious (but that model has warm-up and we can't share it). (Image: memory usage, with a 32G memory limit.) Steps:

  curl -s -X POST http://localhost:8000/v2/repository/models/${MODEL_NAME}/load
  sleep 45

That model initially uses ~4.5G of memory; with a 16G memory limit, this eventually leads to OOM.

By comparison, if I call /load to fully load the model first and then call /unload to unload it, loading and unloading serially, memory usage is much lower (~16%, i.e. 5G+). (Image: memory usage, with a 32G memory limit.) Steps:

  curl -s -X POST http://localhost:8000/v2/repository/models/${MODEL_NAME}/load
  sleep 40
  curl -s -X POST http://localhost:8000/v2/repository/models/${MODEL_NAME}/unload
  sleep 15

Tabrizian commented 2 years ago

I see. I'll be re-opening this ticket for further investigation.

cnegron-nv commented 2 years ago

@AmazingSkyLine TensorFlow doesn't release all memory when models are unloaded, because the TensorFlow framework uses a high-watermark caching memory allocator and doesn't return freed memory to the system. This is outside Triton Server; it is an issue in upstream TensorFlow. There's an existing issue filed against TF (https://github.com/tensorflow/tensorflow/issues/36465); perhaps you can comment there to add to its urgency.

krishung5 commented 2 years ago

Hi @AmazingSkyLine, thanks for raising this issue. I ran some experiments on this; the results are below.

  1. Serving image models with different backends

    Yolo only using /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          31793        8611       19121         267        4061       22456
    Swap:          2047        2047           0      
              total        used        free      shared  buff/cache   available
    Mem:          31793        8917       18571         267        4304       22150
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        9275       17970         267        4547       21792
    Swap:          2047        2047           0

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          31793       10267       16950         267        4576       20800
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10265       16952         267        4575       20802
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10268       16948         267        4576       20799
    Swap:          2047        2047           0

    using /unload + /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          31793        8663       18798         267        4331       22403
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        8657       18561         267        4574       22410
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        8664       18310         267        4818       22402
    Swap:          2047        2047           0

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          31793        8683       17211         267        5898       22384
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        8680       17224         267        5888       22386
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        8683       17221         267        5888       22383
    Swap:          2047        2047           0

    resnet50_savedmodel only using /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          31793        9827       13542         267        8423       21239
    Swap:          2047        2047           0
              total        used        free      shared  buff/cache   available
    Mem:          31793        9934       13334         267        8524       21132
    Swap:          2047        2047           0
              total        used        free      shared  buff/cache   available
    Mem:          31793       10037       13130         267        8624       21029
    Swap:          2047        2047           0

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          31793       10421       12717         267        8654       20645
    Swap:          2047        2047           0
              total        used        free      shared  buff/cache   available
    Mem:          31793       10405       12735         267        8652       20661
    Swap:          2047        2047           0
              total        used        free      shared  buff/cache   available
    Mem:          31793       10405       12735         267        8652       20661
    Swap:          2047        2047           0

    using /unload + /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          31793        9867       13451         267        8475       21200
    Swap:          2047        2047           0
              total        used        free      shared  buff/cache   available
    Mem:          31793        9871       13345         267        8575       21195
    Swap:          2047        2047           0
              total        used        free      shared  buff/cache   available
    Mem:          31793        9867       13249         267        8676       21199
    Swap:          2047        2047           0

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          31793        9883       13262         267        8663       21199
    Swap:          2047        2047           0
              total        used        free      shared  buff/cache   available
    Mem:          31793        9885       13250         267        8665       21190
    Swap:          2047        2047           0
              total        used        free      shared  buff/cache   available
    Mem:          31793        9882       13258         267        8663       21195
    Swap:          2047        2047           0

    resnet50_libtorch only using /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          31793       10395       12296         269        9101       20669
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10409       12180         269        9202       20654
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10433       12057         269        9302       20631
    Swap:          2047        2047           0

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          31793       10938       11301         270        9553       20130
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10942       11295         270        9555       20126
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10920       11321         270        9550       20148
    Swap:          2047        2047           0

    using /unload + /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          31793       10667       11774         270        9351       20401
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10644       11713         270        9435       20425
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10642       11614         270        9536       20427
    Swap:          2047        2047           0

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          31793       10702       11515         270        9575       20366
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10699       11517         270        9576       20369
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10734       11480         270        9578       20334
    Swap:          2047        2047           0

    resnet50_onnx only using /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          31793       10173       12519         269        9100       20892
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10294       12313         269        9184       20770
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10303       12204         269        9285       20760
    Swap:          2047        2047           0

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          31793       10777       11715         269        9301       20287
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10821       11671         269        9300       20243
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10842       11650         269        9300       20222
    Swap:          2047        2047           0

    using /unload + /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          31793       10114       12604         269        9075       20950
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10161       12471         269        9160       20903
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10169       12364         269        9260       20896
    Swap:          2047        2047           0

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          31793       10281       12208         269        9303       20783
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10280       12211         269        9302       20785
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793       10281       12208         269        9303       20783
    Swap:          2047        2047           0

  2. Serving simple/add_sub models with different backends

    savedmodel only using /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          31793        8322       17577         267        5893       22745
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        8326       17573         267        5892       22740
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        8323       17576         267        5892       22743
    Swap:          2047        2047           0

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          31793        8336       17542         267        5914       22731
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        8339       17539         267        5914       22727
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        8338       17540         267        5914       22729
    Swap:          2047        2047           0

    using /unload + /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          31793        8347       17517         267        5927       22719
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        8349       17524         267        5919       22718
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        8350       17523         267        5919       22717
    Swap:          2047        2047           0

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          31793        8345       17201         267        6246       22722
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        8349       17197         267        6246       22718
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        8351       17195         267        6246       22716
    Swap:          2047        2047           0

    libtorch only using /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          64398        5028       27760         269       31609       58475
    Swap:          2047          14        2033              
              total        used        free      shared  buff/cache   available
    Mem:          64398        5031       27757         269       31609       58472
    Swap:          2047          14        2033              
              total        used        free      shared  buff/cache   available
    Mem:          64398        5031       27758         269       31609       58473
    Swap:          2047          14        2033

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          64398        5068       27714         269       31615       58435
    Swap:          2047          14        2033              
              total        used        free      shared  buff/cache   available
    Mem:          64398        5069       27714         269       31615       58435
    Swap:          2047          14        2033              
              total        used        free      shared  buff/cache   available
    Mem:          64398        5070       27713         269       31615       58434
    Swap:          2047          14        2033

    using /unload + /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          31793        9419       18805         267        3568       21647
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        9431       18807         267        3555       21636
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        9417       18822         267        3553       21649
    Swap:          2047        2047           0

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          31793        9390       18379         267        4023       21677
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        9392       18377         267        4023       21674
    Swap:          2047        2047           0              
              total        used        free      shared  buff/cache   available
    Mem:          31793        9391       18378         267        4023       21676
    Swap:          2047        2047           0

    onnx only using /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          64220        4116       32232         268       27871       59128
    Swap:          2047         144        1903              
              total        used        free      shared  buff/cache   available
    Mem:          64220        4121       32227         268       27871       59122
    Swap:          2047         144        1903              
              total        used        free      shared  buff/cache   available
    Mem:          64220        4120       32228         268       27871       59124
    Swap:          2047         144        1903

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          64220        4138       32308         268       27773       59105
    Swap:          2047         144        1903              
              total        used        free      shared  buff/cache   available
    Mem:          64220        4141       32305         268       27773       59103
    Swap:          2047         144        1903              
              total        used        free      shared  buff/cache   available
    Mem:          64220        4140       32306         268       27773       59103
    Swap:          2047         144        1903

    using /unload + /load: first several times

              total        used        free      shared  buff/cache   available
    Mem:          64220        4134       32313         268       27772       59109
    Swap:          2047         144        1903              
              total        used        free      shared  buff/cache   available
    Mem:          64220        4138       32309         268       27772       59106
    Swap:          2047         144        1903              
              total        used        free      shared  buff/cache   available
    Mem:          64220        4140       32307         268       27772       59104
    Swap:          2047         144        1903

    after 500+ times

              total        used        free      shared  buff/cache   available
    Mem:          64220        4152       32273         268       27794       59092
    Swap:          2047         144        1903              
              total        used        free      shared  buff/cache   available
    Mem:          64220        4152       32272         268       27794       59091
    Swap:          2047         144        1903              
              total        used        free      shared  buff/cache   available
    Mem:          64220        4152       32273         268       27794       59092
    Swap:          2047         144        1903

I ran it under valgrind to check for a memory leak, and it came out clean, so I don't think this is strictly a "leak". We do expect some memory growth when loading more and more versions of a model, because Triton keeps track of the information for every version of a model, even after that version is unloaded. The results above show that the difference in memory usage between only using /load and using /unload + /load is observed only when serving more complicated models; we don't see this behavior with simple models. This probably comes from complicated models allocating more memory than simple ones; otherwise we would expect to see the same behavior for simple models. Besides, we only see a difference of that size with TensorFlow models, which could be related to @cnegron-nv's comment above about how the TensorFlow framework allocates memory differently.
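To make the comparison concrete, the first used value of each "first several times" block can be subtracted from the first used value of the matching "after 500+ times" block. A minimal sketch; used_delta and the row labels are hypothetical, and single samples of system-wide free -m output are noisy, so these are rough magnitudes only:

```shell
#!/bin/sh
# Reads lines of "label first_used later_used" (MiB) on stdin and prints
# each label with the change between the two samples.
used_delta() {
  awk '{ print $1, $3 - $2 }'
}
```

Fed the values from the results above, this gives 1656 MiB for Yolo with only /load versus 20 MiB with /unload + /load, and 594 versus 16 MiB for resnet50_savedmodel; the simple savedmodel runs change by 14 and -2 MiB.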

krishung5 commented 2 years ago

Closing this issue due to lack of activity. Please re-open it if you would like to follow up.