Closed AmazingSkyLine closed 2 years ago
Hi @AmazingSkyLine, this doesn't look like a memory leak. The first few inferences can increase memory usage because the frameworks' internal memory pools are not released when the model is unloaded. You can refer to the document below for more info:
It is a memory leak if there is unbounded memory growth when you load/unload the model many times.
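One rough way to apply this criterion is to compare how much a few consecutive "used" samples vary early in a run and late in a run. The sketch below is only illustrative: the function names and the tolerance argument are assumptions, and the sample values are the "used" figures from the Yolo /load-only run reported later in this thread.

```bash
# spread SAMPLE...: print max - min of the integer arguments.
spread() {
  printf '%s\n' "$@" | awk '
    { v = $1 + 0 }
    NR == 1 { min = v; max = v }
    { if (v < min) min = v; if (v > max) max = v }
    END { print max - min }'
}

# has_plateaued TOLERANCE SAMPLE...: succeed if the samples vary by no more
# than TOLERANCE (same unit as the samples).
has_plateaued() {
  tol="$1"; shift
  [ "$(spread "$@")" -le "$tol" ]
}

spread 8611 8917 9275       # first three loads, prints 664
spread 10267 10265 10268    # three samples after 500+ loads, prints 3
```

A small late spread suggests growth has levelled off rather than being unbounded, although three samples are too few to be conclusive.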
@Tabrizian For this model, I don't actually run inference; I just use /load to load a new model version repeatedly, but the memory still increases. For another of our models, it's more obvious (but that model has warm-up and we can't share it). Steps (with a 32G memory limit):
curl -s -X POST http://localhost:8000/v2/repository/models/${MODEL_NAME}/load
sleep 45
That model has an initial memory usage of ~4.5G; with a 16G memory limit, this eventually leads to OOM.
For comparison, if I call /load to fully load the model first and then call /unload to unload it, loading and unloading serially, the memory usage is much lower (~16% -> 5G+). Steps (with a 32G memory limit):
curl -s -X POST http://localhost:8000/v2/repository/models/${MODEL_NAME}/load
sleep 40
curl -s -X POST http://localhost:8000/v2/repository/models/${MODEL_NAME}/unload
sleep 15
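The two procedures above can be wrapped in small shell helpers so they are easy to repeat while sampling memory. This is only a sketch: the function names, the `CURL`, `LOAD_WAIT` and `UNLOAD_WAIT` variables, and the `used_mib` helper are illustrative and not part of the original test script; only the two endpoints and the sleep values come from the steps above. The load-only procedure also needs a new model version in the repository before each call, which is not shown here.

```bash
#!/usr/bin/env bash
# Illustrative helpers (assumed names); endpoints and sleeps follow the steps above.
TRITON_URL="${TRITON_URL:-http://localhost:8000}"
CURL="${CURL:-curl}"   # can be overridden for a dry run

# repo_call ACTION: POST to the model repository API; ACTION is load or unload.
repo_call() {
  "$CURL" -s -X POST "${TRITON_URL}/v2/repository/models/${MODEL_NAME}/$1"
}

# One iteration of the load-only procedure.
load_only_cycle() {
  repo_call load
  sleep "${LOAD_WAIT:-45}"
}

# One iteration of the load-then-unload procedure.
load_unload_cycle() {
  repo_call load
  sleep "${LOAD_WAIT:-40}"
  repo_call unload
  sleep "${UNLOAD_WAIT:-15}"
}

# used_mib: current "used" memory in MiB as reported by free(1).
used_mib() {
  free -m | awk '/^Mem:/ { print $3 }'
}
```

For example, with `MODEL_NAME=my_model` (a placeholder), `for i in $(seq 1 500); do load_unload_cycle; used_mib; done` would print one memory sample per iteration.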
I see. I'll be re-opening this ticket for further investigation.
@AmazingSkyLine TensorFlow doesn't release all memory when models are unloaded. This is because the TensorFlow framework uses a high-watermark caching memory allocator and doesn't release freed memory to the system. This is outside Triton Server; it is an issue in upstream TensorFlow. There's an existing issue filed against TF (https://github.com/tensorflow/tensorflow/issues/36465); perhaps you can add to the urgency of that issue.
Hi @AmazingSkyLine, thanks for raising this issue. I ran some experiments on this. The results are below.
Serving image models with different backends
Yolo
only using /load: first several times
total used free shared buff/cache available
Mem: 31793 8611 19121 267 4061 22456
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 8917 18571 267 4304 22150
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 9275 17970 267 4547 21792
Swap: 2047 2047 0
after 500+ times
total used free shared buff/cache available
Mem: 31793 10267 16950 267 4576 20800
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10265 16952 267 4575 20802
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10268 16948 267 4576 20799
Swap: 2047 2047 0
using /unload + /load: first several times
total used free shared buff/cache available
Mem: 31793 8663 18798 267 4331 22403
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 8657 18561 267 4574 22410
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 8664 18310 267 4818 22402
Swap: 2047 2047 0
after 500+ times
total used free shared buff/cache available
Mem: 31793 8683 17211 267 5898 22384
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 8680 17224 267 5888 22386
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 8683 17221 267 5888 22383
Swap: 2047 2047 0
resnet50_savedmodel
only using /load: first several times
total used free shared buff/cache available
Mem: 31793 9827 13542 267 8423 21239
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 9934 13334 267 8524 21132
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10037 13130 267 8624 21029
Swap: 2047 2047 0
after 500+ times
total used free shared buff/cache available
Mem: 31793 10421 12717 267 8654 20645
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10405 12735 267 8652 20661
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10405 12735 267 8652 20661
Swap: 2047 2047 0
using /unload + /load: first several times
total used free shared buff/cache available
Mem: 31793 9867 13451 267 8475 21200
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 9871 13345 267 8575 21195
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 9867 13249 267 8676 21199
Swap: 2047 2047 0
after 500+ times
total used free shared buff/cache available
Mem: 31793 9883 13262 267 8663 21199
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 9885 13250 267 8665 21190
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 9882 13258 267 8663 21195
Swap: 2047 2047 0
resnet50_libtorch
only using /load: first several times
total used free shared buff/cache available
Mem: 31793 10395 12296 269 9101 20669
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10409 12180 269 9202 20654
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10433 12057 269 9302 20631
Swap: 2047 2047 0
after 500+ times
total used free shared buff/cache available
Mem: 31793 10938 11301 270 9553 20130
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10942 11295 270 9555 20126
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10920 11321 270 9550 20148
Swap: 2047 2047 0
using /unload + /load: first several times
total used free shared buff/cache available
Mem: 31793 10667 11774 270 9351 20401
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10644 11713 270 9435 20425
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10642 11614 270 9536 20427
Swap: 2047 2047 0
after 500+ times
total used free shared buff/cache available
Mem: 31793 10702 11515 270 9575 20366
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10699 11517 270 9576 20369
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10734 11480 270 9578 20334
Swap: 2047 2047 0
resnet50_onnx
only using /load: first several times
total used free shared buff/cache available
Mem: 31793 10173 12519 269 9100 20892
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10294 12313 269 9184 20770
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10303 12204 269 9285 20760
Swap: 2047 2047 0
after 500+ times
total used free shared buff/cache available
Mem: 31793 10777 11715 269 9301 20287
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10821 11671 269 9300 20243
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10842 11650 269 9300 20222
Swap: 2047 2047 0
using /unload + /load: first several times
total used free shared buff/cache available
Mem: 31793 10114 12604 269 9075 20950
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10161 12471 269 9160 20903
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10169 12364 269 9260 20896
Swap: 2047 2047 0
after 500+ times
total used free shared buff/cache available
Mem: 31793 10281 12208 269 9303 20783
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10280 12211 269 9302 20785
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10281 12208 269 9303 20783
Swap: 2047 2047 0
Serving simple/add_sub models with different backends
savedmodel
only using /load: first several times
total used free shared buff/cache available
Mem: 31793 8322 17577 267 5893 22745
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 8326 17573 267 5892 22740
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 8323 17576 267 5892 22743
Swap: 2047 2047 0
after 500+ times
total used free shared buff/cache available
Mem: 31793 8336 17542 267 5914 22731
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 8339 17539 267 5914 22727
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 8338 17540 267 5914 22729
Swap: 2047 2047 0
using /unload + /load: first several times
total used free shared buff/cache available
Mem: 31793 8347 17517 267 5927 22719
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 8349 17524 267 5919 22718
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 8350 17523 267 5919 22717
Swap: 2047 2047 0
after 500+ times
total used free shared buff/cache available
Mem: 31793 8345 17201 267 6246 22722
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 8349 17197 267 6246 22718
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 8351 17195 267 6246 22716
Swap: 2047 2047 0
libtorch
only using /load: first several times
total used free shared buff/cache available
Mem: 64398 5028 27760 269 31609 58475
Swap: 2047 14 2033
total used free shared buff/cache available
Mem: 64398 5031 27757 269 31609 58472
Swap: 2047 14 2033
total used free shared buff/cache available
Mem: 64398 5031 27758 269 31609 58473
Swap: 2047 14 2033
after 500+ times
total used free shared buff/cache available
Mem: 64398 5068 27714 269 31615 58435
Swap: 2047 14 2033
total used free shared buff/cache available
Mem: 64398 5069 27714 269 31615 58435
Swap: 2047 14 2033
total used free shared buff/cache available
Mem: 64398 5070 27713 269 31615 58434
Swap: 2047 14 2033
using /unload + /load: first several times
total used free shared buff/cache available
Mem: 31793 9419 18805 267 3568 21647
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 9431 18807 267 3555 21636
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 9417 18822 267 3553 21649
Swap: 2047 2047 0
after 500+ times
total used free shared buff/cache available
Mem: 31793 9390 18379 267 4023 21677
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 9392 18377 267 4023 21674
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 9391 18378 267 4023 21676
Swap: 2047 2047 0
onnx
only using /load: first several times
total used free shared buff/cache available
Mem: 64220 4116 32232 268 27871 59128
Swap: 2047 144 1903
total used free shared buff/cache available
Mem: 64220 4121 32227 268 27871 59122
Swap: 2047 144 1903
total used free shared buff/cache available
Mem: 64220 4120 32228 268 27871 59124
Swap: 2047 144 1903
after 500+ times
total used free shared buff/cache available
Mem: 64220 4138 32308 268 27773 59105
Swap: 2047 144 1903
total used free shared buff/cache available
Mem: 64220 4141 32305 268 27773 59103
Swap: 2047 144 1903
total used free shared buff/cache available
Mem: 64220 4140 32306 268 27773 59103
Swap: 2047 144 1903
using /unload + /load: first several times
total used free shared buff/cache available
Mem: 64220 4134 32313 268 27772 59109
Swap: 2047 144 1903
total used free shared buff/cache available
Mem: 64220 4138 32309 268 27772 59106
Swap: 2047 144 1903
total used free shared buff/cache available
Mem: 64220 4140 32307 268 27772 59104
Swap: 2047 144 1903
after 500+ times
total used free shared buff/cache available
Mem: 64220 4152 32273 268 27794 59092
Swap: 2047 144 1903
total used free shared buff/cache available
Mem: 64220 4152 32272 268 27794 59091
Swap: 2047 144 1903
total used free shared buff/cache available
Mem: 64220 4152 32273 268 27794 59092
Swap: 2047 144 1903
I ran with valgrind to check whether there is a memory leak. It came out clean, so I think this is not strictly a "leak" issue. We do expect some memory growth when loading more and more versions of a model, as Triton keeps track of the information for every version of the model, even after that version is unloaded. From the results above, we can see that the difference in memory usage between using only load and using load and unload is only observed when serving more complicated models. We don't see this behavior when serving simple models. This is probably because complicated models allocate more memory than simple models do; otherwise we would see the same behavior for simple models. Besides, we only see that much of a difference on TensorFlow models. This could be related to @cnegron-nv's comment above about how the TensorFlow framework allocates memory differently.
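To make the comparison concrete, the growth in the "used" column can be computed directly from the snapshots. The sketch below assumes the snapshots are `free -m` output (values in MiB), which the thread does not state explicitly; `used_delta` is a hypothetical helper name. The sample rows are the first and last Yolo /load-only snapshots from the results above.

```bash
# used_delta: read one or more free(1) snapshots on stdin and print
# "first last delta" for the "used" column of the Mem: rows.
used_delta() {
  awk '/^Mem:/ { if (n == 0) first = $3; last = $3; n++ }
       END { if (n > 0) print first, last, last - first }'
}

# Yolo, only using /load: first snapshot and last snapshot after 500+ loads.
used_delta <<'EOF'
total used free shared buff/cache available
Mem: 31793 8611 19121 267 4061 22456
Swap: 2047 2047 0
total used free shared buff/cache available
Mem: 31793 10268 16948 267 4576 20799
Swap: 2047 2047 0
EOF
# prints: 8611 10268 1657
```

Feeding it the corresponding Yolo /unload + /load snapshots (8663 and 8683 used) gives a delta of 20, against 1657 for the load-only run.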
Closing issue due to lack of activity. Please re-open the issue if you would like to follow up.
Description
There is a memory leak when I use the HTTP /load API to load a new version of the model.
Strangely, when I use the /unload API to unload the model first and then use the /load API to load the new version, there is no memory leak.
The problem happens when I only use the /load API and let Triton automatically load the new version and unload the old one. Likewise, using POLL mode and copying a new version into the model directory to trigger the model load/unload also leaks memory.
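One way to produce the "new version" that each /load call or POLL scan picks up is to copy the latest numeric version directory to the next number. This is a sketch under the assumption that the repository follows Triton's `<model>/<version>/` layout and that the default version policy (serve only the latest version) is in effect; the function name and the example path are hypothetical and not taken from the reporter's script.

```bash
# add_version MODEL_DIR: copy the highest-numbered version directory inside
# MODEL_DIR to <highest + 1> and print the new version number.
add_version() {
  model_dir="$1"   # e.g. /models/yolo-cppe5 (hypothetical path)
  # Only purely numeric entries are version directories; config files are skipped.
  latest=$(ls "$model_dir" | grep -E '^[0-9]+$' | sort -n | tail -n 1)
  [ -n "$latest" ] || return 1
  next=$((latest + 1))
  cp -r "$model_dir/$latest" "$model_dir/$next" && echo "$next"
}
```

Calling `add_version /models/yolo-cppe5` before each /load request (or before each poll interval in POLL mode) would then present the server with a fresh version each time.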
using /unload + /load:
execute steps:
unload old version first then load new version
first several times:
after 1000+ times:
only using /load
execute steps:
just using /load to auto load new version and unload old version
first several times load:
after load 500+ times
Triton Information
triton version: 22.05
using official container provided on ngc: tritonserver:22.05-py5
To Reproduce
model: yolo-cppe5
You can download model from this link: https://storage.googleapis.com/tfhub-modules/rishit-dagli/yolo-cppe5/1.tar.gz
There is the config file:
triton startup args:
The test script:
Expected behavior
I expected that using /load would load the new version and unload the old version automatically, with no memory leak.