triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

After calling the unload_model C API, the memory is not completely released #7020

Open · muyizi opened this issue 4 months ago

muyizi commented 4 months ago

Description
Before calling unload_model, memory usage is as shown below:

[screenshot: memory usage before calling unload_model]

After calling unload_model, memory usage is as shown below:

[screenshot: memory usage after calling unload_model]

Triton Information
Triton version: 2.40.0dev

I built Triton myself rather than using the Triton container.

To Reproduce
Steps to reproduce the behavior are shown in the screenshot below.

[screenshot of the reproduction code]
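Since the reproduction code is only available as a screenshot, here is a minimal sketch of the kind of load/unload cycle involved, assuming explicit model control mode, a placeholder repository path, and a placeholder model name "my_model" (none of these come from the original report). Note that TRITONSERVER_ServerUnloadModel only initiates the unload and returns before the model is fully unloaded.

```c
#include <stdio.h>
#include "triton/core/tritonserver.h"

/* Abort on any Triton error; for sketch purposes only. */
#define CHECK(X)                                                          \
  do {                                                                    \
    TRITONSERVER_Error* err__ = (X);                                      \
    if (err__ != NULL) {                                                  \
      fprintf(stderr, "error: %s\n", TRITONSERVER_ErrorMessage(err__));   \
      TRITONSERVER_ErrorDelete(err__);                                    \
      return 1;                                                           \
    }                                                                     \
  } while (0)

int main(void)
{
  TRITONSERVER_ServerOptions* options = NULL;
  CHECK(TRITONSERVER_ServerOptionsNew(&options));
  CHECK(TRITONSERVER_ServerOptionsSetModelRepositoryPath(
      options, "/path/to/model_repository"));  /* placeholder path */
  /* Explicit control mode so models can be loaded/unloaded on demand. */
  CHECK(TRITONSERVER_ServerOptionsSetModelControlMode(
      options, TRITONSERVER_MODEL_CONTROL_EXPLICIT));

  TRITONSERVER_Server* server = NULL;
  CHECK(TRITONSERVER_ServerNew(&server, options));
  CHECK(TRITONSERVER_ServerOptionsDelete(options));

  /* Observe process memory here (before load). */
  CHECK(TRITONSERVER_ServerLoadModel(server, "my_model"));
  /* Observe process memory here (after load). */
  CHECK(TRITONSERVER_ServerUnloadModel(server, "my_model"));

  /* The unload is asynchronous, so pause before measuring again. */
  getchar();
  /* Observe process memory here (after unload). */

  CHECK(TRITONSERVER_ServerDelete(server));
  return 0;
}
```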


Expected behavior
Memory should be released completely after the model is unloaded.
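Since the before/after numbers are only available as screenshots, one way the comparison could be made reproducible is a small Linux-only helper like the following (hypothetical, not from the original report) that prints the process resident set size before the load, after the load, and after the unload:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print this process's VmRSS from /proc (Linux only). */
static void print_rss(const char* label)
{
  FILE* f = fopen("/proc/self/status", "r");
  if (f == NULL) {
    return;
  }
  char line[256];
  while (fgets(line, sizeof(line), f) != NULL) {
    if (strncmp(line, "VmRSS:", 6) == 0) {
      printf("%s %s", label, line);  /* line already ends with '\n' */
      break;
    }
  }
  fclose(f);
}

/* Example: print_rss("after unload_model:"); */
```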

indrajit96 commented 4 months ago

CC @kthui

muyizi commented 2 months ago

@indrajit96 can you give me some suggestions?