triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

I got the same problem on 21.11 and 21.12: it works with a single model or a couple of models, but Triton never releases them. #6995

Open · wangzz313 opened this issue 6 months ago

wangzz313 commented 6 months ago
I got the same problem on 21.11 and 21.12: it works with a single model or a couple of models, but Triton never releases them.

Ensemble model: Python backend (CPU) + ONNX model (GPU)

Python model:
instance_group [ { kind: KIND_CPU } ]
model_warmup [ {} ]
response_cache { enable: true }

ONNX model:
instance_group [ { kind: KIND_GPU } ]
model_warmup [ {} ]
response_cache { enable: true }

Originally posted by @alicimertcan in https://github.com/triton-inference-server/server/issues/3761#issuecomment-1018443038
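
For reference, a minimal sketch of how the settings quoted above might fit together in a config.pbtxt for the CPU-side Python model. The model name, tensor names, shapes, and the warmup sample below are hypothetical placeholders and are not from the original report; only instance_group, model_warmup, and response_cache mirror the reported settings.

```
# Hypothetical config.pbtxt for the CPU-side Python model in the ensemble.
# Name, tensor names, and shapes are placeholders, not from the report.
name: "preprocess_python"
backend: "python"
max_batch_size: 8

input [
  {
    name: "RAW_INPUT"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "PREPROCESSED"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]

# Keep the Python backend instances on CPU, as in the reported setup.
instance_group [ { kind: KIND_CPU } ]

# Warm up with zero-filled data (the report used an empty warmup entry).
model_warmup [
  {
    name: "zero_warmup"
    batch_size: 1
    inputs {
      key: "RAW_INPUT"
      value {
        data_type: TYPE_FP32
        dims: [ 3, 224, 224 ]
        zero_data: true
      }
    }
  }
]

# Enable the response cache, as in the reported setup.
response_cache { enable: true }
```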

lkomali commented 6 months ago

cc @GuanLuo @rmccorm4 @jbkyang-nvi

indrajit96 commented 6 months ago

Can you provide more information? Is this the latest version of Triton? If not, can you try with the latest version, 24.02?

oandreeva-nv commented 6 months ago

Hi @wangzz313, as @indrajit96 suggested, have you tried a newer version of Triton? 21.11 and 21.12 are quite old. Unfortunately, the 24.02 release does not come with the onnxruntime backend, so please try 24.01.
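
For anyone trying the suggested release, a sketch of pulling and running the corresponding NGC container; the model repository path is a placeholder, and the exact image tag should be verified on NGC. Per the comment above, 24.01 still ships the onnxruntime backend.

```
# Pull the suggested Triton release (tag naming follows the standard NGC images).
docker pull nvcr.io/nvidia/tritonserver:24.01-py3

# Run the server against a local model repository (path is a placeholder).
docker run --gpus=all --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:24.01-py3 \
  tritonserver --model-repository=/models
```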

rishabhmehrotra commented 1 week ago

@oandreeva-nv we're facing the same issue with 24.01 and 24.08 (cc: @susnato)