wangzz313 opened 6 months ago
cc @GuanLuo @rmccorm4 @jbkyang-nvi
Can you provide more information? Is this the latest version of Triton? If not, can you try with the latest version, 24.02?
Hi @wangzz313, as @indrajit96 suggested, have you tried a newer version of Triton? 21.11 and 21.12 are quite old. Unfortunately, the 24.02 version does not come with the onnxruntime backend, so please try 24.01.
@oandreeva-nv we're facing the same issue with 24.01 and 24.08 (cc: @susnato)
Ensemble model: Python backend (CPU) + ONNX model (GPU)

Python model `config.pbtxt`:

```
instance_group [ { kind: KIND_CPU } ]
model_warmup [ {} ]
response_cache { enable: true }
```

ONNX model `config.pbtxt`:

```
instance_group [ { kind: KIND_GPU } ]
model_warmup [ {} ]
response_cache { enable: true }
```
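One thing worth double-checking: the per-model `response_cache { enable: true }` setting only takes effect if a cache implementation is also enabled when the server starts. A minimal sketch, assuming the local cache and a model repository at `/models` (the cache size here is an arbitrary example value in bytes):

```shell
# Sketch: enable Triton's local response cache at server startup.
# /models and the 1 MiB size are example values, not from this thread.
tritonserver --model-repository=/models \
             --cache-config local,size=1048576
```

Older releases used the `--response-cache-byte-size` flag instead of `--cache-config`; which one applies depends on the Triton version you are running.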
Originally posted by @alicimertcan in https://github.com/triton-inference-server/server/issues/3761#issuecomment-1018443038