I get errors when I write the model config like this:
```
parameters {
  key: "memory.enable_memory_arena_shrinkage"
  value: { string_value: "gpu:0;gpu:1;gpu:2" }
}
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1, 2 ]
  }
]
```
```
Internal desc = in ensemble 'similarity2_1', onnx runtime error 2: Did not find an arena based allocator registered for device-id combination in the memory arena shrink list: gpu:0
Internal desc = in ensemble 'similarity2_1', onnx runtime error 2: Did not find an arena based allocator registered for device-id combination in the memory arena shrink list: gpu:1
```
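If I read the onnxruntime_backend README correctly, the documented example for this parameter only lists one device ID per device type, as a semicolon-separated list of `<device>:<device_id>` entries, e.g.:

```
parameters {
  key: "memory.enable_memory_arena_shrinkage"
  # Example from the README, as I understand it: one entry per device type.
  value: { string_value: "cpu:0;gpu:0" }
}
```

So it is not clear to me whether multiple `gpu:<id>` entries are supported at all.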
My question is: how do I free GPU memory in this multi-GPU setup?
https://github.com/triton-inference-server/onnxruntime_backend/issues/103
When the model is deployed on a single GPU, I can make it release GPU memory in real time with this parameter, but when the model is deployed across multiple GPUs, I don't know what the format of the value should look like.
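For comparison, here is a sketch of the single-GPU variant that works for me (illustrative rather than exact; the shrink list and the instance group both reference only device 0):

```
parameters {
  key: "memory.enable_memory_arena_shrinkage"
  # Single-GPU case: the shrink list names only the device
  # that the instance group actually uses.
  value: { string_value: "gpu:0" }
}
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```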