Open haiminh2001 opened 11 months ago
@rmccorm4 do you know if this is expected? Ensemble models do not support caching.
@kthui is correct. Top-level requests to ensembles do not currently support caching, but the composing models within the ensemble may be cached individually if caching is supported by that model. Added a note to the docs to clarify this: https://github.com/triton-inference-server/server/pull/6648.
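For reference, a minimal sketch of enabling the response cache on one composing model inside the ensemble. The model name and backend below are placeholders, and this assumes the server was started with a cache configured (e.g. `tritonserver --cache-config local,size=1048576`):

```
# config.pbtxt for a composing model (names here are hypothetical)
name: "composing_model"
platform: "onnxruntime_onnx"

# Enable response caching for this model only; the ensemble's
# top-level request is still not cached.
response_cache {
  enable: true
}
```

With this, repeated identical requests routed through the ensemble can still hit the cache at the composing-model level.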
We do have an open feature request to add caching support to ensembles, it just hasn't been prioritized yet.
ref: DLIS-4626
Description
Caching is not working with ensemble models.

Triton Information
23.07
Are you using the Triton container or did you build it yourself?
Triton container
To Reproduce
Steps to reproduce the behavior:
I enabled the response cache for an ensemble model and repeated the same request multiple times, but no cache lookups were done.
I expect the ensemble to support caching. If an ensemble has, say, 10 models and each model has its own cache, then each request triggers 10 cache lookups; if the ensemble itself performed the lookup, it would only need to be done once per request.