I am trying to figure out whether we should enable or disable the cache that can be used across inference requests. For example, if one user asks a question about something, how can that cached work be reused for similar requests? Do I need to enable the `--enable-prefix-caching` flag?
Second question: where does vLLM save the cache? Is it on physical disk?
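For context, this is roughly how I would start the server, assuming the flag is spelled `--enable-prefix-caching` (the model name below is just a placeholder):

```shell
# Sketch: start a vLLM OpenAI-compatible server with automatic
# prefix caching turned on (assumption: flag is --enable-prefix-caching;
# model name is only an example)
vllm serve meta-llama/Llama-3.1-8B-Instruct --enable-prefix-caching
```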
Can someone please answer the questions above?