WoosukKwon opened this issue 9 months ago
Another way to save memory is to use an LRU cache for this map and capture the CUDA graphs on demand.
_Originally posted by @scv119 in https://github.com/vllm-project/vllm/pull/1926#discussion_r1427594126_
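A minimal sketch of the idea, assuming we key graphs by batch size: keep an LRU map, capture a graph only the first time a batch size is requested, and evict the least recently used graph when the cache is full. The `GraphCache` class and `capture_fn` hook are hypothetical names for illustration, not vLLM's actual implementation.

```python
from collections import OrderedDict


class GraphCache:
    """LRU cache of captured graphs, populated on demand (illustrative sketch)."""

    def __init__(self, capture_fn, max_size=8):
        # capture_fn(batch_size) would capture and return a CUDA graph;
        # here it is an injected callable so the sketch stays framework-agnostic.
        self.capture_fn = capture_fn
        self.max_size = max_size
        self._cache = OrderedDict()

    def get(self, batch_size):
        if batch_size in self._cache:
            # Cache hit: mark this graph as most recently used.
            self._cache.move_to_end(batch_size)
            return self._cache[batch_size]
        # Cache miss: capture on demand instead of capturing all sizes up front.
        graph = self.capture_fn(batch_size)
        self._cache[batch_size] = graph
        if len(self._cache) > self.max_size:
            # Evict the least recently used graph to bound memory use.
            self._cache.popitem(last=False)
        return graph
```

The trade-off is the usual one: eager capture pays all the memory up front, while on-demand capture bounds memory at `max_size` graphs but pays a capture-latency penalty on each cache miss.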
@WoosukKwon has this work been done?
Any update on caching the CUDA graphs?