Open renxida opened 6 days ago
We currently have this in InferenceExecRequest:
The new methods would correspond to cache_page_indices and free_cache_pages.
The cache allocation should be created in lock_initial_cache_pages. lock_additional_cache_pages should acquire another PageAllocation and then destroy the original one; due to caching, the newly acquired pages would overlap maximally with the existing pages.
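A rough sketch of the acquire-then-release ordering described above, in case it helps the discussion. Everything here is hypothetical and not the actual shortfin code: ToyPageCache, ToyPageAllocation, ToyRequest, the refcounting scheme, and the method signatures are assumptions made up for illustration. The only point is that acquiring the new allocation before releasing the old one keeps the shared pages referenced the whole time, so the new allocation reuses them.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class ToyPageAllocation:
    """Hypothetical handle to a set of refcounted cache pages."""

    def __init__(self, cache: "ToyPageCache", pages: List[int]) -> None:
        self._cache = cache
        self.pages = pages
        self._released = False

    def release(self) -> None:
        # Drop one reference per page; safe to call more than once.
        if self._released:
            return
        for page in self.pages:
            self._cache.refcount[page] -= 1
        self._released = True


class ToyPageCache:
    """Hypothetical prefix cache: one page per block of tokens, keyed by prefix."""

    def __init__(self, tokens_per_page: int) -> None:
        self.tokens_per_page = tokens_per_page
        self._page_for_prefix: Dict[Tuple[int, ...], int] = {}
        self.refcount: Dict[int, int] = {}
        self._next_page = 0

    def acquire(self, tokens: Sequence[int]) -> ToyPageAllocation:
        n = self.tokens_per_page
        num_pages = (len(tokens) + n - 1) // n  # ceiling division
        pages: List[int] = []
        for i in range(num_pages):
            key = tuple(tokens[: (i + 1) * n])
            page = self._page_for_prefix.get(key)
            if page is None:
                # Cache miss: hand out a fresh page for this prefix.
                page = self._next_page
                self._next_page += 1
                self._page_for_prefix[key] = page
                self.refcount[page] = 0
            self.refcount[page] += 1
            pages.append(page)
        return ToyPageAllocation(self, pages)


class ToyRequest:
    """Hypothetical stand-in for the request-side methods named in this issue."""

    def __init__(self, cache: ToyPageCache) -> None:
        self._cache = cache
        self._allocation: Optional[ToyPageAllocation] = None

    def lock_initial_cache_pages(self, tokens: Sequence[int]) -> None:
        self._allocation = self._cache.acquire(tokens)

    def lock_additional_cache_pages(self, tokens: Sequence[int]) -> None:
        # Acquire first, then release the old allocation, so shared pages
        # never drop to a refcount of zero in between.
        new_allocation = self._cache.acquire(tokens)
        old_allocation = self._allocation
        self._allocation = new_allocation
        if old_allocation is not None:
            old_allocation.release()

    def cache_page_indices(self) -> List[int]:
        return list(self._allocation.pages) if self._allocation else []

    def free_cache_pages(self) -> None:
        if self._allocation is not None:
            self._allocation.release()
            self._allocation = None


cache = ToyPageCache(tokens_per_page=2)
request = ToyRequest(cache)
request.lock_initial_cache_pages([1, 2, 3, 4])
first = request.cache_page_indices()   # [0, 1]
request.lock_additional_cache_pages([1, 2, 3, 4, 5, 6])
second = request.cache_page_indices()  # [0, 1, 2]
```

In this toy version the first two pages are reused by the second allocation and only one new page is handed out. If the old allocation were released before the new one was acquired, a real cache could evict the unreferenced pages in between.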
Implementing on #608
To manage the lifecycle of page allocations for an inference request, it may be important to use an interface to encapsulate: