This PR adds one feature: freeing engine resources (cache and other resources) after a request completes.
Advanced kernels such as PagedAttention reserve page blocks for different tokens during insert and decode. All of these reserved resources must be freed once a request finishes decoding, so that the freed page blocks can be reused by incoming requests.
Once all engines implement this function, it will be enforced as an abstractmethod.
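
As a rough illustration of the idea, here is a minimal sketch of a paged engine that returns its page blocks to a free pool when a request finishes. All names here (`Engine`, `PagedEngine`, `free_resource`, `slot`, `free_pages`) are hypothetical and chosen for the example; they are not taken from this PR's code. The base-class method is a plain no-op rather than an abstractmethod, which mirrors the transition plan described above.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Engine(ABC):
    """Hypothetical engine interface; names are illustrative only."""

    @abstractmethod
    def insert(self, slot: int, num_tokens: int) -> None:
        """Reserve whatever the engine needs to serve `num_tokens` in `slot`."""

    def free_resource(self, slot: int) -> None:
        """Release resources reserved for `slot`.

        Deliberately not abstract yet: engines that hold no per-request
        resources inherit this no-op until every engine overrides it.
        """
        return None


class PagedEngine(Engine):
    """Toy page-block allocator standing in for a paged KV cache."""

    def __init__(self, num_pages: int, page_size: int) -> None:
        self.page_size = page_size
        self.free_pages: List[int] = list(range(num_pages))
        self.pages_by_slot: Dict[int, List[int]] = {}

    def insert(self, slot: int, num_tokens: int) -> None:
        # Ceiling division: number of pages needed to hold the tokens.
        needed = -(-num_tokens // self.page_size)
        if needed > len(self.free_pages):
            raise RuntimeError("no free page blocks left")
        reserved = [self.free_pages.pop() for _ in range(needed)]
        self.pages_by_slot.setdefault(slot, []).extend(reserved)

    def free_resource(self, slot: int) -> None:
        # Return every page held by this slot to the pool.
        # Freeing an unknown or already freed slot does nothing.
        self.free_pages.extend(self.pages_by_slot.pop(slot, []))
```

In this sketch the caller would invoke `free_resource(slot)` once the request in that slot has finished decoding; without that call, a later `insert` can fail even though no request is actively using the pages.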