Open wjj19950828 opened 8 months ago
I have run the full llama2 pipeline end to end and now want to stress test it and look at the benchmark metrics.
I don't think I fully understand max_tokens_in_paged_kv_cache.
Is it similar to the max_num_batched_tokens parameter of vllm?
Thanks~
Hi, which GPU did you use?
A100
It means the maximum number of tokens that can be stored in the paged KV cache.
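To make that more concrete, here is a rough sketch of how a token capacity like this relates to GPU memory. It is a back-of-the-envelope estimate, not how the runtime actually computes the value. The helper names, the 20 GiB budget, and the block size of 64 tokens are assumptions made up for illustration. The model shape (32 layers, 32 KV heads, head dimension 128, fp16) is that of Llama-2-7B. As far as I understand, max_num_batched_tokens in vllm limits how many tokens are processed per scheduling step rather than how many fit in the cache, so the two settings are probably not equivalent.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache needed to hold one token across all layers."""
    # Factor 2: each layer stores one key vector and one value vector per token.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def max_tokens_for_budget(budget_bytes, bytes_per_token, tokens_per_block=64):
    """Tokens that fit in the budget when memory is handed out in whole blocks."""
    # tokens_per_block is a hypothetical block size; a paged cache allocates
    # fixed-size blocks, so the capacity is rounded down to a block multiple.
    bytes_per_block = bytes_per_token * tokens_per_block
    num_blocks = budget_bytes // bytes_per_block
    return num_blocks * tokens_per_block


# Llama-2-7B shape in fp16 (2 bytes per element).
per_token = kv_cache_bytes_per_token(num_layers=32, num_kv_heads=32, head_dim=128)

# Hypothetical budget: 20 GiB left for the KV cache after loading the weights.
budget = 20 * 1024**3

print(per_token)                                  # 524288 bytes, i.e. 0.5 MiB per token
print(max_tokens_for_budget(budget, per_token))   # 40960 tokens
```

Under these assumptions, a cap of 40960 tokens would bound the combined prompt and generated tokens of all requests held in the cache at the same time.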