mgoin closed 9 months ago
Looks good to me. I think it would be nice to clarify for the user what the valid values of num-kv-cache-tokens
are. It's the number of previous tokens in the cache, so it must be between 0 and context_length - prompt_processing_length.
It defaults to 1, or to NM_BENCHMARK_KV_TOKENS if that is set.
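
To make the suggestion concrete, here is a rough sketch of the kind of check and error message I have in mind. The function name `resolve_num_kv_cache_tokens` and its parameters are hypothetical, not taken from the PR, and I'm assuming both bounds are inclusive and that the environment variable holds a plain integer.

```python
import os
from typing import Mapping, Optional


def resolve_num_kv_cache_tokens(
    value: Optional[int],
    context_length: int,
    prompt_processing_length: int,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Hypothetical helper: resolve and validate num-kv-cache-tokens."""
    if env is None:
        env = os.environ

    # Fall back to NM_BENCHMARK_KV_TOKENS if it is set, otherwise to 1.
    if value is None:
        value = int(env.get("NM_BENCHMARK_KV_TOKENS", "1"))

    # Cached tokens plus the tokens being processed must fit in the context.
    upper = context_length - prompt_processing_length
    if not 0 <= value <= upper:  # assumption: both bounds are inclusive
        raise ValueError(
            f"num-kv-cache-tokens must be between 0 and {upper} "
            f"(context_length - prompt_processing_length), got {value}"
        )
    return value
```

Something like `resolve_num_kv_cache_tokens(None, 2048, 16)` would then give the default, and an out-of-range value would fail with a message that tells the user what the valid range is.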