Open yangzhj53 opened 9 months ago
I wonder how streaming-llm answers questions that refer to the middle of a long input. Specifically, what is the entire decoding process? When it generates the first token of the answer, where do the tokens in the KV cache come from?
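For context, my mental model of the cache policy (from the paper) is something like the sketch below: keep the first few "attention sink" tokens plus a recent window, and evict everything in between. The function name, parameter values, and tensor layout here are my own assumptions for illustration, not the repo's actual code:

```python
import torch

def evict_kv(past_key_values, num_sinks: int = 4, window: int = 1020):
    """Trim each layer's (key, value) tensors, assumed to have shape
    [batch, heads, seq_len, head_dim], down to the sink tokens plus
    the most recent `window` tokens."""
    trimmed = []
    for k, v in past_key_values:
        seq_len = k.size(2)
        if seq_len <= num_sinks + window:
            # Cache still fits; nothing to evict yet.
            trimmed.append((k, v))
            continue
        # Keep the first `num_sinks` positions and the last `window` positions;
        # the middle of the sequence is dropped from the cache.
        k = torch.cat([k[:, :, :num_sinks], k[:, :, seq_len - window:]], dim=2)
        v = torch.cat([v[:, :, :num_sinks], v[:, :, seq_len - window:]], dim=2)
        trimmed.append((k, v))
    return trimmed
```

If that sketch is roughly right, then by the time the model starts generating, the KV cache would only hold the sinks and the most recent tokens — which is exactly why I'm confused about how it can answer a question about content from the evicted middle of the input.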