Closed guschmue closed 2 months ago
Continuation of the dialog is not really handled yet. In theory we can use the kv_cache to avoid processing the full prompt again, but I ran into some issues with the model (at least I think the issue is with the model itself). I need to find some time to look into that.
Are you referring to the Error: [WebGPU] Kernel "[Expand] /model/attn_mask_reformat/input_ids_subgraph/Expand" failed. Error: Expand requires shape to be broadcastable to input
with the shapes in the feed like
input_ids dims: [1, seq_length]
position_ids dims: [1, seq_length]
attention_mask dims: [1, seq_length + past_sequence_length]
past_key_values.i.key dims: [1, 32, past_sequence_length, 96]
past_key_values.i.value dims: [1, 32, past_sequence_length, 96]
mentioned here?
Yes, that is the one.
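For reference, the feed shapes quoted above can be written down as a small helper. This is only a sketch: `feedDims` and `isConsistent` are hypothetical names, not part of onnxruntime-web, and the head count (32) and head size (96) are simply the values that appear in the shapes listed in this thread.

```typescript
// Hypothetical helper (not an onnxruntime-web API): computes the feed dims
// listed above for one inference step with a KV cache.
type Dims = readonly number[];

interface FeedDims {
  input_ids: Dims;
  position_ids: Dims;
  attention_mask: Dims;
  past_key: Dims;   // shape of every past_key_values.i.key
  past_value: Dims; // shape of every past_key_values.i.value
}

function feedDims(
  seqLength: number,
  pastSequenceLength: number,
  numHeads: number = 32, // taken from the shapes quoted in the thread
  headSize: number = 96, // taken from the shapes quoted in the thread
): FeedDims {
  if (seqLength < 1 || pastSequenceLength < 0) {
    throw new RangeError("seqLength must be >= 1 and pastSequenceLength >= 0");
  }
  const past: Dims = [1, numHeads, pastSequenceLength, headSize];
  return {
    input_ids: [1, seqLength],
    position_ids: [1, seqLength],
    // The mask covers the cached tokens plus the new ones.
    attention_mask: [1, seqLength + pastSequenceLength],
    past_key: past,
    past_value: past,
  };
}

// Checks that the mask length equals new tokens + cached tokens.
function isConsistent(d: FeedDims): boolean {
  return d.attention_mask[1] === d.input_ids[1] + d.past_key[2];
}
```

Comparing the dims actually fed to the session against this helper before each run may help narrow down which input does not match the layout above.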
Error: previous buffer is not registered
The example chatbot can retain context from previous messages in the chat if the new message is sent with "Ctrl + Enter". To my understanding, this way the LLM receives a bigger input_ids with tokens that represent the previous messages as well as the new message. When I try "Ctrl + Enter" for the second message, after the first response token is calculated and shown, I get the Error: previous buffer is not registered. Also, during inference I noticed that the 3D graph in Task Manager / Performance / GPU rises steeply up to 100%, at which point the error is thrown. Dedicated GPU memory usage in the meantime is around 50-60%. I am guessing this is related to GPU buffer management. Are there some tricks to make it more memory efficient?
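The continuation behaviour described above, as I understand it, can be sketched like this. It is an illustration under assumptions, not the example chatbot's actual code: `buildContinuationInput` is a hypothetical name, the token ids are placeholders, and the all-ones attention mask assumes there is no padding and no KV cache in use.

```typescript
// Hypothetical sketch: without a KV cache, a continuation re-sends the
// tokens of all previous messages together with the new message.
interface ContinuationInput {
  inputIds: number[];
  attentionMask: number[];
  dims: [number, number]; // [batch, seq_length]
}

function buildContinuationInput(
  previousTokens: number[],
  newTokens: number[],
): ContinuationInput {
  const inputIds = [...previousTokens, ...newTokens];
  return {
    inputIds,
    // Assumption: no padding, so every position is attended to.
    attentionMask: inputIds.map(() => 1),
    dims: [1, inputIds.length],
  };
}

// Placeholder token ids, for illustration only.
const input = buildContinuationInput([11, 12, 13], [21, 22]);
console.log(input.dims);
```

With this layout the sequence length grows with every turn, so the whole prompt is processed again for each continuation.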
What is peculiar is that it's not consistent with the size of the input_ids. In the above image the first bump is caused by a 500+ token input without continuation, and it ran all right. But the third bump is a 90 token input with continuation, and it throws the Error: previous buffer is not registered. What is causing this? How can it be fixed?
OS: Windows 11
GPU: RTX 4060 8GB VRAM
Browser: Chrome 126.0.6478.63 (Official Build) (64-bit)