Closed x54-729 closed 1 week ago
fixed
https://github.com/microsoft/unilm/blob/53ed1159c596f33af5b228f6041f6d7ffee963c0/YOCO/yoco/models/decoder/sliding_window_attention.py#L52-L53
https://github.com/microsoft/unilm/blob/53ed1159c596f33af5b228f6041f6d7ffee963c0/YOCO/yoco/models/decoder/sliding_window_attention.py#L58-L59
After the fix, which one is the correct parameter here: self.window_size or self.window_size - 1?
Edit: also, what about this if statement:
https://github.com/microsoft/unilm/blob/53ed1159c596f33af5b228f6041f6d7ffee963c0/YOCO/yoco/models/decoder/sliding_window_attention.py#L57
The exact window size follows the FlashAttention interface. The result is always correct as long as the provided KV cache is no smaller than the real window size. The code is written this way for convenience.
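To make the off-by-one concrete, here is a small NumPy sketch of the FlashAttention window_size=(left, right) convention, under which query i may attend to keys j with i - left <= j <= i + right. The mask construction below is an illustration of that convention, not the library's actual kernel:

```python
import numpy as np

def sliding_window_mask(seq_len, left, right):
    """Boolean mask following the FlashAttention window_size=(left, right)
    convention: query i may attend to keys j with i - left <= j <= i + right."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j >= i - left) & (j <= i + right)

# For a causal window of w tokens (the query itself plus w - 1 previous keys),
# this convention expects window_size=(w - 1, 0), hence the "- 1".
w = 4
mask = sliding_window_mask(8, left=w - 1, right=0)
# Past the warm-up, each query attends to exactly w keys.
print(mask.sum(axis=1))  # → [1 2 3 4 4 4 4 4]
```

This is why passing window_size - 1 as the left bound still yields an effective window of window_size tokens.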
Thanks!
In SlidingWindowAttention, self.window_size is sliding_size - 1 at init:
https://github.com/microsoft/unilm/blob/378d4280ebf68a2e10d74c9e8081823934b65249/YOCO/yoco/models/decoder/sliding_window_attention.py#L21
But in forward, 1 is subtracted again when calling flash_attn_func:
https://github.com/microsoft/unilm/blob/378d4280ebf68a2e10d74c9e8081823934b65249/YOCO/yoco/models/decoder/sliding_window_attention.py#L64

Also in SlidingWindowAttention, the use of key/value does not seem right:
https://github.com/microsoft/unilm/blob/378d4280ebf68a2e10d74c9e8081823934b65249/YOCO/yoco/models/decoder/sliding_window_attention.py#L50-L64
When calling flash_attn_func, the k and v are not concatenated with the cache.
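For reference, here is a minimal NumPy sketch of what concatenating a rolling KV cache before the attention call looks like in a sliding-window decode step. All names (attend_with_cache, cache_k, cache_v) are hypothetical and not from the repo; it only illustrates why the cached keys/values must be joined with the new ones:

```python
import numpy as np

def attend_with_cache(q, k_new, v_new, cache_k, cache_v, window):
    """Hypothetical decode step: concatenate cached and new K/V before
    attention, then keep only the last `window` entries in the cache.
    Shapes: q (1, d); k_new/v_new (t, d); cache_k/cache_v (c, d)."""
    k = np.concatenate([cache_k, k_new], axis=0)  # cache is prepended
    v = np.concatenate([cache_v, v_new], axis=0)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    out = probs @ v
    # Roll the cache so at most `window` keys/values are retained.
    return out, k[-window:], v[-window:]

rng = np.random.default_rng(0)
d, window = 8, 4
cache_k = cache_v = np.zeros((0, d))
for _ in range(6):  # six single-token decode steps
    q = rng.standard_normal((1, d))
    k = rng.standard_normal((1, d))
    v = rng.standard_normal((1, d))
    out, cache_k, cache_v = attend_with_cache(q, k, v, cache_k, cache_v, window)
print(cache_k.shape)  # → (4, 8)
```

If the concatenation step is skipped, each decode step attends only to the freshly projected k/v, which is the behavior being questioned above.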