karpathy / llama2.c

Inference Llama 2 in one file of pure C
MIT License

Question: Sliding window attention #424

Open stellanhaglund opened 1 year ago

stellanhaglund commented 1 year ago

Are there any plans to try out sliding window attention, like Mistral uses, on this repo, or is that more appropriate for a separate fork?

Also, if anyone has tried anything with this, I'd be really interested in hearing about it.
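
For context, sliding window attention just restricts the usual causal mask so each token attends only to the last W tokens instead of the full prefix. A minimal PyTorch sketch of the idea (the window size, toy shapes, and the helper name `sliding_window_causal_mask` are illustrative, not from this repo):

```python
# Minimal sketch of sliding window attention via a plain boolean mask.
import torch
import torch.nn.functional as F

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # Position i may attend to position j iff j <= i (causal)
    # and i - j < window (sliding window). True means "attend".
    i = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # (1, seq_len)
    return (j <= i) & (i - j < window)      # (seq_len, seq_len)

q = torch.randn(1, 2, 8, 16)  # (batch, heads, seq_len, head_dim) -- toy sizes
k = torch.randn(1, 2, 8, 16)
v = torch.randn(1, 2, 8, 16)
mask = sliding_window_causal_mask(seq_len=8, window=4)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

This materializes the full mask, so it shows the semantics but not the memory savings; the point of Mistral's version is that a fused kernel never touches keys outside the window.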

artnoage commented 1 year ago

The new flash-attention has sliding window built in; however, it doesn't work with compiling the model. So it is extremely easy to try as-is, but you will end up with slow training. There is another repo, TinyLlama, where sliding window is an option, but my feeling is that it is slower than this repo with compile=True. It would be nice if it could be implemented here, though. I agree with you.
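
For reference, the built-in support mentioned above is exposed through the `window_size` argument of `flash_attn_func` (to my knowledge this landed around flash-attn 2.3; treat the exact version as an assumption). A rough usage sketch with toy shapes:

```python
# Sketch: sliding-window causal attention via flash-attn's window_size.
# Assumes flash-attn >= 2.3 and a CUDA GPU; q/k/v must be fp16/bf16
# with shape (batch, seqlen, nheads, headdim).
import torch
from flash_attn import flash_attn_func

q = torch.randn(1, 8, 2, 16, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 2, 16, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 2, 16, device="cuda", dtype=torch.float16)

# window_size=(left, right): each query attends to at most `left` tokens
# behind it and `right` tokens ahead; causal=True with right=0 gives the
# Mistral-style sliding window.
out = flash_attn_func(q, k, v, causal=True, window_size=(4, 0))
```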

stellanhaglund commented 1 year ago

I'm not only interested in the performance side; I'm also interested in whether there's any noticeable difference in the output with sliding window attention. It seems to benefit Mistral a lot.

VatsaDev commented 11 months ago

Mistral's edge is more about its data (the secret sauce) than the architecture change; sliding window attention may only make it slightly better.