pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

question about position embedding #34

Open jchuai opened 10 months ago

jchuai commented 10 months ago

Hi, thanks for the great work!

I have a question. Is the code at L172-L173 in model.py doing position embedding, like the red box in the picture below? [image]

If it is the pos emb, it seems to be applied inside the Attention class, which means every TransformerBlock will do the pos emb. Is that right? From the paper I learned that the Transformer only does the pos emb once, before sending the Q/K/V matrices to attention. Did you design it this way for some purpose, or am I misunderstanding the code in model.py? [image]

Thanks.

briandw commented 9 months ago

Have a look at the Rotary Embeddings paper for the details: https://arxiv.org/pdf/2104.09864v5.pdf
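
For anyone else landing here: as the linked paper suggests, those lines apply rotary position embeddings (RoPE), which rotate q and k inside every attention layer instead of adding a positional vector to the token embeddings once at the input, so seeing it inside the Attention class is expected. Below is a minimal, hedged sketch of the usual LLaMA-style pattern; the function names `precompute_freqs_cis` / `apply_rotary_emb` follow that common convention and are not quoted from gpt-fast's model.py.

```python
import torch

def precompute_freqs_cis(seq_len: int, head_dim: int, base: float = 10000.0) -> torch.Tensor:
    # One rotation frequency per pair of channels; each position t gets the complex
    # rotation e^{i * t * freq}, precomputed once and shared by every layer.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)                   # (seq_len, head_dim // 2)
    return torch.polar(torch.ones_like(freqs), freqs)  # complex, (seq_len, head_dim // 2)

def apply_rotary_emb(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, n_heads, head_dim). Pair up the last dim as complex numbers
    # and rotate each pair by its position-dependent angle.
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    x_rotated = x_complex * freqs_cis.view(1, x.shape[1], 1, -1)
    return torch.view_as_real(x_rotated).flatten(3).type_as(x)

# Inside each attention layer (sketch): rotate q and k, then attend as usual.
# Because the rotation acts on q/k rather than on the token embeddings, it is
# repeated in every layer, which is why it lives inside the Attention class.
# q = apply_rotary_emb(q, freqs_cis)
# k = apply_rotary_emb(k, freqs_cis)
```

So for the original question: unlike the sinusoidal/learned embedding in "Attention Is All You Need", RoPE injects position by rotating q and k, so applying it in every TransformerBlock is the intended design, not a bug.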