Hi, Thanks for the great work!
I have a question: do the lines at L172-L173 in model.py implement position embedding, like the red box in the picture below?
If it is the position embedding, it seems to be applied inside the Attention class, which means every TransformerBlock applies it. Is that right? From the paper, I understood that the Transformer applies the position embedding only once, before the Q/K/V matrices are sent to attention. Did you design it this way for some purpose? Or am I misunderstanding the code in model.py?
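For reference, here is a minimal sketch of what I mean by "applying the position embedding once at the input", as described in the original paper. The function name and shapes are my own for illustration, not taken from model.py:

```python
import numpy as np

def sinusoidal_pos_emb(seq_len: int, dim: int) -> np.ndarray:
    """Standard sinusoidal position encoding from 'Attention Is All You Need'."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(dim // 2)[None, :]           # (1, dim/2)
    angles = pos / (10000 ** (2 * i / dim))    # (seq_len, dim/2)
    emb = np.zeros((seq_len, dim))
    emb[:, 0::2] = np.sin(angles)              # even dims: sine
    emb[:, 1::2] = np.cos(angles)              # odd dims: cosine
    return emb

# "Once at the input": added to token embeddings a single time,
# before the first TransformerBlock (not inside every Attention layer).
seq_len, dim = 8, 16
x = np.random.randn(seq_len, dim)              # hypothetical token embeddings
x = x + sinusoidal_pos_emb(seq_len, dim)       # applied once here only
```

This is the setup I had in mind from the paper, which is why applying something position-related inside every Attention block surprised me.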
Thanks.