Thanks for the awesome repository! After going through it step-by-step, I have a better understanding of Llama3 techniques, such as rotary position embedding, grouped key and value, etc.
I think there may be a minor mistake in the skip-connection visualization. The corresponding code is in the section "WE FINALLY HAVE NEW EDITED EMBEDDINGS FOR EACH TOKEN AFTER THE FIRST LAYER":
Since `embedding_after_edit` is used instead of `embedding_after_edit_normalized`, the visualization should be