naklecha / llama3-from-scratch

llama3 implementation one matrix multiplication at a time
MIT License

A minor problem regarding the skip-connection visualization #19

Open · YunhaoZhang-Mars opened this issue 4 months ago

YunhaoZhang-Mars commented 4 months ago

Thanks for the awesome repository! After going through it step by step, I have a better understanding of Llama3 techniques such as rotary position embeddings, grouped key and value heads, etc.

I think there might be a minor mistake in the skip-connection visualization. The corresponding code is in the section "WE FINALLY HAVE NEW EDITED EMBEDDINGS FOR EACH TOKEN AFTER THE FIRST LAYER":

layer_0_embedding = embedding_after_edit+output_after_feedforward
layer_0_embedding.shape

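Because the code adds embedding_after_edit rather than embedding_after_edit_normalized, the skip connection in the visualization should start from the embedding before normalization, not after it. Corrected figure: [image: afterattention-correct]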
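To make the point concrete, here is a minimal pure-Python sketch of a pre-norm block, assuming the Llama-style layout where each sublayer sees a normalized input while the residual path carries the un-normalized one. The names `rms_norm`, `add`, `layer_forward`, `attention`, and `feed_forward` are hypothetical stand-ins (not functions from the repo), and `eps=1e-5` is an assumed default. The comments map each intermediate to the variable names used in the notebook.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: divide by the root-mean-square of x, then scale by a learned weight.
    # eps=1e-5 is an assumed value for numerical stability.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def add(a, b):
    # Element-wise vector addition (the skip/residual connection).
    return [u + v for u, v in zip(a, b)]


def layer_forward(x, attention, feed_forward, attn_norm_w, ffn_norm_w):
    # Hypothetical pre-norm transformer block.
    # Each sublayer receives a *normalized* input...
    h = add(x, attention(rms_norm(x, attn_norm_w)))      # ~ embedding_after_edit
    h_norm = rms_norm(h, ffn_norm_w)                      # ~ embedding_after_edit_normalized
    # ...but the residual adds the *un-normalized* h, matching the notebook line
    # layer_0_embedding = embedding_after_edit + output_after_feedforward
    return add(h, feed_forward(h_norm))                   # ~ layer_0_embedding
```

With sublayers that output zeros, the block passes its input through unchanged, which only holds because the skip connection uses the un-normalized `h`:

```python
zeros = lambda v: [0.0 for _ in v]  # hypothetical do-nothing sublayer
out = layer_forward([1.0, 2.0], zeros, zeros, [1.0, 1.0], [1.0, 1.0])
# out == [1.0, 2.0]; adding h_norm instead would have returned a rescaled vector
```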

wangdsh commented 4 months ago

I found the same problem.

naklecha commented 3 months ago

oh yes you are actually right!!!