Open iyupan opened 8 months ago
This is from Section 2.1, *Which Layers?*:

> In LLaMA2-7B, massive activations first appear in layer 2 and remain nearly constant values until layer 30. Intriguingly, for LLaMA2-7B and 13B, massive activations emerge very rapidly from one layer of computation, e.g., layer 2 and layer 4 respectively. This means that they do not emerge as a result of gradual accumulation through many layers, and are caused by a rather different mechanism.
Hello,

This is great work! I am wondering which layer the analyzed activations are taken from. Is it the last layer?
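For context, here is a minimal sketch of how one could inspect activation magnitudes layer by layer, rather than only at the last layer. It assumes the per-layer hidden states are available as arrays of shape `(seq_len, hidden_dim)` (e.g., from calling a HuggingFace model with `output_hidden_states=True` and squeezing the batch dimension); the fake data and the injected value are purely illustrative, not taken from the paper.

```python
import numpy as np

def per_layer_max_abs(hidden_states):
    """Return the largest absolute activation value in each layer's hidden states.

    hidden_states: sequence of arrays shaped (seq_len, hidden_dim), one per layer.
    """
    return [float(np.abs(h).max()) for h in hidden_states]

# Illustrative fake data: 4 "layers" of ordinary activations, with one
# massive activation injected at layer index 2.
rng = np.random.default_rng(0)
states = [rng.normal(size=(8, 16)) for _ in range(4)]
states[2][0, 0] = 2000.0  # orders of magnitude larger than typical values

maxima = per_layer_max_abs(states)
print(maxima)  # the layer-2 entry dominates all others
```

Scanning the output of such a function across all layers is what would reveal where massive activations first appear and how long they persist, independent of which single layer one chooses to plot.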