Open iyupan opened 8 months ago
This is from Section 2.1, *Which Layers?*:

> In LLaMA2-7B, massive activations first appear in layer 2 and remain nearly constant values until layer 30. Intriguingly, for LLaMA2-7B and 13B, massive activations emerge very rapidly from one layer of computation, e.g., layer 2 and layer 4 respectively. This means that they do not emerge as a result of gradual accumulation through many layers, and are caused by a rather different mechanism.
Hello,

This is great work! I am wondering which layer the analyzed activations are taken from. Is it the last layer?
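For context, here is a minimal sketch of how one could inspect activation magnitudes layer by layer, rather than only at the last layer. It assumes the per-layer hidden states are available as arrays of shape `(seq_len, hidden_dim)` (e.g., from calling a HuggingFace model with `output_hidden_states=True` and squeezing the batch dimension); the fake data and the injected value are purely illustrative, not taken from the paper.

```python
import numpy as np

def per_layer_max_abs(hidden_states):
    """Return the largest absolute activation value in each layer's hidden states.

    hidden_states: sequence of arrays shaped (seq_len, hidden_dim), one per layer.
    """
    return [float(np.abs(h).max()) for h in hidden_states]

# Illustrative fake data: 4 "layers" of ordinary activations, with one
# massive activation injected at layer index 2.
rng = np.random.default_rng(0)
states = [rng.normal(size=(8, 16)) for _ in range(4)]
states[2][0, 0] = 2000.0  # orders of magnitude larger than typical values

maxima = per_layer_max_abs(states)
print(maxima)  # the layer-2 entry dominates all others
```

Scanning the output of such a function across all layers is what would reveal where massive activations first appear and how long they persist, independent of which single layer one chooses to plot.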