Massive Activations in Large Language Models

Official PyTorch implementation of our paper:

Massive Activations in Large Language Models
Mingjie Sun, Xinlei Chen, J. Zico Kolter, Zhuang Liu
Carnegie Mellon University, Meta AI Research and Bosch Center for AI
Paper - Project page

Most of the experiments in this paper were run on a single NVIDIA A6000 GPU.


This paper studies the existence of massive activations in Large Language Models (LLMs). These activations are extremely few in number, yet their magnitudes are substantially larger than those of other activations.
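For intuition, here is a minimal sketch (not part of this repo) of how one might surface such activations with Hugging Face transformers: run a forward pass with output_hidden_states=True and compare each layer's largest activation magnitude against that layer's median. The model name and the threshold are illustrative placeholders.

```python
# Minimal sketch (not repo code): look for massive activations by comparing
# each layer's largest hidden-state magnitude against that layer's median.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper studies larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

inputs = tokenizer("Summer is warm. Winter is cold.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states: tuple of (1, seq_len, hidden_dim) tensors, one per layer.
for layer, h in enumerate(out.hidden_states):
    h = h.squeeze(0).abs()
    top, median = h.max(), h.median()
    if top > 100 * median:  # arbitrary illustrative threshold
        token_idx, dim = (h == top).nonzero()[0].tolist()
        print(f"layer {layer}: |act| {top:.1f} at token {token_idx}, dim {dim} "
              f"(median {median:.4f})")
```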

This repository

Setup

Installation instructions can be found in INSTALL.md.

Outline

The contents of this repository are as follows:

Large Language Models (LLMs)

For some LLMs, e.g., LLaMA2-7B, you need to pass the --access-token argument in order to access the weights.
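For reference, this flag plays the same role as the access token in plain transformers. A sketch of the equivalent loading call (assuming a recent transformers version; this is not the repo's actual loading code):

```python
# Sketch only: loading a gated checkpoint with a Hugging Face access token,
# which is the kind of token the repo's --access-token argument expects.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    token="hf_...",  # placeholder: your Hugging Face access token
)
```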

Vision Transformers (ViTs)

Results on DINOv2-reg models:

              ViT-S   ViT-B   ViT-L   ViT-G
Original       81.9    84.8    86.3    87.0
Fix-Reg-Mean   81.7    85.0    86.2    87.0
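As a rough illustration (not this repo's evaluation code), the same magnitude check used for LLMs above can be run on a ViT. The sketch below assumes the DINOv2-with-registers weights exposed via torch.hub (entrypoint name is an assumption) and hooks each transformer block to capture token activations:

```python
# Sketch only: capture per-block token activations in DINOv2-reg via forward
# hooks and report each block's largest magnitude vs. its median.
import torch

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14_reg").eval()

acts = {}
for i, blk in enumerate(model.blocks):
    blk.register_forward_hook(
        lambda m, inp, out, i=i: acts.__setitem__(i, out.detach())
    )

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # random stand-in for a real image

for i, h in acts.items():
    h = h.squeeze(0).abs()  # (cls + register + patch tokens, dim)
    print(f"block {i}: max |act| {h.max():.1f}, median {h.median():.4f}")
```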

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Reference

@article{sun2024massive,
  title={Massive Activations in Large Language Models}, 
  author={Sun, Mingjie and Chen, Xinlei and Kolter, J. Zico and Liu, Zhuang},
  year={2024},
  journal={arXiv preprint arXiv:2402.17762}
}