thu-nics / MoA

The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression".
MIT License