twistedcubic / attention-rank-collapse

[ICML 2021 Oral] We show that pure attention suffers rank collapse, and how different mechanisms combat it.
Apache License 2.0
153 stars, 10 forks
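
To make the rank-collapse claim in the description concrete, here is a minimal sketch of the effect in plain Python. It is not code from this repository. It stacks single-head softmax self-attention layers with no skip connections and no MLP, and tracks how far the token matrix is from a matrix whose rows are all identical. The assumptions are all illustrative: random untrained query/key weights, an identity value projection, the Frobenius norm for the residual, and hypothetical helper names (`attention_layer`, `relative_residual`, `run`).

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def random_matrix(rng, rows, cols, scale):
    """Matrix with i.i.d. Gaussian entries of standard deviation `scale`."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def attention_layer(x, w_q, w_k):
    """One pure self-attention layer: softmax(Q K^T / sqrt(d)) X.

    Assumption for illustration: the value projection is the identity,
    and there is no skip connection, MLP, or normalization.
    """
    d = len(x[0])
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    scores = matmul(q, list(zip(*k)))  # Q K^T
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(attn, x)


def relative_residual(x):
    """Frobenius distance from x to its row-mean matrix, divided by the norm of x.

    A value near 0 means every token row is (almost) the same vector,
    i.e. x is (almost) rank 1.
    """
    n = len(x)
    mean = [sum(col) / n for col in zip(*x)]
    res_sq = sum((v - m) ** 2 for row in x for v, m in zip(row, mean))
    tot_sq = sum(v * v for row in x for v in row)
    return math.sqrt(res_sq / tot_sq)


def run(depth=8, n_tokens=8, d_model=16, seed=0):
    """Return the relative residual at the input and after each layer."""
    rng = random.Random(seed)
    x = random_matrix(rng, n_tokens, d_model, 1.0)
    history = [relative_residual(x)]
    scale = 1.0 / math.sqrt(d_model)
    for _ in range(depth):
        w_q = random_matrix(rng, d_model, d_model, scale)
        w_k = random_matrix(rng, d_model, d_model, scale)
        x = attention_layer(x, w_q, w_k)
        history.append(relative_residual(x))
    return history


if __name__ == "__main__":
    for layer, value in enumerate(run()):
        print(f"layer {layer}: relative residual = {value:.3e}")
```

Running the script prints one residual per layer; in this toy setting the value should shrink quickly with depth, which is the behavior the description calls rank collapse. Adding a skip connection (returning `x + attention(x)` instead of `attention(x)`) is one way to experiment with the counteracting mechanisms the description mentions.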