How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention

https://twitter.com/arxiv_cscl/status/1324134478592311296

How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention https://t.co/F6Kelbks9Z
— arXiv CS-CL (@arxiv_cscl) November 4, 2020

https://arxiv.org/abs/2011.00943

How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention

Yue Guan, Jingwen Leng, Chao Li, Quan Chen, Minyi Guo
Shanghai Jiao Tong University, Shanghai Qi Zhi Institute

Recent research on the multi-head attention mechanism, especially that in pre-trained models such as BERT, has shown us heuristics and clues in analyzing various aspects of the mechanism. As most of the research focuses on probing tasks or hidden states, previous works have found some primitive patterns of attention head behavior by heuristic analytical methods, but a more systematic analysis specific to the attention patterns remains primitive. In this work, we clearly cluster the attention heatmaps into significantly different patterns through unsupervised clustering on top of a set of proposed features, which corroborates previous observations. We further study their corresponding functions through analytical study. In addition, our proposed features can be used to explain and calibrate different attention heads in Transformer models.
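
The abstract does not list the proposed features, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a single hypothetical feature, the attention-weighted mean token distance |i - j| of a head's heatmap, and groups a few toy heads with a tiny 1-D k-means. The function names, the toy heatmaps, and the choice of k = 2 are all illustrative assumptions.

```python
def mean_attention_distance(attn):
    """Hypothetical feature: attention-weighted |i - j|, averaged over query rows.

    attn is a square list of lists; attn[i][j] is the weight that query
    position i puts on key position j (each row is assumed to sum to 1).
    """
    n = len(attn)
    total = 0.0
    for i, row in enumerate(attn):
        total += sum(w * abs(i - j) for j, w in enumerate(row))
    return total / n


def assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
        for v in values
    ]


def kmeans_1d(values, k=2, iters=20):
    """Tiny 1-D k-means (requires k >= 2); returns (labels, centroids).

    Centroids start at evenly spaced positions of the sorted values, so the
    result is deterministic.
    """
    s = sorted(values)
    centroids = [s[(len(s) - 1) * c // (k - 1)] for c in range(k)]
    for _ in range(iters):
        labels = assign(values, centroids)
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return assign(values, centroids), centroids


# Toy heatmaps for a 6-token input (illustrative, not taken from BERT).
n = 6
toy_heads = {
    "self": [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)],
    "previous_token": [
        [1.0 if j == max(i - 1, 0) else 0.0 for j in range(n)] for i in range(n)
    ],
    "uniform": [[1.0 / n] * n for _ in range(n)],
    "first_token": [[1.0 if j == 0 else 0.0 for j in range(n)] for i in range(n)],
}

features = [mean_attention_distance(h) for h in toy_heads.values()]
labels, centroids = kmeans_1d(features, k=2)
for name, feat, lab in zip(toy_heads, features, labels):
    print(f"{name}: distance={feat:.3f} cluster={lab}")
```

On these toy inputs the two short-range heads (self, previous token) fall into one cluster and the two long-range heads (uniform, first token) into the other. With a real model, the heatmaps would be the per-head attention weights averaged over many inputs, and more than one feature would likely be needed.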
