Predicting the Topical Stance and Political Leaning of Media using Tweets

0. Paper

Journal/Conference: ACL 2020
Title: Predicting the Topical Stance and Political Leaning of Media using Tweets
Authors: Peter Stefanov, Kareem Darwish, Atanas Atanasov, Preslav Nakov
URL: https://www.aclweb.org/anthology/2020.acl-main.50/

1. What is it?

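Starting from Twitter users classified with an unsupervised method, this study estimates the degree of bias of the media cited within each topic. Its weight lies more in mining the relationship between media and polarization than in technical novelty.
[Figure: overview of the study]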

2. What is impressive compared with prior work?

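The material up to the unsupervised classification (through Section 4.2) is the same as the second author's ICWSM 2020 paper, "Unsupervised user stance detection on Twitter". What is newly added is [...] and the estimation of the degree of bias (polarization) of the media spread among the users that could be classified.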

3. What is the key to the technique or method?

4. How was it validated?

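Using Twitter posts on eight topics collected in earlier work, users are classified with unsupervised plus supervised methods. The features obtained there are then used to estimate the polarization of the media being shared, and the usefulness of each feature is evaluated.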

5. Is there any discussion?

The points raised in the discussion lean strongly toward data mining (the relationship between media and users viewed from the standpoint of polarization).

6. Which papers should be read next?

Notes

Unsupervised stance detection for discussion threads in online forums: Amine Trabelsi and Osmar R. Zaïane. 2018. Unsupervised model for topic viewpoint discovery in online debates leveraging author interactions. In Proceedings of the Twelfth International AAAI Conference on Web and Social Media, ICWSM '18, pages 425–433, Stanford, CA, USA.
Retweet-based supervised stance detection combined with unsupervised stance detection: Kareem Darwish, Walid Magdy, Afshin Rahimi, Timothy Baldwin, and Norah Abokhodair. 2018. Predicting online islamophobic behavior after #ParisAttacks. The Journal of Web Science, 4(3):34–52.

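Abst
Proposes an unsupervised method that uses Twitter users' retweet behavior to capture their stance on polarizing topics, plus a method based on supervised learning over the resulting user labels to characterize the political leaning and stance of online media and users.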

1 Introduction
Popular Twitter users: influencers.
Judging political orientation from the characteristics of surrounding users: Filipe N. Ribeiro, Lucas Henrique, Fabricio Benevenuto, Abhijnan Chakraborty, Juhi Kulshrestha, Mahmoudreza Babaei, and Krishna P. Gummadi. 2018. Media bias monitor: Quantifying biases of social media news outlets at large-scale. In Proceedings of the Twelfth International AAAI Conference on Web and Social Media, ICWSM '18, pages 290–299, Stanford, CA, USA.
→ This paper proposes automatically tagging a large number of users with their stance on specific topics using an unsupervised model: Kareem Darwish, Michael Aupetit, Peter Stefanov, and Preslav Nakov. 2020. Unsupervised user stance detection on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media, ICWSM '20, Atlanta, GA, USA. (So that paper was accepted at ICWSM.)

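Users are classified with an unsupervised method. Users' stances on the eight topics are taken into account to judge the reliability of sources. Embedding models that take graph, context, and text into account are trained.
Contributions:
・Determine user stance using unsupervised stance detection
・Based on the discovered user stances, capture users' political leaning using distant supervision
・Make predictions for news outlets whose labels were obtained from Media Bias/Fact Check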

2 Related Work
Polarization is attributed to factors such as selective exposure and cognitive dissonance.
Pablo Barberá and Gaurav Sood. 2015. Follow your ideology: Measuring media ideology on social networks. In Proceedings of the Annual Meeting of the European Political Science Association, Vienna, Austria: proposes a statistical model based on media sources and on personalities and follower relationships on Twitter.
Users' views do not change over time: Walid Magdy, Kareem Darwish, Norah Abokhodair, Afshin Rahimi, and Timothy Baldwin. 2016a. #isisisnotislam or #deportallmuslims?: Predicting unspoken views. In Proceedings of the 8th ACM Conference on Web Science, WebSci '16, pages 95–106, Hannover, Germany.
Models stance and disagreement with a probabilistic programming system: Dhanya Sridhar, James Foulds, Bert Huang, Lise Getoor, and Marilyn Walker. 2015. Joint models of disagreement and stance in online debate. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, ACL-IJCNLP '15, pages 116–125, Beijing, China.
Unsupervised stance detection for discussion threads in online forums: Amine Trabelsi and Osmar R. Zaïane. 2018. Unsupervised model for topic viewpoint discovery in online debates leveraging author interactions. In Proceedings of the Twelfth International AAAI Conference on Web and Social Media, ICWSM '18, pages 425–433, Stanford, CA, USA.
Supervised stance detection and semi-supervised stance detection.
Unsupervised user-stance classification: highly effective on polarized topics, with nearly perfect clustering accuracy.

Combined with retweet-based supervised classification: Kareem Darwish, Walid Magdy, Afshin Rahimi, Timothy Baldwin, and Norah Abokhodair. 2018. Predicting online islamophobic behavior after #ParisAttacks. The Journal of Web Science, 4(3):34–52.

3 Data Collection
An author of the unsupervised user stance classification paper is the second author here, so presumably they have the same data (Fig. 1).

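4 User Stance Detection
Proposes a two-stage approach for a large number of users: projection and clustering, followed by supervised classification. A subset of users is classified without supervision, and a supervised classifier trained on that subset then classifies the users (FastText).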

4.1 Projection and Clustering
1,000 active users are projected with UMAP → clustered with Mean Shift.
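As a rough illustration of the clustering step, the sketch below runs a flat-kernel mean shift over points that are assumed to be already projected to 2D (the UMAP step is not reproduced here). The function name `mean_shift`, the `bandwidth` value, and the toy coordinates are illustrative assumptions, not taken from the paper.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift over a list of equal-length coordinate tuples.

    Each point is moved to the mean of the data points within `bandwidth`
    until it stops moving; nearby end positions are merged into clusters.
    """
    modes = []
    for p in points:
        x = list(p)
        for _ in range(max_iter):
            neighbours = [q for q in points if math.dist(x, q) <= bandwidth]
            if not neighbours:  # defensive guard; should not trigger in practice
                break
            new_x = [sum(coord) / len(neighbours) for coord in zip(*neighbours)]
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)

    # Merge modes that ended up within one bandwidth of an existing center.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


# Toy 2D "projected users": two well-separated groups (made-up coordinates).
projected = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centers = mean_shift(projected, bandwidth=1.0)
```

Mean shift does not need the number of clusters in advance, which fits a setting where the number of stance groups per topic is not fixed beforehand; the bandwidth is the main knob and would need tuning on real projections.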

4.2 Supervised Classification
A FastText classifier is trained and used to classify the remaining users. Table 2: number of users in each group.
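fastText's supervised mode reads one example per line, with the label prefixed by `__label__`. The sketch below turns the clustered users into such training lines. It assumes each user is represented by the concatenation of their tweets, which these notes do not confirm; `to_fasttext_lines` and its arguments are hypothetical names.

```python
def to_fasttext_lines(user_texts, user_cluster):
    """Build fastText supervised training lines from clustered users.

    user_texts:   {user_id: [tweet, ...]} for all users
    user_cluster: {user_id: cluster_id} for the users labeled by clustering
    """
    lines = []
    for user, cluster in sorted(user_cluster.items()):
        text = " ".join(user_texts.get(user, []))
        # fastText expects one example per line, so collapse all whitespace.
        text = " ".join(text.split())
        if text:
            lines.append(f"__label__{cluster} {text}")
    return lines


# Made-up example: u1 and u2 were clustered, u3 is still unlabeled.
user_texts = {"u1": ["first tweet\nsecond line"], "u2": ["another tweet"], "u3": ["unlabeled"]}
user_cluster = {"u1": 0, "u2": 1}
training_lines = to_fasttext_lines(user_texts, user_cluster)
```

The resulting lines could be written to a file and passed to `fasttext.train_supervised(input=path)`; the trained model would then label users such as `u3` that the clustering step did not cover.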

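4.3 Calculating Valence Scores
A valence score in [-1, 1] is computed for each influencer: it expresses which group's users the influencer is associated with (Eq. 2). Using the natural logarithm balances the number of articles cited against the total number of citations. The resulting scores are split into five groups (Eq. 3).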
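A minimal sketch of a valence score of this kind follows. The exact form of Eq. 2 and Eq. 3 is not reproduced in these notes, so the formula here is an assumption: each cited article contributes ln(count) + 1 to a source's mass in a cluster, the masses are normalized per cluster, and the score is 2 * r0 / (r0 + r1) - 1. The five equal-width bins and their labels are also assumptions and should be checked against the paper.

```python
import bisect
import math
from collections import defaultdict


def valence_scores(citations):
    """Compute a valence score in [-1, 1] per source (assumed formula).

    citations maps (source, article, cluster) -> number of times users in
    that cluster (0 or 1) cited the article.
    """
    # Log-damped citation mass per source and cluster: each cited article
    # contributes ln(count) + 1, so heavily cited articles count for more,
    # but only sub-linearly.
    tf = defaultdict(lambda: [0.0, 0.0])
    for (source, _article, cluster), count in citations.items():
        if count > 0:
            tf[source][cluster] += math.log(count) + 1.0

    totals = [sum(mass[c] for mass in tf.values()) for c in (0, 1)]

    scores = {}
    for source, (t0, t1) in tf.items():
        r0 = t0 / totals[0] if totals[0] else 0.0
        r1 = t1 / totals[1] if totals[1] else 0.0
        # +1 if only cluster 0 cites the source, -1 if only cluster 1 does.
        scores[source] = 2.0 * r0 / (r0 + r1) - 1.0
    return scores


def valence_category(score):
    """Map a score to one of five bins (assumed equal-width boundaries)."""
    bounds = [-0.6, -0.2, 0.2, 0.6]
    names = ["--", "-", "0", "+", "++"]
    return names[bisect.bisect_right(bounds, score)]
```

Which cluster ends up as 0 or 1 is arbitrary in this sketch, so the mapping from the sign of the score to a political side has to be fixed separately for each topic.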

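5 Characterizing the influencers
Table 3: valence categories of the most-cited media sources for each topic, with each outlet's bias rating also taken from mediabiasfactcheck.com. Table 4: the most frequently cited media sources for each topic and each valence score. Fig. 3: confusion matrix (valence vs. bias rating).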

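In addition, valence scores are computed for the top 200 accounts, which are then labeled left or right. With the calculation used here, a negative valence score indicates the right side (support for Trump, Republicans, gun rights, opposition to abortion) and a positive valence score indicates the left side (support for Democrats, liberal social positions). (Could Media Bias/Fact Check itself carry some kind of bias?) → Results are shown in Table 5.
・Many users have extreme valence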

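What this shows:
・Judging from the skew of the valence scores, left-leaning users almost never cite right-leaning media (right-leaning users do cite left-leaning media, as a way of reinforcing their own views)
・Right-leaning sources with low factuality tend to be cited (the vaccine example)
・A source's valence score is fairly stable across topics (its position is consistent whatever the topic)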

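6 Predicting Media bias
A supervised classifier is built to predict media bias from users' stances on the eight topics. Results are in Table 6.
・Valence scores, used in two ways
・Judged from the average valence score for the target
・A logistic regression is built to predict political leaning from the valence scores; its input is an 8-dimensional vector holding the valence score for each topic → row 1 of Table 6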

・Graph embeddings: two graphs, User-to-Hashtag (U2H) and User-to-Mention (U2M), are built and embedded with node2vec → row 4 of Table 6
・Tweets: BERT-based → row 3 of Table 6
・Article titles and text → row 2 of Table 6
・System combination → row 5 of Table 6
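The two valence-based variants in the list above can be sketched as follows. This is only an illustration under assumptions: the topic names are placeholders, a missing topic is filled with 0.0, the `margin` threshold and the "center" label are made up, and the tiny gradient-descent logistic regression is binary (left vs. right) and stands in for whatever implementation the authors used. The sign convention (positive = left, negative = right) follows the notes in Section 5.

```python
import math

# Placeholder topic names; the paper uses eight concrete topics.
TOPICS = [f"topic_{i}" for i in range(8)]


def feature_vector(valence_by_topic):
    """8-dimensional input: one valence score per topic, 0.0 where absent."""
    return [valence_by_topic.get(t, 0.0) for t in TOPICS]


def average_valence_label(valence_by_topic, margin=0.2):
    """Variant 1: decide from the mean valence over the topics present."""
    vals = list(valence_by_topic.values())
    if not vals:
        return "center"
    mean = sum(vals) / len(vals)
    if mean > margin:
        return "left"
    if mean < -margin:
        return "right"
    return "center"


def train_logreg(X, y, lr=0.5, epochs=500):
    """Variant 2: binary logistic regression (1 = left, 0 = right) via SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of the log loss with respect to z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def left_probability(w, b, x):
    """Predicted probability that an outlet with features x leans left."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


# Made-up training outlets: (valence per topic, label).
train = [
    ({"topic_0": 0.9, "topic_1": 0.7}, 1),
    ({"topic_2": 0.8, "topic_5": 0.6}, 1),
    ({"topic_0": -0.9, "topic_3": -0.8}, 0),
    ({"topic_1": -0.7, "topic_5": -0.9}, 0),
]
X = [feature_vector(v) for v, _ in train]
y = [label for _, label in train]
w, b = train_logreg(X, y)
```

The averaging rule needs no training data at all, while the regression can learn that some topics are more informative than others, at the cost of needing labeled outlets.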

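7 Conclusion and Future work
The biggest advantage is that no labeling is required. The valence score reflects media bias. The authors want to identify polarizing topics automatically: could the approach be extended regardless of country or language?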
