0. Paper
Journal/Conference: ACL 2020
Title: That is a Known Lie: Detecting Previously Fact-Checked Claims
Authors: Shaden Shaar, Nikolay Babulkov, Giovanni Da San Martino, Preslav Nakov
URL: https://www.aclweb.org/anthology/2020.acl-main.332/
1. What is it?
Much prior work has tackled the task of judging whether a social-media post is fake, but the retrieval task of checking whether the posted claim has already been verified on a fact-checking site has not been addressed. This paper proposes that new task, retrieving fact-checking articles that already cover an input claim, and releases a dataset for it.
2. What's novel compared to prior work?
It is the first work to tackle this task, and it releases a purpose-built dataset.
3. What's the key technique?
Candidate fact-checked claims are first retrieved with BM25 and then re-ranked using BERT-based semantic similarity scores combined with rankSVM.
4. How was it validated?
The retrieval task is run with methods such as BM25 and BERT-based similarity, and retrieval accuracy is evaluated with MAP@k and related measures on new datasets built from Snopes and PolitiFact material.
5. Any discussion?
6. What to read next?
Notes
Abst
We want to know whether a piece of fake news has already been verified on a fact-checking site.
The paper is the first to formalise this problem and builds dedicated datasets for it.
Learning-to-rank experiments improve over retrieval and text-similarity approaches.
1 Introduction
There are 237 active fact-checking organisations and 92 inactive ones.
These fact-checking sites can be used as a source of training data.
Example fake-news dataset: Augenstein et al. 2019. MultiFC: A real-world multi-domain dataset for evidence-based fact checking of claims. EMNLP-IJCNLP '19, pages 4684–4696, Hong Kong, China.
The fact-checking pipeline:
(i) assess the check-worthiness of the claim (which could come from social media, a political debate, etc.)
(ii) check whether a similar claim has been previously fact-checked (the task this paper focuses on)
(iii) retrieve evidence (from the Web, social media, Wikipedia, a knowledge base, etc.)
(iv) assess the factuality of the claim
Over 50% of a tweet's spread happens in the first 10 minutes: Zaman et al. 2014. A Bayesian approach for predicting the popularity of tweets. Ann. Appl. Stat., 8(3):1583–1611.
2 Related work
Whether a claim has already been verified has not previously been addressed.
Google's Fact Check Explorer.
Verification using knowledge graphs:
Tchechmedjiev et al. 2019. ClaimsKG: A knowledge graph of fact-checked claims. ISWC '19, pages 309–324, Auckland, New Zealand.
Huynh and Papotti. 2019. A benchmark for fact checking algorithms built on knowledge bases. CIKM '19, pages 689–698, Beijing, China.
Gad-Elrab et al. 2019a. ExFaKT: A framework for explaining facts over knowledge graphs and text. WSDM '19, pages 87–95, Melbourne, Australia.
This paper instead studies claims made on social media.
・Work that mines the Web for evidence to support or refute a claim:
Nadeem et al. 2019. FAKTA: An automatic end-to-end fact checking system. NAACL-HLT '19, pages 78–83, Minneapolis, MN, USA.
Miranda et al. 2019. Automated fact checking in the news room. WWW '19, pages 3579–3583, San Francisco, CA, USA.
Baly et al. 2018b. Integrating stance detection and fact checking in a unified corpus. NAACL-HLT '18, pages 21–27, New Orleans, LA, USA.
・Fact-checking against Wikipedia and general documents:
Nie et al. 2019. Combining fact extraction and verification with neural semantic matching networks. AAAI '19, pages 6859–6866, Honolulu, HI, USA.
・Table-based fact verification: Chen et al. 2019. TabFact: A large-scale dataset for table-based fact verification.
・Language models as knowledge bases: Petroni et al. 2019. Language models as knowledge bases? EMNLP-IJCNLP '19, pages 2463–2473, Hong Kong, China.
This paper's experiments: retrieval with BM25, then re-ranking with BERT-based similarity.
Ranking uses reciprocal ranks, and representations such as sentence-BERT and the Universal Sentence Encoder are evaluated.
Closest work: Akkalyoncu Yilmaz et al. 2019a. Applying BERT to document retrieval with Birch. EMNLP-IJCNLP '19, pages 19–24, Hong Kong, China.
The task is also related to semantic relatedness, natural language inference (NLI), and recognizing textual entailment (RTE).
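The use of reciprocal ranks for combining the BM25 and BERT-based rankings can be illustrated with a minimal reciprocal-rank-fusion sketch (the paper itself feeds reciprocal ranks into rankSVM as features, so this shows the idea rather than the exact scheme; the candidate ids are invented):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of candidate ids into one ranking.

    A candidate's fused score is the sum of 1 / (k + rank) over every
    list it appears in; higher is better.
    """
    scores = {}
    for ranking in rankings:
        for rank, cand in enumerate(ranking, start=1):
            scores[cand] = scores.get(cand, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical candidate ids: one ranking from BM25, one from embedding cosine.
bm25_ranking = ["c3", "c1", "c2"]
embedding_ranking = ["c3", "c1", "c4"]
fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
```

Candidates ranked highly by both lists float to the top even when their absolute scores are on incomparable scales, which is why reciprocal ranks are a convenient feature for rank fusion.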
3 Task Definition
Given an input claim and a set of verified claims, rank the verified claims.
Table 1: inputs for fake-news verification and their manual annotations.
4 Datasets
The PolitiFact and Snopes datasets: Table 3.
4.1 PolitiFact dataset
16,636 verified claims collected; 768 input–VerClaim pairs.
4.2 Snopes dataset
1,000 input–VerClaim pairs (10,396 verified claims).
4.3 Analysis
Pairs are split into ones where matching the tweet is easy (Type-1) and ones where it is hard (Type-2).
Table 4: the split is made using TF.IDF-weighted cosine similarity.
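The TF.IDF-weighted cosine similarity behind the Type-1/Type-2 split can be sketched in pure Python (whitespace tokenisation; the example texts are invented):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF.IDF vectors for a small corpus of tokenised documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

tweet = "obama born in kenya".split()
close_claim = "barack obama was born in kenya".split()   # near-verbatim match (Type-1-like)
far_claim = "the moon landing was faked".split()         # unrelated claim (Type-2-like)
vecs = tfidf_vectors([tweet, close_claim, far_claim])
```

High-similarity pairs behave like near-duplicates and are easy for lexical retrieval; low-similarity pairs require semantic matching.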
5 Evaluation Measures
HasPositive@k and MAP@k.
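A sketch of these measures under one common set of conventions (AP@k normalisation varies across papers, so treat the exact denominator as an assumption):

```python
def has_positive_at_k(ranked, relevant, k):
    """1.0 if any of the top-k retrieved ids is a correct match, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0

def average_precision_at_k(ranked, relevant, k):
    """AP@k for one query: precision averaged over relevant hits in the top k."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / min(len(relevant), k) if relevant else 0.0

def map_at_k(all_ranked, all_relevant, k):
    """MAP@k: mean AP@k over all queries."""
    aps = [average_precision_at_k(r, rel, k)
           for r, rel in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps)
```

HasPositive@k only asks whether a correct fact-check appears in the top k at all, while MAP@k also rewards placing it higher in the list.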
6 Models
6.1 BM25
6.2 BERT-based Models
BERT / RoBERTa / sentence-BERT / BERT applied to full articles
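A from-scratch sketch of Okapi BM25 scoring as used for first-stage retrieval (in practice this would come from an IR engine; the k1/b defaults and the toy claims are assumptions):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenised doc against a tokenised query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    # smoothed idf, guaranteed non-negative
    idf = {t: math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) for t in df}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t in tf:  # query terms absent from the corpus are skipped
                s += idf[t] * tf[t] * (k1 + 1) / (
                    tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

verified = [
    "obama born in kenya false".split(),  # invented verified-claim texts
    "moon landing hoax video".split(),
]
scores = bm25_scores("was obama born in kenya".split(), verified)
```

The term-frequency saturation (k1) and length normalisation (b) are what distinguish BM25 from plain TF.IDF matching.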
6.3 Re-ranking
rankSVM ranks the candidates using similarity scores between the embeddings of the claim pair.
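A RankSVM-flavoured sketch of this re-ranking step: a linear scorer trained on pairwise preferences with a hinge loss via SGD (the paper trains rankSVM over features such as reciprocal ranks and BERT similarity scores; the feature vectors and candidate ids here are invented):

```python
import random

def train_pairwise_ranker(features, preferences, epochs=200, lr=0.1, lam=0.01):
    """Learn a linear scorer w from pairwise preferences (RankSVM-style SGD).

    features:    candidate id -> feature vector (e.g. BM25 score, cosine sims)
    preferences: (better_id, worse_id) pairs; we want
                 w . x_better >= w . x_worse + 1 (the hinge margin)
    """
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    pairs = list(preferences)
    rng = random.Random(0)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for good, bad in pairs:
            diff = [a - b for a, b in zip(features[good], features[bad])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            step = diff if margin < 1 else [0.0] * dim  # hinge subgradient
            w = [wi + lr * (si - lam * wi) for wi, si in zip(w, step)]
    return w

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Invented two-dimensional features for three candidate verified claims.
feats = {"a": [2.0, 1.0], "b": [1.0, 0.0], "c": [0.0, 0.0]}
w = train_pairwise_ranker(feats, [("a", "b"), ("b", "c"), ("a", "c")])
```

At inference time the candidates retrieved for an input claim are simply sorted by `score(w, x)`.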
7 Experiments
Compares BERT-based semantic similarity models and ranking approaches.
8 Conclusions and Future work
A specialised dataset was created and released to the research community together with the code.
Learning-to-rank experiments were run and compared in accuracy against retrieval and text-similarity models.
Future work: go beyond textual claims to inputs that pair claims with images and videos.