BERT-related Papers
This is a list of BERT-related papers. Any feedback is welcome.
(ChatGPT-related papers are listed at https://github.com/tomohideshibata/ChatGPT-related-papers.)
Table of Contents
Survey paper
Downstream task
QA, MC, Dialogue
- Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
- A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics, and Benchmark Datasets
- A BERT Baseline for the Natural Questions
- MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension (ACL2019)
- BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions (NAACL2019) [github]
- Natural Perturbation for Robust Question Answering
- Unsupervised Domain Adaptation on Reading Comprehension
- BERTQA -- Attention on Steroids
- Exploring BERT Parameter Efficiency on the Stanford Question Answering Dataset v2.0
- Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension
- Logic-Guided Data Augmentation and Regularization for Consistent Question Answering (ACL2020)
- UnifiedQA: Crossing Format Boundaries With a Single QA System
- How Can We Know When Language Models Know?
- A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning (EMNLP2019)
- A Simple and Effective Model for Answering Multi-span Questions [github]
- Injecting Numerical Reasoning Skills into Language Models (ACL2020)
- Towards Question Format Independent Numerical Reasoning: A Set of Prerequisite Tasks
- SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering
- Multi-hop Question Answering via Reasoning Chains
- Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents
- Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering (EMNLP2019 WS)
- Fine-tuning Multi-hop Question Answering with Hierarchical Graph Network
- Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering (ACL2020)
- HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data
- Unsupervised Multi-hop Question Answering by Question Generation (NAACL2021)
- End-to-End Open-Domain Question Answering with BERTserini (NAACL2019)
- Latent Retrieval for Weakly Supervised Open Domain Question Answering (ACL2019)
- Dense Passage Retrieval for Open-Domain Question Answering (EMNLP2020)
- Efficient Passage Retrieval with Hashing for Open-domain Question Answering (ACL2021)
- End-to-End Training of Neural Retrievers for Open-Domain Question Answering
- Domain-matched Pre-training Tasks for Dense Retrieval
- Towards Robust Neural Retrieval Models with Synthetic Pre-Training
- Simple Entity-Centric Questions Challenge Dense Retrievers (EMNLP2021) [github]
- Phrase Retrieval Learns Passage Retrieval, Too (EMNLP2021) [github]
- Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
- Progressively Pretrained Dense Corpus Index for Open-Domain Question Answering (EACL2021)
- Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval
- Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval (NAACL2021) [github]
- Retrieve, Read, Rerank, then Iterate: Answering Open-Domain Questions of Varying Reasoning Steps from Text
- RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering
- Pre-training Tasks for Embedding-based Large-scale Retrieval (ICLR2020)
- Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering (EMNLP2019)
- QED: A Framework and Dataset for Explanations in Question Answering [github]
- Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering (ICLR2020)
- Relevance-guided Supervision for OpenQA with ColBERT
- RECONSIDER: Re-Ranking using Span-Focused Cross-Attention for Open Domain Question Answering
- Joint Passage Ranking for Diverse Multi-Answer Retrieval
- SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval
- Don't Read Too Much into It: Adaptive Computation for Open-Domain Question Answering (EMNLP2020 WS)
- Pruning the Index Contents for Memory Efficient Open-Domain QA [github]
- Is Retriever Merely an Approximator of Reader?
- Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation
- RikiNet: Reading Wikipedia Pages for Natural Question Answering (ACL2020)
- BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA
- DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding (SIGIR2020)
- Learning to Ask Unanswerable Questions for Machine Reading Comprehension (ACL2019)
- Unsupervised Question Answering by Cloze Translation (ACL2019)
- Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation (ICLR2020)
- A Recurrent BERT-based Model for Question Generation (EMNLP2019 WS)
- Unsupervised Question Decomposition for Question Answering [github]
- Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering (ACL2020)
- What Are People Asking About COVID-19? A Question Classification Dataset
- Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds
- Enhancing Pre-Trained Language Representations with Rich Knowledge for Machine Reading Comprehension (ACL2019)
- QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering (NAACL2021) [github] [blog]
- Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning (CIKM2019)
- SG-Net: Syntax-Guided Machine Reading Comprehension
- MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension
- Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning (EMNLP2019)
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning (ICLR2020)
- Robust Reading Comprehension with Linguistic Constraints via Posterior Regularization
- BAS: An Answer Selection Method Using BERT Language Model
- Utilizing Bidirectional Encoder Representations from Transformers for Answer Selection (AMMCS2019)
- TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection (AAAI2020)
- The Cascade Transformer: an Application for Efficient Answer Sentence Selection (ACL2020)
- Support-BERT: Predicting Quality of Question-Answer Pairs in MSDN using Deep Bidirectional Transformer
- Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension
- Benchmarking Robustness of Machine Reading Comprehension Models
- Evaluating NLP Models via Contrast Sets
- Undersensitivity in Neural Reading Comprehension
- Developing a How-to Tip Machine Comprehension Dataset and its Evaluation in Machine Comprehension by BERT (ACL2020 WS)
- A Simple but Effective Method to Incorporate Multi-turn Context with BERT for Conversational Machine Comprehension (ACL2019 WS)
- FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension (ACL2019 WS)
- BERT with History Answer Embedding for Conversational Question Answering (SIGIR2019)
- GraphFlow: Exploiting Conversation Flow with Graph Neural Networks for Conversational Machine Comprehension (ICML2019 WS)
- TAPAS: Weakly Supervised Table Parsing via Pre-training (ACL2020)
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data (ACL2020)
- Understanding tables with intermediate pre-training (EMNLP2020 Findings)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing (ICLR2021)
- Table Search Using a Deep Contextualized Language Model (SIGIR2020)
- Open Domain Question Answering over Tables via Dense Retrieval (NAACL2021)
- Capturing Row and Column Semantics in Transformer Based Question Answering over Tables (NAACL2021)
- MATE: Multi-view Attention for Table Transformer Efficiency (EMNLP2021)
- TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions (EMNLP2020)
- Beyond English-only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian (RANLP2019)
- XQA: A Cross-lingual Open-domain Question Answering Dataset (ACL2019)
- XOR QA: Cross-lingual Open-Retrieval Question Answering (NAACL2021) [website]
- Cross-Lingual Machine Reading Comprehension (EMNLP2019)
- Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model
- Multilingual Question Answering from Formatted Text applied to Conversational Agents
- BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels (EMNLP2019)
- MLQA: Evaluating Cross-lingual Extractive Question Answering
- Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension
- Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation (COLING2020)
- MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering [github]
- Towards More Equitable Question Answering Systems: How Much More Data Do You Need? (ACL2021)
- X-METRA-ADA: Cross-lingual Meta-Transfer Learning Adaptation to Natural Language Understanding and Question Answering (NAACL2021)
- Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension (TACL)
- SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis
- DuReader_robust: A Chinese Dataset Towards Evaluating the Robustness of Machine Reading Comprehension Models
- Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension (EMNLP2019)
- Few-Shot Question Answering by Pretraining Span Selection (ACL2021)
- DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue [website]
- A Short Survey of Pre-trained Language Models for Conversational AI-A New Age in NLP
- MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding (ACL2021)
- BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer (Interspeech2019)
- Dialog State Tracking: A Neural Reading Comprehension Approach
- A Simple but Effective BERT Model for Dialog State Tracking on Resource-Limited Systems (ICASSP2020)
- Fine-Tuning BERT for Schema-Guided Zero-Shot Dialogue State Tracking
- Goal-Oriented Multi-Task BERT-Based Dialogue State Tracker
- Dialogue State Tracking with Pretrained Encoder for Multi-domain Task-oriented Dialogue Systems
- Zero-Shot Transfer Learning with Synthesized Data for Multi-Domain Dialogue State Tracking (ACL2020)
- A Fast and Robust BERT-based Dialogue State Tracker for Schema-Guided Dialogue Dataset (KDD2020 WS)
- Knowledge-Aware Graph-Enhanced GPT-2 for Dialogue State Tracking
- Coreference Augmentation for Multi-Domain Task-Oriented Dialogue State Tracking (Interspeech2021)
- ToD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogues (EMNLP2020)
- Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances (ACL2021)
- Domain Adaptive Training BERT for Response Selection
- Speaker-Aware BERT for Multi-Turn Response Selection in Retrieval-Based Chatbots
- Curriculum Learning Strategies for IR: An Empirical Study on Conversation Response Ranking (ECIR2020)
- MuTual: A Dataset for Multi-Turn Dialogue Reasoning (ACL2020)
- DialBERT: A Hierarchical Pre-Trained Model for Conversation Disentanglement
- Generalized Conditioned Dialogue Generation Based on Pre-trained Language Model
- BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data (ACL2021)
- Interactive Teaching for Conversational AI (NeurIPS2020 WS)
- BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding
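Many of the open-domain QA entries above (e.g. Dense Passage Retrieval, RocketQA) share one core step: questions and passages are encoded into a shared vector space, and passages are ranked by inner product with the question vector. A minimal sketch of that ranking step, with hand-written toy vectors standing in for the two BERT encoders' outputs:

```python
# Toy stand-ins for dense encoder outputs (real DPR-style systems use
# two trained BERT encoders; these vectors are illustrative only).
passage_embs = [
    [1.0, 0.0, 0.0],   # passage 0
    [0.0, 1.0, 0.0],   # passage 1
    [0.0, 0.0, 1.0],   # passage 2
]
question_emb = [0.1, 0.2, 0.9]  # closest in direction to passage 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Dense retrieval step: score every passage by inner product, take the top one.
scores = [dot(p, question_emb) for p in passage_embs]
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # -> 2
```

In the real systems this argmax becomes an approximate nearest-neighbor search over millions of pre-computed passage vectors.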
Slot filling and Intent Detection
- A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding (EMNLP2019)
- BERT for Joint Intent Classification and Slot Filling
- A Co-Interactive Transformer for Joint Slot Filling and Intent Detection (ICASSP2021)
- Few-shot Intent Classification and Slot Filling with Retrieved Examples (NAACL2021)
- Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model
- A Comparison of Deep Learning Methods for Language Understanding (Interspeech2019)
- Data Augmentation for Spoken Language Understanding via Pretrained Models
- Few-Shot Intent Detection via Contrastive Pre-Training and Fine-Tuning (EMNLP2021)
- STIL -- Simultaneous Slot Filling, Translation, Intent Classification, and Language Identification: Initial Results using mBART on MultiATIS++ (AACL-IJCNLP2020) [github]
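The joint models in this section (e.g. "BERT for Joint Intent Classification and Slot Filling") attach two heads to one encoder: a sentence-level intent classifier on the [CLS] position and a per-token slot tagger. A sketch of the decoding step, with hand-written logits standing in for encoder outputs (the label sets and numbers are invented for illustration):

```python
# Invented label inventories and toy logits standing in for a BERT encoder.
INTENTS = ["book_flight", "get_weather"]
SLOTS = ["O", "B-city", "B-date"]

tokens = ["flight", "to", "boston", "tomorrow"]
intent_logits = [2.3, 0.1]      # from the [CLS] vector: one score per intent
slot_logits = [                 # one row per token: one score per slot label
    [1.5, 0.2, 0.1],
    [2.0, 0.3, 0.2],
    [0.1, 2.2, 0.4],
    [0.2, 0.3, 1.9],
]

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

# Joint decoding: one intent for the sentence, one slot label per token.
intent = INTENTS[argmax(intent_logits)]
slots = [SLOTS[argmax(row)] for row in slot_logits]
print(intent, list(zip(tokens, slots)))
```

Training optimizes the sum of the two cross-entropy losses, which is what makes the model "joint".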
Analysis
- Fine-grained Information Status Classification Using Discourse Context-Aware Self-Attention
- Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision (ACL2019)
- BERT-based Lexical Substitution (ACL2019)
- Assessing BERT’s Syntactic Abilities
- Investigating Novel Verb Learning in BERT: Selectional Preference Classes and Alternation-Based Syntactic Generalization (EMNLP2020 WS)
- Does BERT agree? Evaluating knowledge of structure dependence through agreement relations
- Simple BERT Models for Relation Extraction and Semantic Role Labeling
- Bridging the Gap in Multilingual Semantic Role Labeling: a Language-Agnostic Approach (COLING2020)
- LIMIT-BERT : Linguistically Informed Multi-Task BERT (EMNLP2020 Findings)
- Joint Semantic Analysis with Document-Level Cross-Task Coherence Rewards
- A Simple BERT-Based Approach for Lexical Simplification
- BERT-Based Arabic Social Media Author Profiling
- Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media
- Evaluating the Factual Consistency of Abstractive Text Summarization
- Generating Fact Checking Explanations (ACL2020)
- NegBERT: A Transfer Learning Approach for Negation Detection and Scope Resolution
- xSLUE: A Benchmark and Analysis Platform for Cross-Style Language Understanding and Evaluation
- TabFact: A Large-scale Dataset for Table-based Fact Verification (ICLR2020)
- Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents
- A Focused Study to Compare Arabic Pre-training Models on Newswire IE Tasks
- LAMBERT: Layout-Aware (Language) Modeling for information extraction (ICDAR2021)
- Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings (ECIR2020) [github]
- Keyphrase Extraction with Span-based Feature Representations
- Keyphrase Prediction With Pre-trained Language Model
- Self-Supervised Contextual Keyword and Keyphrase Retrieval with Self-Labelling [github]
- Joint Keyphrase Chunking and Salience Ranking with BERT
- Generalizing Natural Language Analysis through Span-relation Representations (ACL2020) [github]
- What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
- tBERT: Topic Models and BERT Joining Forces for Semantic Similarity Detection (ACL2020)
- Domain Adaptation with BERT-based Domain Classification and Data Selection (EMNLP2019 WS)
- PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models (TACL2020)
- Unsupervised Out-of-Domain Detection via Pre-trained Transformers (ACL2021) [github]
- Knowledge Distillation for BERT Unsupervised Domain Adaptation
- Sensitive Data Detection and Classification in Spanish Clinical Text: Experiments with BERT (LREC2020)
- Does BERT Pretrained on Clinical Notes Reveal Sensitive Data? (NAACL2021)
- On the Importance of Word and Sentence Representation Learning in Implicit Discourse Relation Classification (IJCAI2020)
- Adapting BERT to Implicit Discourse Relation Classification with a Focus on Discourse Connectives (LREC2020)
- Labeling Explicit Discourse Relations using Pre-trained Language Models (TSD2020)
- Causal-BERT : Language models for causality detection between events expressed in text
- BERT4SO: Neural Sentence Ordering by Fine-tuning BERT
- Document-Level Event Argument Extraction by Conditional Generation (NAACL2021)
- Cross-lingual Zero- and Few-shot Hate Speech Detection Utilising Frozen Transformer Language Models and AXEL
- Same Side Stance Classification Task: Facilitating Argument Stance Classification by Fine-tuning a BERT Model
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection
- KEIS@JUST at SemEval-2020 Task 12: Identifying Multilingual Offensive Tweets Using Weighted Ensemble and Fine-Tuned BERT
- ALBERT-BiLSTM for Sequential Metaphor Detection (ACL2020 WS)
- MelBERT: Metaphor Detection via Contextualized Late Interaction using Metaphorical Identification Theories (NAACL2021)
- A BERT-based Dual Embedding Model for Chinese Idiom Prediction (COLING2020)
- Should You Fine-Tune BERT for Automated Essay Scoring? (ACL2020 WS)
- KILT: a Benchmark for Knowledge Intensive Language Tasks (NAACL2021) [github]
- IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding (AACL-IJCNLP2020)
- MedFilter: Improving Extraction of Task-relevant Utterances through Integration of Discourse Structure and Ontological Knowledge (EMNLP2020)
- ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces (AAAI2021)
- UserBERT: Self-supervised User Representation Learning
- UserBERT: Contrastive User Model Pre-training
- Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning (COLING2020)
- Automatic punctuation restoration with BERT models
Word segmentation, parsing, NER
- BERT Meets Chinese Word Segmentation
- Unified Multi-Criteria Chinese Word Segmentation with BERT
- RethinkCWS: Is Chinese Word Segmentation a Solved Task? (EMNLP2020) [github]
- Enhancing Chinese Word Segmentation via Pseudo Labels for Practicability (ACL2021 Findings)
- Joint Persian Word Segmentation Correction and Zero-Width Non-Joiner Recognition Using BERT
- Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning
- Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT (FLAIRS-33)
- Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing
- fastHan: A BERT-based Joint Many-Task Toolkit for Chinese NLP
- Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing -- A Tale of Two Parsers Revisited (EMNLP2019)
- Is POS Tagging Necessary or Even Helpful for Neural Dependency Parsing?
- Parsing as Pretraining (AAAI2020)
- Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing
- Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement
- StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling
- pyBART: Evidence-based Syntactic Transformations for IE [github]
- Named Entity Recognition -- Is there a glass ceiling? (CoNLL2019)
- A Unified MRC Framework for Named Entity Recognition
- Biomedical named entity recognition using BERT in the machine reading comprehension framework
- Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models
- Robust Named Entity Recognition with Truecasing Pretraining (AAAI2020)
- LTP: A New Active Learning Strategy for Bert-CRF Based Named Entity Recognition
- Named Entity Recognition as Dependency Parsing (ACL2020)
- Exploring Cross-sentence Contexts for Named Entity Recognition with BERT
- CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI2021) [github]
- Embeddings of Label Components for Sequence Labeling: A Case Study of Fine-grained Named Entity Recognition (ACL2020 SRW)
- BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision (KDD2020) [github]
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve
- Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language (ACL2020)
- To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging (EMNLP2020)
- Example-Based Named Entity Recognition
- FLERT: Document-Level Features for Named Entity Recognition
- Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition
- What's in a Name? Are BERT Named Entity Representations just as Good for any other Name? (ACL2020 WS)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition (EMNLP2020) [github]
- Entity Enhanced BERT Pre-training for Chinese NER (EMNLP2020)
- Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter (ACL2021)
- FLAT: Chinese NER Using Flat-Lattice Transformer (ACL2020)
- BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition
- MT-BioNER: Multi-task Learning for Biomedical Named Entity Recognition using Deep Bidirectional Transformers
- Knowledge Guided Named Entity Recognition for BioMedical Text
- Cross-Lingual Named Entity Recognition Using Parallel Corpus: A New Approach Using XLM-RoBERTa Alignment
- Portuguese Named Entity Recognition using BERT-CRF
- Towards Lingua Franca Named Entity Recognition with BERT
- Larger-Context Tagging: When and Why Does It Work? (NAACL2021)
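Most BERT-based NER systems listed above emit one BIO tag per token, which must then be decoded into entity spans. A small decoder for that standard output format (the tag sequence below is a made-up example):

```python
def bio_to_spans(tags):
    """Decode a BIO tag sequence into (label, start, end) spans, end exclusive."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag == "O" or tag.startswith("B-"):
            if label is not None:
                spans.append((label, start, i))
                label = None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
    return spans

# "Angela Merkel visited Paris"
print(bio_to_spans(["B-PER", "I-PER", "O", "B-LOC"]))  # -> [('PER', 0, 2), ('LOC', 3, 4)]
```

This sketch ignores malformed sequences (an "I-" tag with no preceding "B-"), which real evaluation scripts handle explicitly.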
Pronoun/coreference resolution
- A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution
- Resolving Gendered Ambiguous Pronouns with BERT (ACL2019 WS)
- Anonymized BERT: An Augmentation Approach to the Gendered Pronoun Resolution Challenge (ACL2019 WS)
- Gendered Pronoun Resolution using BERT and an extractive question answering formulation (ACL2019 WS)
- MSnet: A BERT-based Network for Gendered Pronoun Resolution (ACL2019 WS)
- Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation
- Fill the GAP: Exploiting BERT for Pronoun Resolution (ACL2019 WS)
- On GAP Coreference Resolution Shared Task: Insights from the 3rd Place Solution (ACL2019 WS)
- Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution (ACL2019 WS)
- Unsupervised Pronoun Resolution via Masked Noun-Phrase Prediction (ACL2021)
- BERT Masked Language Modeling for Co-reference Resolution (ACL2019 WS)
- Coreference Resolution with Entity Equalization (ACL2019)
- BERT for Coreference Resolution: Baselines and Analysis (EMNLP2019) [github]
- WikiCREM: A Large Unsupervised Corpus for Coreference Resolution (EMNLP2019)
- CD2CR: Co-reference Resolution Across Documents and Domains (EACL2021)
- Ellipsis Resolution as Question Answering: An Evaluation (EACL2021)
- Coreference Resolution as Query-based Span Prediction
- Coreferential Reasoning Learning for Language Representation (EMNLP2020)
- Revisiting Memory-Efficient Incremental Coreference Resolution
- Revealing the Myth of Higher-Order Inference in Coreference Resolution (EMNLP2020)
- Coreference Resolution without Span Representations (ACL2021)
- Neural Mention Detection (LREC2020)
- ZPR2: Joint Zero Pronoun Recovery and Resolution using Multi-Task Learning and BERT (ACL2020)
- An Empirical Study of Contextual Data Augmentation for Japanese Zero Anaphora Resolution (COLING2020)
- BERT-based Cohesion Analysis of Japanese Texts (COLING2020)
- Joint Coreference Resolution and Character Linking for Multiparty Conversation
- Sequence to Sequence Coreference Resolution (COLING2020 WS)
- Within-Document Event Coreference with BERT-Based Contextualized Representations
- Multi-task Learning Based Neural Bridging Reference Resolution
- Bridging Anaphora Resolution as Question Answering (ACL2020)
- Fine-grained Information Status Classification Using Discourse Context-Aware BERT (COLING2020)
Word sense disambiguation
- Language Models and Word Sense Disambiguation: An Overview and Analysis
- GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge (EMNLP2019)
- Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences (EMNLP2020 Findings)
- Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations (EMNLP2019)
- Using BERT for Word Sense Disambiguation
- Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation (ACL2019)
- Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings (KONVENS2019)
- An Accurate Model for Predicting the (Graded) Effect of Context in Word Similarity Based on Bert
- PolyLM: Learning about Polysemy through Language Modeling (EACL2021)
- CluBERT: A Cluster-Based Approach for Learning Sense Distributions in Multiple Languages (ACL2020)
- Cross-lingual Word Sense Disambiguation using mBERT Embeddings with Syntactic Dependencies
- VCDM: Leveraging Variational Bi-encoding and Deep Contextualized Word Representations for Improved Definition Modeling (EMNLP2020)
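GlossBERT's key move is to recast word sense disambiguation as sentence-pair classification: each candidate sense's dictionary gloss is paired with the context sentence, and a BERT pair classifier scores each pair. A sketch of that input construction, with an invented two-sense gloss inventory (not WordNet's actual entries) and an illustrative pair format:

```python
# Invented gloss inventory for illustration; GlossBERT uses WordNet glosses.
GLOSSES = {
    "bank": [
        "a financial institution that accepts deposits",
        "sloping land beside a body of water",
    ],
}

def context_gloss_pairs(sentence, target):
    """Build one (context, gloss) pair per candidate sense of the target word.
    A BERT sentence-pair classifier then scores each pair; the top-scoring
    gloss is the predicted sense."""
    return [(sentence, f"{target} : {gloss}") for gloss in GLOSSES[target]]

pairs = context_gloss_pairs("He sat on the bank of the river.", "bank")
print(len(pairs))  # -> 2
```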
Sentiment analysis
- Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence (NAACL2019)
- BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis (NAACL2019)
- Exploiting BERT for End-to-End Aspect-based Sentiment Analysis (EMNLP2019 WS)
- Improving BERT Performance for Aspect-Based Sentiment Analysis
- Context-Guided BERT for Targeted Aspect-Based Sentiment Analysis
- Understanding Pre-trained BERT for Aspect-based Sentiment Analysis (COLING2020)
- Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa (NAACL2021)
- Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification (LREC2020)
- An Investigation of Transfer Learning-Based Sentiment Analysis in Japanese (ACL2019)
- "Mask and Infill" : Applying Masked Language Model to Sentiment Transfer
- Adversarial Training for Aspect-Based Sentiment Analysis with BERT
- Adversarial and Domain-Aware BERT for Cross-Domain Sentiment Analysis (ACL2020)
- Utilizing BERT Intermediate Layers for Aspect Based Sentiment Analysis and Natural Language Inference
- DomBERT: Domain-oriented Language Model for Aspect-based Sentiment Analysis
- YASO: A New Benchmark for Targeted Sentiment Analysis
- SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics (ACL2020)
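The first entry in this section (Sun et al., NAACL2019) turns aspect-based sentiment analysis into BERT sentence-pair classification by generating an auxiliary sentence per aspect. A sketch of that construction; the question template here is illustrative, not the paper's verbatim wording:

```python
def auxiliary_sentences(target, aspects):
    """Build one auxiliary question per (target, aspect) pair, so that
    aspect-based sentiment becomes (review, question) pair classification."""
    return [f"what do you think of the {a} of {target} ?" for a in aspects]

review = "The laptop is fast but the battery drains quickly."
aux = auxiliary_sentences("the laptop", ["performance", "battery"])
pairs = [(review, q) for q in aux]
print(pairs[1][1])
```

Each pair is then fed to a standard BERT pair classifier whose label is the sentiment toward that aspect.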
Relation extraction
- Matching the Blanks: Distributional Similarity for Relation Learning (ACL2019)
- BERT-Based Multi-Head Selection for Joint Entity-Relation Extraction (NLPCC2019)
- Enriching Pre-trained Language Model with Entity Information for Relation Classification
- Span-based Joint Entity and Relation Extraction with Transformer Pre-training
- Fine-tune Bert for DocRED with Two-step Process
- Relation Extraction as Two-way Span-Prediction
- Entity, Relation, and Event Extraction with Contextualized Span Representations (EMNLP2019)
- Fine-tuning BERT for Joint Entity and Relation Extraction in Chinese Medical Text
- Downstream Model Design of Pre-trained Language Model for Relation Extraction Task
- Efficient long-distance relation extraction with DG-SpanBERT
- Global-to-Local Neural Networks for Document-Level Relation Extraction (EMNLP2020)
- DARE: Data Augmented Relation Extraction with GPT-2
- Distantly-Supervised Neural Relation Extraction with Side Information using BERT (IJCNN2020)
- Improving Distantly-Supervised Relation Extraction through BERT-based Label & Instance Embeddings
- An End-to-end Model for Entity-level Relation Extraction using Multi-instance Learning (EACL2021)
- ZS-BERT: Towards Zero-Shot Relation Extraction with Attribute Representation Learning (NAACL2021) [github]
- AdaPrompt: Adaptive Prompt-based Finetuning for Relation Extraction
- Dialogue-Based Relation Extraction (ACL2020)
- An Embarrassingly Simple Model for Dialogue Relation Extraction
- A Novel Cascade Binary Tagging Framework for Relational Triple Extraction (ACL2020) [github]
- ExpBERT: Representation Engineering with Natural Language Explanations (ACL2020) [github]
- AutoRC: Improving BERT Based Relation Classification Models via Architecture Search
- Investigation of BERT Model on Biomedical Relation Extraction Based on Revised Fine-tuning Mechanism
- Experiments on transfer learning architectures for biomedical relation extraction
- Improving BERT Model Using Contrastive Learning for Biomedical Relation Extraction (BioNLP2021)
- Cross-Lingual Relation Extraction with Transformers
- Improving Scholarly Knowledge Representation: Evaluating BERT-based Models for Scientific Relation Classification
- Robustly Pre-trained Neural Model for Direct Temporal Relation Extraction
- A BERT-based One-Pass Multi-Task Model for Clinical Temporal Relation Extraction (ACL2020 WS)
- Exploring Contextualized Neural Language Models for Temporal Dependency Parsing
- Temporal Reasoning on Implicit Events from Distant Supervision
- IMoJIE: Iterative Memory-Based Joint Open Information Extraction (ACL2020)
- OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction (EMNLP2020) [github]
- Multi2OIE: Multilingual Open Information Extraction Based on Multi-Head Attention with BERT (EMNLP2020 Findings)
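A recurring input trick in this section (used by "Enriching Pre-trained Language Model with Entity Information for Relation Classification") is to wrap the two entities in special marker tokens before encoding, so the model knows which pair to classify: '$' around the head entity and '#' around the tail. A sketch of that preprocessing step:

```python
def mark_entities(tokens, head_span, tail_span):
    """Insert '$' markers around the head entity and '#' around the tail.
    Spans are (start, end) token indices, end exclusive."""
    out = []
    for i, tok in enumerate(tokens):
        if i == head_span[0]:
            out.append("$")
        if i == tail_span[0]:
            out.append("#")
        out.append(tok)
        if i == head_span[1] - 1:
            out.append("$")
        if i == tail_span[1] - 1:
            out.append("#")
    return out

tokens = "Bill Gates founded Microsoft".split()
print(" ".join(mark_entities(tokens, (0, 2), (3, 4))))
# -> $ Bill Gates $ founded # Microsoft #
```

The marked sequence is then encoded by BERT, and the marker (or entity) representations feed the relation classifier.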
Knowledge base
- KG-BERT: BERT for Knowledge Graph Completion
- How Context Affects Language Models' Factual Predictions (AKBC2020)
- Inducing Relational Knowledge from BERT (AAAI2020)
- Latent Relation Language Models (AAAI2020)
- Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model (ICLR2020)
- Scalable Zero-shot Entity Linking with Dense Entity Retrieval (EMNLP2020) [github]
- Zero-shot Entity Linking with Efficient Long Range Sequence Modeling (EMNLP2020 Findings)
- Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking (CoNLL2019)
- Improving Entity Linking by Modeling Latent Entity Type Information (AAAI2020)
- Global Entity Disambiguation with Pretrained Contextualized Embeddings of Words and Entities
- YELM: End-to-End Contextualized Entity Linking
- Empirical Evaluation of Pretraining Strategies for Supervised Entity Linking (AKBC2020)
- LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention (EMNLP2020) [github]
- Linking Entities to Unseen Knowledge Bases with Arbitrary Schemas
- CHOLAN: A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata (EACL2021)
- PEL-BERT: A Joint Model for Protocol Entity Linking
- End-to-end Biomedical Entity Linking with Span-based Dictionary Matching
- Efficient One-Pass End-to-End Entity Linking for Questions (EMNLP2020) [github]
- Cross-Lingual Transfer in Zero-Shot Cross-Language Entity Linking
- Entity Linking in 100 Languages (EMNLP2020) [github]
- COMETA: A Corpus for Medical Entity Linking in the Social Media (EMNLP2020) [github]
- How Can We Know What Language Models Know? (TACL2020) [github]
- How to Query Language Models?
- Deep Entity Matching with Pre-Trained Language Models
- Ultra-Fine Entity Typing with Weak Supervision from a Masked Language Model (ACL2021)
- Constructing Taxonomies from Pretrained Language Models (NAACL2021)
- Language Models are Open Knowledge Graphs
- Can Generative Pre-trained Language Models Serve as Knowledge Bases for Closed-book QA? (ACL2021)
- DualTKB: A Dual Learning Bridge between Text and Knowledge Base (EMNLP2020) [github]
- Zero-shot Slot Filling with DPR and RAG
- How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds [github]
- MLMLM: Link Prediction with Mean Likelihood Masked Language Model
- Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases
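KG-BERT, the first entry above, scores knowledge-graph triples by flattening each (head, relation, tail) triple into a single [SEP]-delimited token sequence and feeding it to a standard BERT sequence classifier. A sketch of that serialization (the example triple is illustrative):

```python
def serialize_triple(head, relation, tail):
    """Flatten a KG triple into one BERT input sequence; a binary
    classifier over the [CLS] output then judges the triple's plausibility."""
    return f"[CLS] {head} [SEP] {relation} [SEP] {tail} [SEP]"

seq = serialize_triple("Steve Jobs", "founded", "Apple Inc.")
print(seq)
```

Because the entities and relation are plain text, the same model transfers to unseen triples, which is what enables link prediction and triple classification.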
Text classification
- Deep Learning Based Text Classification: A Comprehensive Review
- A Text Classification Survey: From Shallow to Deep Learning
- How to Fine-Tune BERT for Text Classification?
- X-BERT: eXtreme Multi-label Text Classification with BERT
- An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels (EMNLP2020)
- Taming Pretrained Transformers for Extreme Multi-label Text Classification (KDD2020)
- Layer-wise Guided Training for BERT: Learning Incrementally Refined Document Representations (EMNLP2020 WS)
- DocBERT: BERT for Document Classification
- Enriching BERT with Knowledge Graph Embeddings for Document Classification
- Classification and Clustering of Arguments with Contextualized Word Embeddings (ACL2019)
- BERT for Evidence Retrieval and Claim Verification
- Stacked DeBERT: All Attention in Incomplete Data for Text Classification
- Cost-Sensitive BERT for Generalisable Sentence Classification with Imbalanced Data
- BAE: BERT-based Adversarial Examples for Text Classification (EMNLP2020)
- FireBERT: Hardening BERT-based classifiers against adversarial attack [github]
- GAN-BERT: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples (ACL2020)
- Description Based Text Classification with Reinforcement Learning
- VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification
- Zero-shot Text Classification via Reinforced Self-training (ACL2020)
- On Data Augmentation for Extreme Multi-label Classification
- Noisy Channel Language Model Prompting for Few-Shot Text Classification
- Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning (NAACL2021)
- Towards Evaluating the Robustness of Chinese BERT Classifiers
- COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter [github]
- Large Scale Legal Text Classification Using Transformer Models
- BBAEG: Towards BERT-based Biomedical Adversarial Example Generation for Text Classification (NAACL2021)
- A Comparison of LSTM and BERT for Small Corpus
WSC, WNLI, NLI
- Exploring Unsupervised Pretraining and Sentence Structure Modelling for Winograd Schema Challenge
- A Surprisingly Robust Trick for the Winograd Schema Challenge
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale (AAAI2020)
- TTTTTackling WinoGrande Schemas
- WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge (ACL2020)
- The Sensitivity of Language Models and Humans to Winograd Schema Perturbations (ACL2020)
- Precise Task Formalization Matters in Winograd Schema Evaluations (EMNLP2020)
- Tackling Domain-Specific Winograd Schemas with Knowledge-Based Reasoning and Machine Learning
- A Review of Winograd Schema Challenge Datasets and Approaches
- Improving Natural Language Inference with a Pretrained Parser
- Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition
- DocNLI: A Large-scale Dataset for Document-level Natural Language Inference (ACL2021 Findings)
- Adversarial NLI: A New Benchmark for Natural Language Understanding
- Adversarial Analysis of Natural Language Inference Systems (ICSC2020)
- ANLIzing the Adversarial Natural Language Inference Dataset
- Syntactic Data Augmentation Increases Robustness to Inference Heuristics (ACL2020)
- Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets (EMNLP2020 WS) [github]
- HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference (LREC2020)
- Use of Machine Translation to Obtain Labeled Datasets for Resource-Constrained Languages (EMNLP2020) [github]
- FarsTail: A Persian Natural Language Inference Dataset
- Evaluating BERT for natural language inference: A case study on the CommitmentBank (EMNLP2019)
- Do Neural Models Learn Systematicity of Monotonicity Inference in Natural Language? (ACL2020)
- Abductive Commonsense Reasoning (ICLR2020)
- Entailment as Few-Shot Learner
- Collecting Entailment Data for Pretraining: New Protocols and Negative Results
- WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation (EMNLP2022 Findings) [github]
- Mining Knowledge for Natural Language Inference from Wikipedia Categories (EMNLP2020 Findings)
Commonsense
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge (NAACL2019)
- Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention
- HellaSwag: Can a Machine Really Finish Your Sentence? (ACL2019) [website]
- A Method for Building a Commonsense Inference Dataset Based on Basic Events (EMNLP2020) [website]
- Story Ending Prediction by Transferable BERT (IJCAI2019)
- Explain Yourself! Leveraging Language Models for Commonsense Reasoning (ACL2019)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning (ACL2020)
- Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models
- Informing Unsupervised Pretraining with External Linguistic Knowledge
- Commonsense Knowledge + BERT for Level 2 Reading Comprehension Ability Test
- BIG MOOD: Relating Transformers to Explicit Commonsense Knowledge
- Commonsense Knowledge Mining from Pretrained Models (EMNLP2019)
- KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning (EMNLP2019)
- Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations (EMNLP2019 WS)
- Do Massively Pretrained Language Models Make Better Storytellers? (CoNLL2019)
- PIQA: Reasoning about Physical Commonsense in Natural Language (AAAI2020)
- Evaluating Commonsense in Pre-trained Language Models (AAAI2020)
- Why Do Masked Neural Language Models Still Need Common Sense Knowledge?
- Does BERT Solve Commonsense Task via Commonsense Knowledge?
- Unsupervised Commonsense Question Answering with Self-Talk (EMNLP2020)
- Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering (AAAI2021)
- G-DAUG: Generative Data Augmentation for Commonsense Reasoning
- Contrastive Self-Supervised Learning for Commonsense Reasoning (ACL2020)
- Differentiable Open-Ended Commonsense Reasoning
- Adversarial Training for Commonsense Inference (ACL2020 WS)
- Do Fine-tuned Commonsense Language Models Really Generalize?
- Do Language Models Perform Generalizable Commonsense Inference? (ACL2021 Findings)
- Improving Zero Shot Learning Baselines with Commonsense Knowledge
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [github]
- Do Neural Language Representations Learn Physical Commonsense? (CogSci2019)
Extractive summarization
- HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization (ACL2019)
- Deleter: Leveraging BERT to Perform Unsupervised Successive Text Compression
- Discourse-Aware Neural Extractive Text Summarization (ACL2020) [github]
- AREDSUM: Adaptive Redundancy-Aware Iterative Sentence Ranking for Extractive Document Summarization
- Fact-level Extractive Summarization with Hierarchical Graph Mask on BERT (COLING2020)
- Do We Really Need That Many Parameters In Transformer For Extractive Summarization? Discourse Can Help ! (EMNLP2020 WS)
- Multi-Document Summarization with Determinantal Point Processes and Contextualized Representations (EMNLP2019 WS)
- Continual BERT: Continual Learning for Adaptive Extractive Summarization of COVID-19 Literature
Grammatical error correction
- Multi-headed Architecture Based on BERT for Grammatical Errors Correction (ACL2019 WS)
- Towards Minimal Supervision BERT-based Grammar Error Correction
- Learning to combine Grammatical Error Corrections (EMNLP2019 WS)
- LM-Critic: Language Models for Unsupervised Grammatical Error Correction (EMNLP2021) [github]
- Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction (ACL2020)
- Chinese Grammatical Correction Using BERT-based Pre-trained Model (AACL-IJCNLP2020)
- Spelling Error Correction with Soft-Masked BERT (ACL2020)
IR
- BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models [github]
- Pretrained Transformers for Text Ranking: BERT and Beyond
- Passage Re-ranking with BERT
- Investigating the Successes and Failures of BERT for Passage Re-Ranking
- Understanding the Behaviors of BERT in Ranking
- Document Expansion by Query Prediction
- Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval (ACL2021)
- CEDR: Contextualized Embeddings for Document Ranking (SIGIR2019)
- Deeper Text Understanding for IR with Contextual Neural Language Modeling (SIGIR2019)
- FAQ Retrieval using Query-Question Similarity and BERT-Based Query-Answer Relevance (SIGIR2019)
- An Analysis of BERT FAQ Retrieval Models for COVID-19 Infobot
- COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval
- Unsupervised FAQ Retrieval with Question Generation and BERT (ACL2020)
- Multi-Stage Document Ranking with BERT
- Learning-to-Rank with BERT in TF-Ranking
- Transformer-Based Language Models for Similar Text Retrieval and Ranking
- DeText: A Deep Text Ranking Framework with BERT
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT (SIGIR2020)
- RepBERT: Contextualized Text Embeddings for First-Stage Retrieval [github]
- Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
- Multi-Perspective Semantic Information Retrieval
- CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos (SIGIR2022)
- Expansion via Prediction of Importance with Contextualization (SIGIR2020)
- BERT-QE: Contextualized Query Expansion for Document Re-ranking (EMNLP2020 Findings)
- Beyond [CLS] through Ranking by Generation (EMNLP2020)
- Efficient Document Re-Ranking for Transformers by Precomputing Term Representations (SIGIR2020)
- Training Curricula for Open Domain Answer Re-Ranking (SIGIR2020)
- Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling
- Boosted Dense Retriever
- ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self On-the-fly Distillation for Dense Passage Retrieval
- Document Ranking with a Pretrained Sequence-to-Sequence Model
- A Neural Corpus Indexer for Document Retrieval
- COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List (NAACL2021)
- Guided Transformer: Leveraging Multiple External Sources for Representation Learning in Conversational Search (SIGIR2020)
- Fine-tune BERT for E-commerce Non-Default Search Ranking
- IR-BERT: Leveraging BERT for Semantic Search in Background Linking for News Articles
- ProphetNet-Ads: A Looking Ahead Strategy for Generative Retrieval Models in Sponsored Search Engine (NLPCC2020)
- Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned (ACL2020 WS)
- SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search (EMNLP2020)
- Neural Duplicate Question Detection without Labeled Training Data (EMNLP2019)
- Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection
- Effective Transfer Learning for Identifying Similar Questions: Matching User Questions to COVID-19 FAQs
- Cross-lingual Information Retrieval with BERT
- Cross-lingual Retrieval for Iterative Self-Supervised Training (NeurIPS2020)
- Graph-based Multilingual Product Retrieval in E-Commerce Search (NAACL2021 Industry)
- Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning (ECIR2020)
- PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval (WSDM2021)
- B-PROP: Bootstrapped Pre-training with Representative Words Prediction for Ad-hoc Retrieval (SIGIR2021)
- Condenser: a Pre-training Architecture for Dense Retrieval (EMNLP2021)
- Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation (ACL2022)
- Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval (EMNLP2021 WS) [github]
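Many of the dense-retrieval papers above (e.g. RepBERT, ANCE, Condenser) share the same first-stage scoring rule: embed the query and each document with a bi-encoder, then rank documents by inner product. A minimal stand-in sketch with toy vectors (illustrative only; no BERT is involved, and the vectors here are hypothetical placeholders for real encoder outputs):

```python
def rank(query_vec, doc_vecs, top_k=2):
    """First-stage dense retrieval: score each document by its dot product
    with the query embedding and return the top-k document indices."""
    scores = [(sum(q * d for q, d in zip(query_vec, dv)), i)
              for i, dv in enumerate(doc_vecs)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_k]]

# Toy 2-d embeddings: doc 0 is most similar to the query, doc 2 second.
print(rank([1.0, 0.0], [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]))  # [0, 2]
```

In production systems the top-k candidates from this stage are typically re-scored by a slower BERT cross-encoder, as in the multi-stage ranking papers listed above.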
Generation
- Pretrained Language Models for Text Generation: A Survey (IJCAI2021 Survey Track)
- A Survey of Pretrained Language Models Based Text Generation
- GLGE: A New General Language Generation Evaluation Benchmark [github]
- BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model (NAACL2019 WS)
- Pretraining-Based Natural Language Generation for Text Summarization
- Text Summarization with Pretrained Encoders (EMNLP2019) [github (original)] [github (huggingface)]
- Multi-stage Pretraining for Abstractive Summarization
- PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
- Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models
- GSum: A General Framework for Guided Neural Abstractive Summarization (NAACL2021) [github]
- STEP: Sequence-to-Sequence Transformer Pre-training for Document Summarization
- TLDR: Extreme Summarization of Scientific Documents [github]
- Product Title Generation for Conversational Systems using BERT
- WSL-DS: Weakly Supervised Learning with Distant Supervision for Query Focused Multi-Document Abstractive Summarization (COLING2020)
- Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation
- Abstractive Query Focused Summarization with Query-Free Resources
- Abstractive Summarization of Spoken and Written Instructions with BERT
- Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization (ACL2021)
- Coreference-Aware Dialogue Summarization (SIGDIAL2021)
- XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages (ACL2021 Findings) [github]
- BERT Fine-tuning For Arabic Text Summarization (ICLR2020 WS)
- Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2
- Mixed-Lingual Pre-training for Cross-lingual Summarization (AACL-IJCNLP2020)
- PoinT-5: Pointer Network and T-5 based Financial Narrative Summarisation (COLING2020 WS)
- MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML2019) [github], [github]
- JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation (LREC2020)
- Unified Language Model Pre-training for Natural Language Understanding and Generation [github] (NeurIPS2019)
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [github]
- Dual Inference for Improving Language Understanding and Generation (EMNLP2020 Findings)
- All NLP Tasks Are Generation Tasks: A General Pretraining Framework
- ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training (EMNLP2020 Findings) [github]
- ProphetNet-X: Large-Scale Pre-training Models for English, Chinese, Multi-lingual, Dialog, and Code Generation
- Towards Making the Most of BERT in Neural Machine Translation
- Improving Neural Machine Translation with Pre-trained Representation
- BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation (EMNLP2021)
- On the use of BERT for Neural Machine Translation (EMNLP2019 WS)
- Incorporating BERT into Neural Machine Translation (ICLR2020)
- Recycling a Pre-trained BERT Encoder for Neural Machine Translation
- Exploring Unsupervised Pretraining Objectives for Machine Translation (ACL2021 Findings)
- Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT (EMNLP2020)
- Language Models are Good Translators
- Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
- Mask-Predict: Parallel Decoding of Conditional Masked Language Models (EMNLP2019)
- PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation (EMNLP2020)
- ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
- Non-Autoregressive Text Generation with Pre-trained Language Models (EACL2021)
- Cross-Lingual Natural Language Generation via Pre-Training (AAAI2020) [github]
- PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable (ACL2020)
- A Tailored Pre-Training Model for Task-Oriented Dialog Generation
- Pretrained Language Models for Dialogue Generation with Multiple Input Sources (EMNLP2020 Findings)
- Knowledge-Grounded Dialogue Generation with Pre-trained Language Models (EMNLP2020)
- Are Pre-trained Language Models Knowledgeable to Ground Open Domain Dialogues?
- Open-Domain Dialogue Generation Based on Pre-trained Language Models
- LaMDA: Language Models for Dialog Applications
- Retrieval-Augmented Transformer-XL for Close-Domain Dialog Generation
- Internet-Augmented Dialogue Generation
- DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances (AAAI2021)
- CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection
- QURIOUS: Question Generation Pretraining for Text Generation
- Few-Shot NLG with Pre-Trained Language Model (ACL2020)
- Text-to-Text Pre-Training for Data-to-Text Tasks
- KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation (EMNLP2020)
- Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference (INLG2020)
- Large Scale Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training
- Structure-Grounded Pretraining for Text-to-SQL
- Data Agnostic RoBERTa-based Natural Language to SQL Query Generation
- ToTTo: A Controlled Table-To-Text Generation Dataset (EMNLP2020) [github]
- Exploring Fluent Query Reformulations with Text-to-Text Transformers and Reinforcement Learning (AAAI2021 WS)
- A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation (TACL2020) [github]
- MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models (EMNLP2020)
- Facts2Story: Controlling Text Generation by Key Facts
- CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning [github] [website] (EMNLP2020 Findings)
- An Enhanced Knowledge Injection Model for Commonsense Generation (COLING2020)
- Retrieval Enhanced Model for Commonsense Generation (ACL2021 Findings)
- Lexically-constrained Text Generation through Commonsense Knowledge Extraction and Injection (AAAI2021WS)
- Pre-training Text-to-Text Transformers for Concept-centric Common Sense
- Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph (EMNLP2020)
- KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning
- Autoregressive Entity Retrieval (ICLR2021) [github]
- Multilingual Autoregressive Entity Linking
- EIGEN: Event Influence GENeration using Pre-trained Language Models
- proScript: Partially Ordered Scripts Generation via Pre-trained Language Models
- Goal-Oriented Script Construction (INLG2021)
- Contrastive Triple Extraction with Generative Transformer (AAAI2021)
- GeDi: Generative Discriminator Guided Sequence Generation
- Generating similes effortlessly like a Pro: A Style Transfer Approach for Simile Generation (EMNLP2020)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (JMLR2020) [github]
- mT5: A massively multilingual pre-trained text-to-text transformer (NAACL2021) [github]
- nmT5 -- Is parallel data still relevant for pre-training massively multilingual language models? (ACL2021)
- mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
- WT5?! Training Text-to-Text Models to Explain their Predictions
- NT5?! Training T5 to Perform Numerical Reasoning [github]
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (ACL2020)
- The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
- GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
- Finetuned Language Models Are Zero-Shot Learners [blog]
- Multitask Prompted Training Enables Zero-Shot Task Generalization
- Multilingual Denoising Pre-training for Neural Machine Translation
- Best Practices for Data-Efficient Modeling in NLG: How to Train Production-Ready Neural Models with Less Data (COLING2020)
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- Unsupervised Pre-training for Natural Language Generation: A Literature Review
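Several pre-training objectives above revolve around reconstructing removed spans; PEGASUS, for instance, masks out the "most informative" sentences and uses them as the pseudo-summary target. A toy sketch of that gap-sentence selection, using plain word overlap as a stand-in for the ROUGE-1 scoring described in the paper (assumption: this simplification, not the paper's exact metric):

```python
def select_gap_sentences(sentences, k=1):
    """PEGASUS-style gap-sentence selection: pick the k sentences whose
    lexical overlap with the rest of the document is highest; during
    pre-training these are masked and the model learns to generate them."""
    def overlap(i):
        s = set(sentences[i].lower().split())
        rest = {w for j, t in enumerate(sentences) if j != i
                for w in t.lower().split()}
        return len(s & rest) / max(len(s), 1)
    ranked = sorted(range(len(sentences)), key=overlap, reverse=True)
    return sorted(ranked[:k])

docs = ["the cat sat", "the cat ran fast", "dogs bark"]
print(select_gap_sentences(docs, k=1))  # [0]
```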
Quality evaluator
- BERTScore: Evaluating Text Generation with BERT (ICLR2020)
- BERTTune: Fine-Tuning Neural Machine Translation with BERTScore (ACL2021)
- Machine Translation Evaluation with BERT Regressor
- TransQuest: Translation Quality Estimation with Cross-lingual Transformers (COLING2020)
- SumQE: a BERT-based Summary Quality Estimation Model (EMNLP2019)
- MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance (EMNLP2019) [github]
- BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward
- Language Model Augmented Relevance Score (ACL2021)
- BLEURT: Learning Robust Metrics for Text Generation (ACL2020)
- BARTScore: Evaluating Generated Text as Text Generation [github]
- Masked Language Model Scoring (ACL2020)
- Simple-QE: Better Automatic Quality Estimation for Text Simplification
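The core idea behind BERTScore, listed above, is simple enough to sketch: greedily match each token of one sequence to its most similar token in the other by embedding cosine similarity, then combine precision and recall into an F1. A self-contained toy version with hand-made 2-d "embeddings" standing in for real BERT token vectors:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def bertscore_f1(cand_emb, ref_emb):
    """Greedy-matching F1 over token embeddings (BERTScore-style)."""
    # Recall: each reference token matches its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in cand_emb) for r in ref_emb) / len(ref_emb)
    # Precision: each candidate token matches its most similar reference token.
    precision = sum(max(cosine(c, r) for r in ref_emb) for c in cand_emb) / len(cand_emb)
    return 2 * precision * recall / (precision + recall)

# Identical toy sequences score 1.0.
ref = [(1.0, 0.0), (0.0, 1.0)]
print(round(bertscore_f1(ref, ref), 4))  # 1.0
```

The paper additionally applies inverse-document-frequency weighting and a baseline rescaling, both omitted here.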
Modification (multi-task, masking strategy, etc.)
- Multi-Task Deep Neural Networks for Natural Language Understanding (ACL2019)
- The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
- BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning (ICML2019)
- Measuring Massive Multitask Language Understanding (ICLR2021) [github]
- Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks (ACL2021)
- Pre-training Text Representations as Meta Learning
- Unifying Question Answering and Text Classification via Span Extraction
- MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization (ACL2020)
- ERNIE: Enhanced Language Representation with Informative Entities (ACL2019)
- ERNIE: Enhanced Representation through Knowledge Integration
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (AAAI2020)
- ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
- ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding
- XLNet: Generalized Autoregressive Pretraining for Language Understanding (NeurIPS2019) [github]
- MPNet: Masked and Permuted Pre-training for Language Understanding
- Pre-Training with Whole Word Masking for Chinese BERT
- SpanBERT: Improving Pre-training by Representing and Predicting Spans (TACL2020) [github]
- ConvBERT: Improving BERT with Span-based Dynamic Convolution
- Frustratingly Simple Pretraining Alternatives to Masked Language Modeling (EMNLP2021) [github]
- TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning (NAACL2022)
- ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations (EMNLP2020 Findings)
- ZEN 2.0: Continue Training and Adaption for N-gram Enhanced Text Encoders
- MVP-BERT: Redesigning Vocabularies for Chinese BERT and Multi-Vocab Pretraining
- Adversarial Training for Large Neural Language Models
- BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks (ACL2021)
- Train No Evil: Selective Masking for Task-guided Pre-training
- Position Masking for Language Models
- Masking as an Efficient Alternative to Finetuning for Pretrained Language Models (EMNLP2020)
- Variance-reduced Language Pretraining via a Mask Proposal Network
- Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation (EMNLP2020)
- Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model
- Contextual Representation Learning beyond Masked Language Modeling (ACL2022)
- Curriculum learning for language modeling
- Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training
- Focusing More on Conflicts with Mis-Predictions Helps Language Pre-Training
- Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference (EACL2021) [github]
- It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners (NAACL2021) [github]
- Making Pre-trained Language Models Better Few-shot Learners (ACL2021) [github]
- CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
- Lifelong Learning of Few-shot Learners across NLP Tasks
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks (ACL2020)
- Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora (NAACL2022)
- Towards Continual Knowledge Learning of Language Models (ICLR2022)
- An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training [github]
- To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks (ACL2020)
- Revisiting Few-sample BERT Fine-tuning
- Blank Language Models
- Enabling Language Models to Fill in the Blanks (ACL2020)
- Efficient Training of BERT by Progressively Stacking (ICML2019) [github]
- RoBERTa: A Robustly Optimized BERT Pretraining Approach [github]
- On Losses for Modern Language Models (EMNLP2020) [github]
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (ICLR2020)
- Rethinking Embedding Coupling in Pre-trained Language Models (ICLR2021)
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (ICLR2020) [github] [blog]
- Training ELECTRA Augmented with Multi-word Selection (ACL2021 Findings)
- Learning to Sample Replacements for ELECTRA Pre-Training (ACL2021 Findings)
- SCRIPT: Self-Critic PreTraining of Transformers (NAACL2021)
- Pre-Training Transformers as Energy-Based Cloze Models (EMNLP2020) [github]
- MC-BERT: Efficient Language Pre-Training via a Meta Controller
- FreeLB: Enhanced Adversarial Training for Language Understanding (ICLR2020)
- KERMIT: Generative Insertion-Based Modeling for Sequences
- CALM: Continuous Adaptive Learning for Language Modeling
- SegaBERT: Pre-training of Segment-aware BERT for Language Understanding
- DisSent: Sentence Representation Learning from Explicit Discourse Relations (ACL2019)
- Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models (ACL2020)
- CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling (EMNLP2020)
- CausalBERT: Injecting Causal Knowledge Into Pre-trained Models with Minimal Supervision
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (ICLR2020)
- Structural Pre-training for Dialogue Comprehension (ACL2021)
- Retrofitting Structure-aware Transformer Language Model for End Tasks (EMNLP2020)
- Syntax-Enhanced Pre-trained Model
- Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding
- Do Syntax Trees Help Pre-trained Transformers Extract Information?
- SenseBERT: Driving Some Sense into BERT
- Semantics-aware BERT for Language Understanding (AAAI2020)
- GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method
- K-BERT: Enabling Language Representation with Knowledge Graph
- Knowledge Enhanced Contextual Word Representations (EMNLP2019)
- Knowledge-Aware Language Model Pretraining
- K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
- JAKET: Joint Pre-training of Knowledge Graph and Language Understanding
- E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT (EMNLP2020)
- KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
- Entities as Experts: Sparse Memory Access with Entity Supervision (EMNLP2020)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning (EMNLP2020)
- Contextualized Representations Using Textual Encyclopedic Knowledge
- CoLAKE: Contextualized Language and Knowledge Embedding (COLING2020)
- KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding
- K-XLNet: A General Method for Combining Explicit Knowledge with Language Model Pretraining
- Combining pre-trained language models and structured knowledge
- Coarse-to-Fine Pre-training for Named Entity Recognition (EMNLP2020)
- E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks (COLING2020 WS)
- REALM: Retrieval-Augmented Language Model Pre-Training (ICML2020) [github]
- Simple and Efficient ways to Improve REALM
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (NeurIPS2020)
- Fine-tune the Entire RAG Architecture (including DPR retriever) for Question-Answering
- Joint Retrieval and Generation Training for Grounded Text Generation
- Retrieval Augmentation Reduces Hallucination in Conversation
- On-The-Fly Information Retrieval Augmentation for Language Models
- Current Limitations of Language Models: What You Need is Retrieval
- Improving language models by retrieving from trillions of tokens [blog] [blog]
- Taking Notes on the Fly Helps BERT Pre-training
- Pre-training via Paraphrasing
- SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis (ACL2020)
- Improving Event Duration Prediction via Time-aware Pre-training (EMNLP2020 Findings)
- Knowledge-Aware Procedural Text Understanding with Multi-Stage Training
- Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring (ICLR2020)
- Rethinking Positional Encoding in Language Pre-training
- Improve Transformer Models with Better Relative Position Embeddings (EMNLP2020 Findings)
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- Position Information in Transformers: An Overview
- BoostingBERT: Integrating Multi-Class Boosting into BERT for NLP Tasks
- BURT: BERT-inspired Universal Representation from Twin Structure
- Universal Text Representation from BERT: An Empirical Study
- Symmetric Regularization based BERT for Pair-wise Semantic Reasoning (SIGIR2020)
- Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Document Matching
- Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling (ACL2021)
- Transfer Fine-Tuning: A BERT Case Study (EMNLP2019)
- Improving Pre-Trained Multilingual Models with Vocabulary Expansion (CoNLL2019)
- BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance (ACL2020)
- A Mixture of h−1 Heads is Better than h Heads (ACL2020)
- SesameBERT: Attention for Anywhere
- Multi-Head Attention: Collaborate Instead of Concatenate
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention [github]
- Deepening Hidden Representations from Pre-trained Language Models
- On the Transformer Growth for Progressive BERT Training
- Improving BERT with Self-Supervised Attention
- Guiding Attention for Self-Supervised Learning with Transformers (EMNLP2020 Findings)
- Improving Disfluency Detection by Self-Training a Self-Attentive Model
- Self-training Improves Pre-training for Natural Language Understanding [github]
- CERT: Contrastive Self-supervised Learning for Language Understanding
- Robust Transfer Learning with Pretrained Language Models through Adapters (ACL2021)
- ReadOnce Transformers: Reusable Representations of Text for Transformers (ACL2021)
- LV-BERT: Exploiting Layer Variety for BERT (ACL2021 Findings) [github]
- Large Product Key Memory for Pretrained Language Models (EMNLP2020 Findings)
- Enhancing Pre-trained Language Model with Lexical Simplification
- Contextual BERT: Conditioning the Language Model Using a Global State (COLING2020 WS)
- SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization (ACL2020)
- Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning (EMNLP2021) [github]
- Token Dropping for Efficient BERT Pretraining (ACL2022) [github]
- Pay Attention to MLPs
- Are Pre-trained Convolutions Better than Pre-trained Transformers? (ACL2021)
- Pre-Training a Language Model Without Human Language
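Several entries above (e.g. RoFormer) concern position encoding. A minimal pure-Python sketch of rotary position embeddings — an illustration of the idea, not any paper's reference code — shows why attention scores become a function of the relative offset between positions:

```python
import math

def rotary_embed(vec, pos, base=10000.0):
    """Apply rotary position embedding (RoFormer-style) to one vector.

    Each dimension pair (2i, 2i+1) is rotated by angle pos * base^(-2i/d),
    so position differences appear as phase differences in dot products.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query/key vectors (made up for illustration).
q = [1.0, 0.0, 1.0, 0.0]
k = [1.0, 0.0, 1.0, 0.0]
# The q.k score depends only on the relative offset between positions:
s1 = dot(rotary_embed(q, 3), rotary_embed(k, 1))  # offset 2
s2 = dot(rotary_embed(q, 7), rotary_embed(k, 5))  # offset 2
print(abs(s1 - s2) < 1e-9)
```

Because a rotation applied to both vectors cancels up to the angle difference, absolute positions drop out and only relative distance remains.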
Tokenization
- Training Multilingual Pre-trained Language Model with Byte-level Subwords
- Byte Pair Encoding is Suboptimal for Language Model Pretraining (EMNLP2020 Findings)
- CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation (TACL2022) [github]
- ByT5: Towards a token-free future with pre-trained byte-to-byte models (TACL2022) [github]
- Multi-view Subword Regularization (NAACL2021)
- Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation (ACL2021)
- An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks (AACL-IJCNLP2020)
- AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization
- LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization (ACL2021 Findings)
- Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models (NAACL2021)
- CharBERT: Character-aware Pre-trained Language Model (COLING2020) [github]
- CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters (COLING2020)
- Charformer: Fast Character Transformers via Gradient-based Subword Tokenization [github]
- Fast WordPiece Tokenization (EMNLP2021)
- MaxMatch-Dropout: Subword Regularization for WordPiece (COLING2022)
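Several of the tokenization papers above compare against BERT's WordPiece scheme, which segments a word by greedy longest-match-first lookup ("Fast WordPiece Tokenization" accelerates exactly this procedure with a trie). A toy sketch with a made-up vocabulary:

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword segmentation (WordPiece-style).

    Non-initial pieces carry the '##' continuation prefix; if no
    segmentation exists, the whole word maps to the unknown token.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        cur = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation marker
            if piece in vocab:
                cur = piece  # longest match found
                break
            end -= 1
        if cur is None:
            return [unk]
        pieces.append(cur)
        start = end
    return pieces

vocab = {"un", "##aff", "##able", "##a"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
```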
Prompt
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts (EMNLP2020) [github]
- Calibrate Before Use: Improving Few-Shot Performance of Language Models
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
- GPT Understands, Too [github]
- How Many Data Points is a Prompt Worth? (NAACL2021) [website]
- Learning How to Ask: Querying LMs with Mixtures of Soft Prompts (NAACL2021)
- Meta-tuning Language Models to Answer Prompts Better
- Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
- The Power of Scale for Parameter-Efficient Prompt Tuning (EMNLP2021)
- Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
- PPT: Pre-trained Prompt Tuning for Few-shot Learning
- True Few-Shot Learning with Language Models
- Few-shot Sequence Learning with Transformers (NeurIPS2020 WS)
- PTR: Prompt Tuning with Rules for Text Classification
- Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification
- Discrete and Soft Prompting for Multilingual Models (EMNLP2021)
- Reframing Instructional Prompts to GPTk's Language
- Multimodal Few-Shot Learning with Frozen Language Models
- FLEX: Unifying Evaluation for Few-Shot NLP
- Do Prompt-Based Models Really Understand the Meaning of their Prompts?
- OpenPrompt: An Open-source Framework for Prompt-learning (ACL2022 Demo)
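Many of the prompt papers above (e.g. PTR, Knowledgeable Prompt-tuning) build on the pattern-verbalizer setup: a cloze template turns classification into masked-token prediction, and a verbalizer maps label words back to labels. A minimal sketch — the template, label words, and scores below are hypothetical stand-ins for a real MLM's `[MASK]` distribution:

```python
def build_prompt(template, text):
    """Fill a cloze template; an MLM would be asked to fill [MASK]."""
    return template.format(text=text)

def prompt_classify(text, template, verbalizer, mask_scores):
    """Pick the label whose verbalizer word scores highest at [MASK].

    `mask_scores` stands in for a real model's token distribution at
    the [MASK] position; here it is just a hand-written dict.
    """
    prompt = build_prompt(template, text)
    best = max(verbalizer, key=lambda label: mask_scores[verbalizer[label]])
    return prompt, best

template = "{text} It was [MASK]."
verbalizer = {"positive": "great", "negative": "terrible"}
toy_scores = {"great": 0.71, "terrible": 0.04}  # hypothetical MLM output
prompt, label = prompt_classify("A gripping film.", template, verbalizer, toy_scores)
print(prompt)  # A gripping film. It was [MASK].
print(label)   # positive
```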
Sentence embedding
- Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (EMNLP2019)
- Parameter-free Sentence Embedding via Orthogonal Basis (EMNLP2019)
- SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models
- On the Sentence Embeddings from Pre-trained Language Models (EMNLP2020)
- Semantic Re-tuning with Contrastive Tension (ICLR2021)
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations (ACL2021)
- ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer (ACL2021)
- CLEAR: Contrastive Learning for Sentence Representation
- SimCSE: Simple Contrastive Learning of Sentence Embeddings (EMNLP2021) [github]
- ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding
- Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders (EMNLP2021) [github]
- TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning (EMNLP2021 Findings)
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations [github]
- Whitening Sentence Representations for Better Semantics and Faster Retrieval [github]
- Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks (NAACL2021)
- DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings (NAACL2022) [code]
- Unsupervised Sentence Representation via Contrastive Learning with Mixing Negatives (AAAI2022) [github]
- Sentence Embeddings by Ensemble Distillation
- EASE: Entity-Aware Contrastive Learning of Sentence Embedding (NAACL2022)
- Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
- Dual-View Distilled BERT for Sentence Embedding (SIGIR2021)
- DefSent: Sentence Embeddings using Definition Sentences (ACL2021)
- Paraphrastic Representations at Scale [github]
- Learning Dense Representations of Phrases at Scale (ACL2021) [github]
- Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration (EMNLP2021)
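Contrastive methods in this section (SimCSE, ConSERT, DeCLUTR, and others) optimize an in-batch InfoNCE objective: each anchor's positive is its matching row, and the rest of the batch serves as negatives. A self-contained sketch with toy 2-d vectors standing in for real sentence embeddings:

```python
import math

def cos(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def info_nce(anchors, positives, temp=0.05):
    """In-batch contrastive loss (SimCSE-style): for each anchor,
    a softmax over cosine similarities to every positive in the
    batch, with the matching row as the correct class."""
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cos(a, p) / temp for p in positives]
        log_z = math.log(sum(math.exp(l) for l in logits))
        loss += log_z - logits[i]
    return loss / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]     # matched pairs point the same way
shuffled = [[0.1, 0.9], [0.9, 0.1]]    # pairs deliberately mismatched
print(info_nce(anchors, aligned) < info_nce(anchors, shuffled))
```

Aligned pairs yield a near-zero loss; mismatched pairs are heavily penalized, which is what pushes paraphrases together and unrelated sentences apart during training.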
Transformer variants
- Efficient Transformers: A Survey
- Adaptive Attention Span in Transformers (ACL2019)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (ACL2019) [github]
- Generating Long Sequences with Sparse Transformers
- Do Transformers Need Deep Long-Range Memory? (ACL2020)
- DA-Transformer: Distance-aware Transformer (NAACL2021)
- Adaptively Sparse Transformers (EMNLP2019)
- Compressive Transformers for Long-Range Sequence Modelling
- The Evolved Transformer (ICML2019)
- Reformer: The Efficient Transformer (ICLR2020) [github]
- GRET: Global Representation Enhanced Transformer (AAAI2020)
- GMAT: Global Memory Augmentation for Transformers
- Memory Transformer
- Transformer on a Diet [github]
- A Tensorized Transformer for Language Modeling (NeurIPS2019)
- DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling (ICLR2020) [github]
- DeLighT: Very Deep and Light-weight Transformer [github]
- Lite Transformer with Long-Short Range Attention (ICLR2020) [github]
- Efficient Content-Based Sparse Attention with Routing Transformers
- BP-Transformer: Modelling Long-Range Context via Binary Partitioning
- Longformer: The Long-Document Transformer [github]
- Big Bird: Transformers for Longer Sequences
- Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting (AAAI2021)
- Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention (AAAI2021) [github]
- Improving Transformer Models by Reordering their Sublayers (ACL2020)
- Highway Transformer: Self-Gating Enhanced Self-Attentive Networks
- Mask Attention Networks: Rethinking and Strengthen Transformer (NAACL2021)
- Synthesizer: Rethinking Self-Attention in Transformer Models
- Query-Key Normalization for Transformers (EMNLP2020 Findings)
- Rethinking Attention with Performers (ICLR2021)
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
- HAT: Hardware-Aware Transformers for Efficient Natural Language Processing (ACL2020) [github]
- Linformer: Self-Attention with Linear Complexity
- What's Hidden in a One-layer Randomly Weighted Transformer? (EMNLP2021)
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- Understanding the Difficulty of Training Transformers (EMNLP2020)
- Towards Fully 8-bit Integer Inference for the Transformer Model (IJCAI2020)
- Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation
- Long Range Arena: A Benchmark for Efficient Transformers
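Several of the long-sequence variants above (Longformer, Big Bird, BP-Transformer) replace full self-attention with a sliding-window pattern plus a handful of global tokens. A minimal boolean-mask sketch of that pattern — illustrative only, not any model's actual masking code:

```python
def sliding_window_mask(n, window, global_idx=()):
    """Allowed-attention mask for sliding-window sparse attention.

    Each token attends to neighbours within +/- `window` positions;
    tokens in `global_idx` (e.g. [CLS]) attend to, and are attended
    by, every position. Returns an n x n matrix of booleans.
    """
    allow = [[abs(i - j) <= window for j in range(n)] for i in range(n)]
    for g in global_idx:
        for j in range(n):
            allow[g][j] = True  # global token attends everywhere
            allow[j][g] = True  # and everything attends to it
    return allow

mask = sliding_window_mask(8, window=1, global_idx=(0,))
# The local band costs O(n * window) instead of O(n^2):
print(sum(sum(row) for row in mask))  # 34 allowed pairs, not 64
```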
Probe
- A Structural Probe for Finding Syntax in Word Representations (NAACL2019)
- When Bert Forgets How To POS: Amnesic Probing of Linguistic Properties and MLM Predictions
- Finding Universal Grammatical Relations in Multilingual BERT (ACL2020)
- Probing Multilingual BERT for Genetic and Typological Signals (COLING2020)
- Linguistic Knowledge and Transferability of Contextual Representations (NAACL2019) [github]
- Probing What Different NLP Tasks Teach Machines about Function Word Comprehension (*SEM2019)
- BERT Rediscovers the Classical NLP Pipeline (ACL2019)
- A Closer Look at How Fine-tuning Changes BERT (ACL2022)
- Mediators in Determining what Processing BERT Performs First (NAACL2021)
- Probing Neural Network Comprehension of Natural Language Arguments (ACL2019)
- Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations (EMNLP2019 WS)
- What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
- Quantity doesn't buy quality syntax with neural language models (EMNLP2019)
- Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction (ICLR2020)
- Discourse Probing of Pretrained Language Models (NAACL2021)
- oLMpics -- On what Language Model Pre-training Captures
- Do Neural Language Models Show Preferences for Syntactic Formalisms? (ACL2020)
- Probing for Predicate Argument Structures in Pretrained Language Models (ACL2022)
- Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT (ACL2020)
- Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work? (ACL2020)
- Probing Linguistic Systematicity (ACL2020)
- A Matter of Framing: The Impact of Linguistic Formalism on Probing Results
- A Cross-Task Analysis of Text Span Representations (ACL2020 WS)
- When Do You Need Billions of Words of Pretraining Data? [github]
- Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis
- Language Models as Knowledge Bases? (EMNLP2019) [github]
- BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA
- How Much Knowledge Can You Pack Into the Parameters of a Language Model? (EMNLP2020)
- Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries (EACL2021)
- Factual Probing Is [MASK]: Learning vs. Learning to Recall (NAACL2021) [github]
- Knowledge Neurons in Pretrained Transformers
- DirectProbe: Studying Representations without Classifiers (NAACL2021)
- The Language Model Understood the Prompt was Ambiguous: Probing Syntactic Uncertainty Through Generation (EMNLP2021 WS)
- X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models (EMNLP2020)
- Probing BERT in Hyperbolic Spaces (ICLR2021)
- Probing Across Time: What Does RoBERTa Know and When?
- Do NLP Models Know Numbers? Probing Numeracy in Embeddings (EMNLP2019)
- Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models [github] [website]
- Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly (ACL2020)
- How is BERT surprised? Layerwise detection of linguistic anomalies (ACL2021)
- Exploring the Role of BERT Token Representations to Explain Sentence Probing Results
- What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge
- A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension
- Can BERT Reason? Logically Equivalent Probes for Evaluating the Inference Capabilities of Language Models
- Probing Task-Oriented Dialogue Representation from Language Models (EMNLP2020)
- Probing for Bridging Inference in Transformer Language Models
- BERTering RAMS: What and How Much does BERT Already Know About Event Arguments? -- A Study on the RAMS Dataset (EMNLP2020 WS)
- CxGBERT: BERT meets Construction Grammar (COLING2020) [github]
- BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? (ACL2021)
Inside BERT
- What does BERT learn about the structure of language? (ACL2019)
- Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned (ACL2019) [github]
- Multi-head or Single-head? An Empirical Comparison for Transformer Training
- Open Sesame: Getting Inside BERT's Linguistic Knowledge (ACL2019 WS)
- Analyzing the Structure of Attention in a Transformer Language Model (ACL2019 WS)
- What Does BERT Look At? An Analysis of BERT's Attention (ACL2019 WS)
- Do Attention Heads in BERT Track Syntactic Dependencies?
- Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains (ACL2019 WS)
- Inducing Syntactic Trees from BERT Representations (ACL2019 WS)
- A Multiscale Visualization of Attention in the Transformer Model (ACL2019 Demo)
- Visualizing and Measuring the Geometry of BERT
- How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings (EMNLP2019)
- Are Sixteen Heads Really Better than One? (NeurIPS2019)
- On the Validity of Self-Attention as Explanation in Transformer Models
- Visualizing and Understanding the Effectiveness of BERT (EMNLP2019)
- Attention Interpretability Across NLP Tasks
- Revealing the Dark Secrets of BERT (EMNLP2019)
- Analyzing Redundancy in Pretrained Transformer Models (EMNLP2020)
- What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models
- Attention Module is Not Only a Weight: Analyzing Transformers with Vector Norms (ACL2020 SRW)
- Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP2021)
- Quantifying Attention Flow in Transformers
- Telling BERT's full story: from Local Attention to Global Aggregation (EACL2021)
- How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention
- Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks (ACL2021)
- What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding (EMNLP2020)
- Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs (EMNLP2019)
- Are Pretrained Language Models Symbolic Reasoners Over Knowledge? (CoNLL2020)
- Rethinking the Value of Transformer Components (COLING2020)
- Transformer Feed-Forward Layers Are Key-Value Memories
- Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
- Investigating Transferability in Pretrained Language Models
- What Happens To BERT Embeddings During Fine-tuning?
- Analyzing Individual Neurons in Pre-trained Language Models (EMNLP2020)
- How fine can fine-tuning be? Learning efficient language models (AISTATS2020)
- The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives (EMNLP2019)
- A Primer in BERTology: What we know about how BERT works (TACL2020)
- Pretrained Language Model Embryology: The Birth of ALBERT (EMNLP2020) [github]
- Evaluating Saliency Methods for Neural Language Models (NAACL2021)
- Investigating Gender Bias in BERT
- Measuring and Reducing Gendered Correlations in Pre-trained Models [website]
- Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias (COLING2020 WS)
- Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models (EACL2021)
- CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models (EMNLP2020)
- Unmasking the Mask -- Evaluating Social Biases in Masked Language Models
- BERT Knows Punta Cana is not just beautiful, it's gorgeous: Ranking Scalar Adjectives with Contextualised Representations (EMNLP2020)
- Does Chinese BERT Encode Word Structure? (COLING2020) [github]
- How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations (CIKM2019)
- Whatcha lookin' at? DeepLIFTing BERT's Attention in Question Answering
- What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?
- What do Models Learn from Question Answering Datasets?
- Towards Interpreting BERT for Reading Comprehension Based QA (EMNLP2020)
- Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA (EMNLP2020)
- How does BERT’s attention change when you fine-tune? An analysis methodology and a case study in negation scope (ACL2020)
- Calibration of Pre-trained Transformers
- When BERT Plays the Lottery, All Tickets Are Winning (EMNLP2020)
- The Lottery Ticket Hypothesis for Pre-trained BERT Networks
- What Context Features Can Transformer Language Models Use? (ACL2021)
- exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models [github]
- The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models [github]
- What Does BERT with Vision Look At? (ACL2020)
- Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models (ECCV2020)
- Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers (TACL2021)
- What Vision-Language Models ‘See’ when they See Scenes
Multi-lingual
- A Primer on Pretrained Multilingual Language Models
- Multilingual Constituency Parsing with Self-Attention and Pre-Training (ACL2019)
- Cross-lingual Language Model Pretraining (NeurIPS2019) [github]
- XLM-E: Cross-lingual Language Model Pre-training via ELECTRA
- XLM-K: Improving Cross-Lingual Language Model Pre-Training with Multilingual Knowledge
- 75 Languages, 1 Model: Parsing Universal Dependencies Universally (EMNLP2019) [github]
- Zero-shot Dependency Parsing with Pre-trained Multilingual Sentence Representations (EMNLP2019 WS)
- Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank (EMNLP2020 Findings)
- Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT (EMNLP2019)
- How multilingual is Multilingual BERT? (ACL2019)
- How Language-Neutral is Multilingual BERT?
- How to Adapt Your Pretrained Multilingual Model to 1600 Languages (ACL2021)
- Load What You Need: Smaller Versions of Multilingual BERT (EMNLP2020) [github]
- Is Multilingual BERT Fluent in Language Generation?
- ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation (ACL2021 Findings)
- Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks (EMNLP2019)
- BERT is Not an Interlingua and the Bias of Tokenization (EMNLP2019 WS)
- Cross-Lingual Ability of Multilingual BERT: An Empirical Study (ICLR2020)
- Multilingual Alignment of Contextual Word Representations (ICLR2020)
- Emerging Cross-lingual Structure in Pretrained Language Models (ACL2020)
- On the Cross-lingual Transferability of Monolingual Representations
- Unsupervised Cross-lingual Representation Learning at Scale (ACL2020)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding
- Cross-lingual Alignment Methods for Multilingual BERT: A Comparative Study (EMNLP2020 Findings)
- Emerging Cross-lingual Structure in Pretrained Language Models
- Can Monolingual Pretrained Models Help Cross-Lingual Classification?
- A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT
- Fully Unsupervised Crosslingual Semantic Textual Similarity Metric Based on BERT for Identifying Parallel Data (CoNLL2019)
- What the [MASK]? Making Sense of Language-Specific BERT Models
- XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization (ICML2020)
- XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation (EMNLP2021)
- XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation
- A Systematic Analysis of Morphological Content in BERT Models for Multiple Languages
- Extending Multilingual BERT to Low-Resource Languages
- Learning Better Universal Representations from Pre-trained Contextualized Language Models
- Universal Dependencies according to BERT: both more specific and more general
- A Call for More Rigor in Unsupervised Cross-lingual Learning (ACL2020)
- Identifying Necessary Elements for BERT's Multilinguality (EMNLP2020)
- MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer
- From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers
- Language Models are Few-shot Multilingual Learners
- First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT (EACL2021)
- Multilingual BERT Post-Pretraining Alignment (NAACL2021)
- XeroAlign: Zero-Shot Cross-lingual Transformer Alignment (ACL2021 Findings)
- Syntax-augmented Multilingual BERT for Cross-lingual Transfer (ACL2021)
- Language Representation in Multilingual BERT and its applications to improve Cross-lingual Generalization
- VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation
- On the Language Neutrality of Pre-trained Multilingual Representations
- Are All Languages Created Equal in Multilingual BERT? (ACL2020 WS)
- When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models
- Adapting Monolingual Models: Data can be Scarce when Language Similarity is High (ACL2021 Findings)
- Language-agnostic BERT Sentence Embedding
- Universal Sentence Representation Learning with Conditional Masked Language Model
- WikiBERT models: deep transfer learning for many languages
- Inducing Language-Agnostic Multilingual Representations
- To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding? (COLING2020)
- It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT (EMNLP2020 WS)
- XLM-T: A Multilingual Language Model Toolkit for Twitter
- A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios
- Translation Artifacts in Cross-lingual Transfer Learning (EMNLP2020)
- Identifying Cultural Differences through Multi-Lingual Wikipedia
- A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT (EMNLP2020)
- BERT for Monolingual and Cross-Lingual Reverse Dictionary (EMNLP2020 Findings)
- Bilingual Text Extraction as Reading Comprehension
- Evaluating Multilingual BERT for Estonian
- How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models (ACL2021) [github]
- Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training (EMNLP2021)
- BERTologiCoMix: How does Code-Mixing interact with Multilingual BERT? (EACL2021 WS)
Other than English models
- CamemBERT: a Tasty French Language Model (ACL2020)
- On the importance of pre-training data volume for compact language models (EMNLP2020)
- FlauBERT: Unsupervised Language Model Pre-training for French (LREC2020)
- Multilingual is not enough: BERT for Finnish
- BERTje: A Dutch BERT Model
- RobBERT: a Dutch RoBERTa-based Language Model
- Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language
- RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark (EMNLP2020)
- AraBERT: Transformer-based Model for Arabic Language Understanding
- ALUE: Arabic Language Understanding Evaluation (EACL2021 WS) [website]
- ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic (ACL2021) [github]
- Pre-Training BERT on Arabic Tweets: Practical Considerations
- PhoBERT: Pre-trained language models for Vietnamese
- Give your Text Representation Models some Love: the Case for Basque (LREC2020)
- ParsBERT: Transformer-based Model for Persian Language Understanding
- Leveraging ParsBERT and Pretrained mT5 for Persian Abstractive Text Summarization (CSICC2021)
- Pre-training Polish Transformer-based Language Models at Scale
- Playing with Words at the National Library of Sweden -- Making a Swedish BERT
- KR-BERT: A Small-Scale Korean-Specific Language Model
- KoreALBERT: Pretraining a Lite BERT Model for Korean Language Understanding (ICPR2020)
- What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers (EMNLP2021)
- KLUE: Korean Language Understanding Evaluation
- WangchanBERTa: Pretraining transformer-based Thai Language Models
- FinEst BERT and CroSloEngual BERT: less is more in multilingual models (TSD2020)
- GREEK-BERT: The Greeks visiting Sesame Street (SETN2020)
- The birth of Romanian BERT (EMNLP2020 Findings)
- German's Next Language Model (COLING2020 Industry Track)
- GottBERT: a pure German Language Model
- EstBERT: A Pretrained Language-Specific BERT for Estonian
- Czert -- Czech BERT-like Model for Language Representation
- RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model (TSD2021)
- Bertinho: Galician BERT Representations
- Pretraining and Fine-Tuning Strategies for Sentiment Analysis of Latvian Tweets
- PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data
- IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages (EMNLP2020 Findings)
- Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages (NeurIPS2020 WS)
- IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP (COLING2020)
- IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization (EMNLP2021)
- IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation (EMNLP2021)
- AfroMT: Pretraining Strategies and Reproducible Benchmarks for Translation of 8 African Languages (EMNLP2021)
- KinyaBERT: a Morphology-aware Kinyarwanda Language Model (ACL2022)
- BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
- NEZHA: Neural Contextualized Representation for Chinese Language Understanding
- Revisiting Pre-Trained Models for Chinese Natural Language Processing (EMNLP2020 Findings)
- ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information (ACL2021) [github]
- Intrinsic Knowledge Evaluation on Chinese Language Models
- CPM: A Large-scale Generative Chinese Pre-trained Language Model [github]
- PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
- CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model
- CLUE: A Chinese Language Understanding Evaluation Benchmark
- CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark
- FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark
- AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation
- UER: An Open-Source Toolkit for Pre-training Models (EMNLP2019 Demo) [github]
Domain specific
- AMMU -- A Survey of Transformer-based Biomedical Pretrained Language Models
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining
- Self-Alignment Pretraining for Biomedical Entity Representations (NAACL2021) [github]
- Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking (ACL2021) [github]
- Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets (ACL2019 WS)
- BERT-based Ranking for Biomedical Entity Normalization
- PubMedQA: A Dataset for Biomedical Research Question Answering (EMNLP2019)
- Pre-trained Language Model for Biomedical Question Answering
- How to Pre-Train Your Model? Comparison of Different Pre-Training Models for Biomedical Question Answering
- On Adversarial Examples for Biomedical NLP Tasks
- An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining (ACL2020 WS)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [github]
- Improving Biomedical Pretrained Language Models with Knowledge (BioNLP2021)
- BioMegatron: Larger Biomedical Domain Language Model (EMNLP2020) [website]
- Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art (EMNLP2020 WS)
- A pre-training technique to localize medical BERT and enhance BioBERT [github]
- exBERT: Extending Pre-trained Models with Domain-specific Vocabulary Under Constrained Training Resources (EMNLP2020 Findings) [github]
- BERTology Meets Biology: Interpreting Attention in Protein Language Models (ICLR2021)
- ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks (AIME2020)
- Publicly Available Clinical BERT Embeddings (NAACL2019 WS)
- UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus (NAACL2021)
- MT-Clinical BERT: Scaling Clinical Information Extraction with Multitask Learning
- A clinical specific BERT developed with huge size of Japanese clinical narrative
- Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset (ACL2020) [github]
- Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources
- Classifying Long Clinical Documents with Pre-trained Transformers
- Detecting Adverse Drug Reactions from Twitter through Domain-Specific Preprocessing and BERT Ensembling
- Progress Notes Classification and Keyword Extraction using Attention-based Deep Learning Models with BERT
- BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining
- Prediction of ICD Codes with Clinical BERT Embeddings and Text Augmentation with Label Balancing using MIMIC-III
- Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition (EMNLP2020)
- CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT (EMNLP2020)
- Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage (MLHC2020)
- Med-BERT: pre-trained contextualized embeddings on large-scale structured electronic health records for disease prediction [github]
- SciBERT: Pretrained Contextualized Embeddings for Scientific Text (EMNLP2019) [github]
- SPECTER: Document-level Representation Learning using Citation-informed Transformers (ACL2020) [github]
- OAG-BERT: Pre-train Heterogeneous Entity-augmented Academic Language Models [github]
- PatentBERT: Patent Classification with Fine-Tuning a pre-trained BERT Model
- FinBERT: A Pretrained Language Model for Financial Communications
- LEGAL-BERT: The Muppets straight out of Law School (EMNLP2020 Findings)
- Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents
- E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce
- BERT Goes Shopping: Comparing Distributional Models for Product Representations
- NewsBERT: Distilling Pre-trained Language Model for Intelligent News Application
- Code and Named Entity Recognition in StackOverflow (ACL2020) [github]
- BERTweet: A pre-trained language model for English Tweets (EMNLP2020 Demo)
- TweetBERT: A Pretrained Language Representation Model for Twitter Text Analysis
- A Million Tweets Are Worth a Few Points: Tuning Transformers for Customer Service Tasks
- Analyzing COVID-19 Tweets with Transformer-based Language Models
- Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media (EMNLP2020 Findings)
Multi-modal
- A Survey on Visual Transformer
- Transformers in Vision: A Survey
- Vision-Language Pre-training: Basics, Recent Advances, and Future Trends
- VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV2019)
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS2019)
- VisualBERT: A Simple and Performant Baseline for Vision and Language
- Selfie: Self-supervised Pretraining for Image Embedding
- ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (ICLR2022)
- Align before Fuse: Vision and Language Representation Learning with Momentum Distillation (NeurIPS2021) [github]
- Contrastive Bidirectional Transformer for Temporal Representation Learning
- M-BERT: Injecting Multimodal Information in the BERT Structure
- Integrating Multimodal Information in Large Pretrained Transformers
- LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP2019)
- Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions (NAACL2021)
- X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers (EMNLP2020)
- Adaptive Transformers for Learning Multimodal Representations (ACL2020SRW) [github]
- GEM: A General Evaluation Benchmark for Multimodal Tasks (ACL2021 Findings) [github]
- Fusion of Detected Objects in Text for Visual Question Answering (EMNLP2019)
- VisualMRC: Machine Reading Comprehension on Document Images (AAAI2021)
- LambdaNetworks: Modeling long-range Interactions without Attention [github]
- BERT representations for Video Question Answering (WACV2020)
- Self-supervised pre-training and contrastive representation learning for multiple-choice video QA (AAAI2021)
- UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning (ACL2021)
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation [github]
- Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training
- Contrastive Visual-Linguistic Pretraining
- What is More Likely to Happen Next? Video-and-Language Future Event Prediction (EMNLP2020)
- VisualGPT: Data-efficient Image Captioning by Balancing Visual Input and Linguistic Knowledge from Pretraining
- XGPT: Cross-modal Generative Pre-Training for Image Captioning
- Scaling Up Vision-Language Pre-training for Image Captioning
- Injecting Semantic Concepts into End-to-End Image Captioning (CVPR2022)
- Unified Vision-Language Pre-Training for Image Captioning and VQA (AAAI2020) [github]
- TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
- An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA (AAAI2022)
- Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer
- VisualCOMET: Reasoning about the Dynamic Context of a Still Image (ECCV2020) [website]
- Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
- VD-BERT: A Unified Vision and Dialog Transformer with BERT (EMNLP2020)
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations (ICLR2020)
- Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
- UNITER: Learning UNiversal Image-TExt Representations
- ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
- Supervised Multimodal Bitransformers for Classifying Images and Text
- InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining
- Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs (TACL2021)
- SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels
- LiT: Zero-Shot Transfer with Locked-image Text Tuning (CVPR2022)
- WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training
- Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training (NeurIPS2021)
- E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning (ACL2021)
- UNIMO-2: End-to-End Unified Vision-Language Grounded Learning (ACL2022)
- Grounded Language-Image Pre-training [github]
- VLMO: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts [github]
- VinVL: Revisiting Visual Representations in Vision-Language Models
- An Empirical Study of Training End-to-End Vision-and-Language Transformers (CVPR2022) [github]
- Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling
- UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
- Florence: A New Foundation Model for Computer Vision
- Large-Scale Adversarial Training for Vision-and-Language Representation Learning (NeurIPS2020)
- Flamingo: a Visual Language Model for Few-Shot Learning
- OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models [github]
- Do DALL-E and Flamingo Understand Each Other?
- Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
- Unifying Vision-and-Language Tasks via Text Generation
- Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network (AAAI2021)
- ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
- KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning
- A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
- Self-Supervised learning with cross-modal transformers for emotion recognition (SLT2020)
- Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision (EMNLP2020)
- 12-in-1: Multi-Task Vision and Language Representation Learning
- Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models (NAACL2021)
- M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training (CVPR2021)
- UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
- CM3: A Causal Masked Multimodal Model of the Internet
- Retrieval-Augmented Multimodal Language Modeling
- Cycle Text-To-Image GAN with BERT
- Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
- Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
- VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
- DeVLBert: Learning Deconfounded Visio-Linguistic Representations (ACMMM2020)
- A Recurrent Vision-and-Language BERT for Navigation
- BERT Can See Out of the Box: On the Cross-modal Transferability of Text Representations
- Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning (CVPR2021)
- Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers (EMNLP2021)
- Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
- Understanding Advertisements with BERT (ACL2020)
- BERTERS: Multimodal Representation Learning for Expert Recommendation System with Transformer
- FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval (SIGIR2020)
- Kaleido-BERT: Vision-Language Pre-training on Fashion Domain (CVPR2021)
- LayoutLM: Pre-training of Text and Layout for Document Image Understanding (KDD2020) [github]
- LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding (ACL2021)
- LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
- Unifying Vision, Text, and Layout for Universal Document Processing
- LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding
- BROS: A Pre-trained Language Model for Understanding Texts in Document
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
- LayoutReader: Pre-training of Text and Layout for Reading Order Detection (EMNLP2021)
- BERT for Large-scale Video Segment Classification with Test-time Augmentation (ICCV2019WS)
- Is Space-Time Attention All You Need for Video Understanding?
- lamBERT: Language and Action Learning Using Multimodal BERT
- Generative Pretraining from Pixels [github] [website]
- Visual Transformers: Token-based Image Representation and Processing for Computer Vision
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ICLR2021)
- BEiT: BERT Pre-Training of Image Transformers
- Zero-Shot Text-to-Image Generation [github] [website]
- Hierarchical Text-Conditional Image Generation with CLIP Latents [website]
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [website]
- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
- Learning Transferable Visual Models From Natural Language Supervision [github] [website]
- How Much Can CLIP Benefit Vision-and-Language Tasks?
- EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling
- e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
- Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
- Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation (ACL2022)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
- Training Vision Transformers for Image Retrieval
- LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval (NAACL2021)
- Colorization Transformer (ICLR2021) [github]
- A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer [website]
- Multimodal Pretraining for Dense Video Captioning (AACL-IJCNLP2020)
- Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling (CVPR2021) [github]
- VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding (ACL2021 Findings)
- VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding (EMNLP2021)
- BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling
- A Generalist Agent [website]
- SpeechBERT: Cross-Modal Pre-trained Language Model for End-to-end Spoken Question Answering
- An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering
- vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
- Effectiveness of self-supervised pre-training for speech recognition
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
- Applying wav2vec2.0 to Speech Recognition in various low-resource languages
- Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
- Speech Recognition by Simply Fine-tuning BERT (ICASSP2021)
- Understanding Semantics from Speech Through Pre-training
- Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks
- Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision (ICML2020 WS)
- Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining
- ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding
- End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features
- Speech-language Pre-training for End-to-end Spoken Language Understanding
- Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding (Interspeech2020)
- AudioCLIP: Extending CLIP to Image, Text and Audio
- Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation
- Unsupervised Cross-lingual Representation Learning for Speech Recognition
- Curriculum Pre-training for End-to-End Speech Translation (ACL2020)
- MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation
- Multilingual Speech Translation with Efficient Finetuning of Pretrained Models (ACL2021)
- Multilingual Byte2Speech Text-To-Speech Models Are Few-shot Spoken Language Learners
- Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models
- To BERT or Not To BERT: Comparing Speech and Language-based Approaches for Alzheimer's Disease Detection (Interspeech2020)
- BERT for Joint Multichannel Speech Dereverberation with Spatial-aware Tasks
Model compression
- Compression of Deep Learning Models for Text: A Survey
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
- Patient Knowledge Distillation for BERT Model Compression (EMNLP2019)
- Small and Practical BERT Models for Sequence Labeling (EMNLP2019)
- TinyBERT: Distilling BERT for Natural Language Understanding [github]
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (NeurIPS2019 WS) [github]
- Contrastive Distillation on Intermediate Representations for Language Model Compression (EMNLP2020)
- Knowledge Distillation from Internal Representations (AAAI2020)
- Reinforced Multi-Teacher Selection for Knowledge Distillation (AAAI2021)
- ALP-KD: Attention-Based Layer Projection for Knowledge Distillation (AAAI2021)
- Dynamic Knowledge Distillation for Pre-trained Language Models (EMNLP2021)
- Distilling Linguistic Context for Language Model Compression (EMNLP2021)
- Improving Task-Agnostic BERT Distillation with Layer Mapping Search
- PoWER-BERT: Accelerating BERT inference for Classification Tasks
- WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
- Extremely Small BERT Models from Mixed-Vocabulary Training (EACL2021)
- BERT-of-Theseus: Compressing BERT by Progressive Module Replacing (EMNLP2020)
- Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning (ACL2020 SRW)
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
- Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation
- Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
- Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
- Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
- MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices (ACL2020)
- Distilling Knowledge from Pre-trained Language Models via Text Smoothing
- DynaBERT: Dynamic BERT with Adaptive Width and Depth
- Reducing Transformer Depth on Demand with Structured Dropout
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference (ACL2020)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [github] [github]
- Accelerating BERT Inference for Sequence Labeling via Early-Exit (ACL2021)
- ELBERT: Fast ALBERT with Confidence-Window Based Early Exit
- RomeBERT: Robust Training of Multi-Exit BERT
- TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference (NAACL2021)
- FastBERT: a Self-distilling BERT with Adaptive Inference Time (ACL2020)
- Distilling Large Language Models into Tiny and Effective Students using pQRNN
- Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation
- LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression (COLING2020)
- Poor Man's BERT: Smaller and Faster Transformer Models
- schuBERT: Optimizing Elements of BERT (ACL2020)
- BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance (EMNLP2020) [github]
- One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers (ACL2021 Findings)
- From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression (AAAI2022)
- TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER (ACL2020)
- XtremeDistil: Multi-stage Distillation for Massive Multilingual Models (ACL2020)
- Robustly Optimized and Distilled Training for Natural Language Understanding
- Structured Pruning of Large Language Models
- Movement Pruning: Adaptive Sparsity by Fine-Tuning [github]
- Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning (EMNLP2020 Findings)
- Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior (EMNLP2020 Findings)
- Parameter-Efficient Transfer Learning with Diff Pruning
- FastFormers: Highly Efficient Transformer Models for Natural Language Understanding (EMNLP2020 WS) [github]
- AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models (ACL2021) [github]
- Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains (ACL2021 Findings)
- Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data
- AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
- SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
- Optimizing Transformers with Approximate Computing for Faster, Smaller and more Accurate NLP Models
- An Approximation Algorithm for Optimal Subarchitecture Extraction [github]
- Structured Pruning of a BERT-based Question Answering Model
- DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering (ACL2020)
- Distilling Knowledge Learned in BERT for Text Generation (ACL2020)
- Distilling the Knowledge of BERT for Sequence-to-Sequence ASR (Interspeech2020)
- Pre-trained Summarization Distillation
- Understanding BERT Rankers Under Distillation (ICTIR2020)
- Simplified TinyBERT: Knowledge Distillation for Document Retrieval
- Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT (ACL2020 WS)
- TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing (ACL2020 Demo)
- TopicBERT for Energy Efficient Document Classification (EMNLP2020 Findings)
- MiniVLM: A Smaller and Faster Vision-Language Model
- Compressing Visual-linguistic Model via Knowledge Distillation
- Playing Lottery Tickets with Vision and Language
- Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
- Q8BERT: Quantized 8Bit BERT (NeurIPS2019 WS)
- Training with Quantization Noise for Extreme Model Compression (ICLR2021)
- Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing
- BinaryBERT: Pushing the Limit of BERT Quantization (ACL2021)
- I-BERT: Integer-only BERT Quantization
- ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques (AAAI2021)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT (EMNLP2020)
- EdgeBERT: Optimizing On-Chip Inference for Multi-Task NLP
- Optimizing Inference Performance of Transformers on CPUs
Large language model
- Language Models are Unsupervised Multitask Learners [github]
- Language Models are Few-Shot Learners (NeurIPS2020) [github]
- Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems
- OPT: Open Pre-trained Transformer Language Models [website]
- GPT-NeoX-20B: An Open-Source Autoregressive Language Model
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [blog]
- Training Compute-Optimal Large Language Models
- PaLM: Scaling Language Modeling with Pathways [blog]
- LLaMA: Open and Efficient Foundation Language Models
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling [github]
- PolyLM: An Open Source Polyglot Large Language Model
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model [blog]
- DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale [github]
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- ZeRO++: Extremely Efficient Collective Communication for Giant Model Training [blog]
- ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Reinforcement learning from human feedback
- Fine-Tuning Language Models from Human Preferences [github] [blog]
- Training language models to follow instructions with human feedback [github] [blog]
- WebGPT: Browser-assisted question-answering with human feedback [blog]
- Improving alignment of dialogue agents via targeted human judgements
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
- Training Language Models with Language Feedback (ACL2022 WS)
- Self-Instruct: Aligning Language Model with Self Generated Instructions [github]
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
- ChatGPT: A Meta-Analysis after 2.5 Months
Misc.
- Extracting Training Data from Large Language Models
- Generative Language Modeling for Automated Theorem Proving
- Do you have the right scissors? Tailoring Pre-trained Language Models via Monte-Carlo Methods (ACL2020)
- jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models [github]
- Cloze-driven Pretraining of Self-attention Networks
- Learning and Evaluating General Linguistic Intelligence
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (ACL2019 WS)
- Learning to Speak and Act in a Fantasy Text Adventure Game (EMNLP2019)
- A Two-Stage Masked LM Method for Term Set Expansion (ACL2020)
- Cold-start Active Learning through Self-supervised Language Modeling (EMNLP2020)
- Conditional BERT Contextual Augmentation
- Data Augmentation using Pre-trained Transformer Models (AACL-IJCNLP2020) [github]
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks (COLING2020)
- GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation
- Unsupervised Text Style Transfer with Padded Masked Language Models (EMNLP2020)
- Assessing Discourse Relations in Language Generation from Pre-trained Language Models
- Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (ICLR2020)
- Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes
- IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization (AAAI2021)
- Multi-node Bert-pretraining: Cost-efficient Approach
- How to Train BERT with an Academic Budget
- Amazon SageMaker Model Parallelism: A General and Flexible Framework for Large Model Training
- PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management [github]
- 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
- TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
- Efficient Large-Scale Language Model Training on GPU Clusters
- Scaling Laws for Neural Language Models
- Scaling Laws for Autoregressive Generative Modeling
- Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
- The Pile: An 800GB Dataset of Diverse Text for Language Modeling [website]
- Deduplicating Training Data Makes Language Models Better
- Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models (ICLR2020)
- A Mutual Information Maximization Perspective of Language Representation Learning (ICLR2020)
- Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment (AAAI2020)
- Weight Poisoning Attacks on Pre-trained Models (ACL2020)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT (EMNLP2020)
- BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks (ACL2021 Findings)
- Model Extraction and Adversarial Transferability, Your BERT is Vulnerable! (NAACL2021)
- Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT
- Robust Encodings: A Framework for Combating Adversarial Typos (ACL2020)
- On the Robustness of Language Encoders against Grammatical Errors (ACL2020)
- Evaluating the Robustness of Neural Language Models to Input Perturbations (EMNLP2021)
- Pretrained Transformers Improve Out-of-Distribution Robustness (ACL2020) [github]
- "You are grounded!": Latent Name Artifacts in Pre-trained Language Models (EMNLP2020)
- The Right Tool for the Job: Matching Model and Instance Complexities (ACL2020) [github]
- Unsupervised Domain Clusters in Pretrained Language Models (ACL2020)
- Thieves on Sesame Street! Model Extraction of BERT-based APIs (ICLR2020)
- Graph-Bert: Only Attention is Needed for Learning Graph Representations
- Graph-Aware Transformer: Is Attention All Graphs Need?
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages (EMNLP2020 Findings)
- Unsupervised Translation of Programming Languages
- Unified Pre-training for Program Understanding and Generation (NAACL2021)
- MathBERT: A Pre-Trained Model for Mathematical Formula Understanding
- Investigating Math Word Problems using Pretrained Multilingual Language Models
- Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning (ACL2021)
- Pre-train or Annotate? Domain Adaptation with a Constrained Budget (EMNLP2021)
- Item-based Collaborative Filtering with BERT (ACL2020 WS)
- RecoBERT: A Catalog Language Model for Text-Based Recommendations
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
- Extending Machine Language Models toward Human-Level Language Understanding
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data (ACL2020)
- Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level (ACL2021 Findings) [github]
- Glyce: Glyph-vectors for Chinese Character Representations
- Back to the Future -- Sequential Alignment of Text Representations
- Improving Cuneiform Language Identification with BERT (NAACL2019 WS)
- Generating Derivational Morphology with BERT
- BERT has a Moral Compass: Improvements of ethical and moral values of machines
- MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training (ACL2021 Findings)
- SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction (ACM-BCB2019)
- ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction
- BERT Learns (and Teaches) Chemistry
- Prediction of RNA-protein interactions using a nucleotide language model
- Sketch-BERT: Learning Sketch Bidirectional Encoder Representation from Transformers by Self-supervised Learning of Sketch Gestalt (CVPR2020)
- The Chess Transformer: Mastering Play using Generative Language Models
- The Go Transformer: Natural Language Modeling for Game Play
- On the comparability of Pre-trained Language Models
- Transformers: State-of-the-art Natural Language Processing
- The Cost of Training NLP Models: A Concise Overview