BERT-related Papers
This is a list of BERT-related papers. Any feedback is welcome.
(ChatGPT-related papers are listed at https://github.com/tomohideshibata/ChatGPT-related-papers.)
Table of Contents
Survey paper
Downstream task
QA, MC, Dialogue
- Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
- A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics, and Benchmark Datasets
- A BERT Baseline for the Natural Questions
- MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension (ACL2019)
- BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions (NAACL2019) [github]
- Natural Perturbation for Robust Question Answering
- Unsupervised Domain Adaptation on Reading Comprehension
- BERTQA -- Attention on Steroids
- Exploring BERT Parameter Efficiency on the Stanford Question Answering Dataset v2.0
- Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension
- Logic-Guided Data Augmentation and Regularization for Consistent Question Answering (ACL2020)
- UnifiedQA: Crossing Format Boundaries With a Single QA System
- How Can We Know When Language Models Know?
- A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning (EMNLP2019)
- A Simple and Effective Model for Answering Multi-span Questions [github]
- Injecting Numerical Reasoning Skills into Language Models (ACL2020)
- Towards Question Format Independent Numerical Reasoning: A Set of Prerequisite Tasks
- SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering
- Multi-hop Question Answering via Reasoning Chains
- Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents
- Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering (EMNLP2019 WS)
- Fine-tuning Multi-hop Question Answering with Hierarchical Graph Network
- Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering (ACL2020)
- HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data
- Unsupervised Multi-hop Question Answering by Question Generation (NAACL2021)
- End-to-End Open-Domain Question Answering with BERTserini (NAACL2019)
- Latent Retrieval for Weakly Supervised Open Domain Question Answering (ACL2019)
- Dense Passage Retrieval for Open-Domain Question Answering (EMNLP2020)
- Efficient Passage Retrieval with Hashing for Open-domain Question Answering (ACL2021)
- End-to-End Training of Neural Retrievers for Open-Domain Question Answering
- Domain-matched Pre-training Tasks for Dense Retrieval
- Towards Robust Neural Retrieval Models with Synthetic Pre-Training
- Simple Entity-Centric Questions Challenge Dense Retrievers (EMNLP2021) [github]
- Phrase Retrieval Learns Passage Retrieval, Too (EMNLP2021) [github]
- Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
- Progressively Pretrained Dense Corpus Index for Open-Domain Question Answering (EACL2021)
- Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval
- Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval (NAACL2021) [github]
- Retrieve, Read, Rerank, then Iterate: Answering Open-Domain Questions of Varying Reasoning Steps from Text
- RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering
- Pre-training Tasks for Embedding-based Large-scale Retrieval (ICLR2020)
- Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering (EMNLP2019)
- QED: A Framework and Dataset for Explanations in Question Answering [github]
- Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering (ICLR2020)
- Relevance-guided Supervision for OpenQA with ColBERT
- RECONSIDER: Re-Ranking using Span-Focused Cross-Attention for Open Domain Question Answering
- Joint Passage Ranking for Diverse Multi-Answer Retrieval
- SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval
- Don't Read Too Much into It: Adaptive Computation for Open-Domain Question Answering (EMNLP2020 WS)
- Pruning the Index Contents for Memory Efficient Open-Domain QA [github]
- Is Retriever Merely an Approximator of Reader?
- Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation
- RikiNet: Reading Wikipedia Pages for Natural Question Answering (ACL2020)
- BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA
- DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding (SIGIR2020)
- Learning to Ask Unanswerable Questions for Machine Reading Comprehension (ACL2019)
- Unsupervised Question Answering by Cloze Translation (ACL2019)
- Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation (ICLR2020)
- A Recurrent BERT-based Model for Question Generation (EMNLP2019 WS)
- Unsupervised Question Decomposition for Question Answering [github]
- Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering (ACL2020)
- What Are People Asking About COVID-19? A Question Classification Dataset
- Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds
- Enhancing Pre-Trained Language Representations with Rich Knowledge for Machine Reading Comprehension (ACL2019)
- QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering (NAACL2021) [github] [blog]
- Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning (CIKM2019)
- SG-Net: Syntax-Guided Machine Reading Comprehension
- MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension
- Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning (EMNLP2019)
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning (ICLR2020)
- Robust Reading Comprehension with Linguistic Constraints via Posterior Regularization
- BAS: An Answer Selection Method Using BERT Language Model
- Utilizing Bidirectional Encoder Representations from Transformers for Answer Selection (AMMCS2019)
- TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection (AAAI2020)
- The Cascade Transformer: an Application for Efficient Answer Sentence Selection (ACL2020)
- Support-BERT: Predicting Quality of Question-Answer Pairs in MSDN using Deep Bidirectional Transformer
- Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension
- Benchmarking Robustness of Machine Reading Comprehension Models
- Evaluating NLP Models via Contrast Sets
- Undersensitivity in Neural Reading Comprehension
- Developing a How-to Tip Machine Comprehension Dataset and its Evaluation in Machine Comprehension by BERT (ACL2020 WS)
- A Simple but Effective Method to Incorporate Multi-turn Context with BERT for Conversational Machine Comprehension (ACL2019 WS)
- FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension (ACL2019 WS)
- BERT with History Answer Embedding for Conversational Question Answering (SIGIR2019)
- GraphFlow: Exploiting Conversation Flow with Graph Neural Networks for Conversational Machine Comprehension (ICML2019 WS)
- TAPAS: Weakly Supervised Table Parsing via Pre-training (ACL2020)
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data (ACL2020)
- Understanding tables with intermediate pre-training (EMNLP2020 Findings)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing (ICLR2021)
- Table Search Using a Deep Contextualized Language Model (SIGIR2020)
- Open Domain Question Answering over Tables via Dense Retrieval (NAACL2021)
- Capturing Row and Column Semantics in Transformer Based Question Answering over Tables (NAACL2021)
- MATE: Multi-view Attention for Table Transformer Efficiency (EMNLP2021)
- TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions (EMNLP2020)
- Beyond English-only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian (RANLP2019)
- XQA: A Cross-lingual Open-domain Question Answering Dataset (ACL2019)
- XOR QA: Cross-lingual Open-Retrieval Question Answering (NAACL2021) [website]
- Cross-Lingual Machine Reading Comprehension (EMNLP2019)
- Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model
- Multilingual Question Answering from Formatted Text applied to Conversational Agents
- BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels (EMNLP2019)
- MLQA: Evaluating Cross-lingual Extractive Question Answering
- Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension
- Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation (COLING2020)
- MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering [github]
- Towards More Equitable Question Answering Systems: How Much More Data Do You Need? (ACL2021)
- X-METRA-ADA: Cross-lingual Meta-Transfer Learning Adaptation to Natural Language Understanding and Question Answering (NAACL2021)
- Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension (TACL)
- SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis
- DuReader_robust: A Chinese Dataset Towards Evaluating the Robustness of Machine Reading Comprehension Models
- Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension (EMNLP2019)
- Few-Shot Question Answering by Pretraining Span Selection (ACL2021)
- DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue [website]
- A Short Survey of Pre-trained Language Models for Conversational AI-A New Age in NLP
- MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding (ACL2021)
- BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer (Interspeech2019)
- Dialog State Tracking: A Neural Reading Comprehension Approach
- A Simple but Effective BERT Model for Dialog State Tracking on Resource-Limited Systems (ICASSP2020)
- Fine-Tuning BERT for Schema-Guided Zero-Shot Dialogue State Tracking
- Goal-Oriented Multi-Task BERT-Based Dialogue State Tracker
- Dialogue State Tracking with Pretrained Encoder for Multi-domain Task-oriented Dialogue Systems
- Zero-Shot Transfer Learning with Synthesized Data for Multi-Domain Dialogue State Tracking (ACL2020)
- A Fast and Robust BERT-based Dialogue State Tracker for Schema-Guided Dialogue Dataset (KDD2020 WS)
- Knowledge-Aware Graph-Enhanced GPT-2 for Dialogue State Tracking
- Coreference Augmentation for Multi-Domain Task-Oriented Dialogue State Tracking (Interspeech2021)
- ToD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogues (EMNLP2020)
- Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances (ACL2021)
- Domain Adaptive Training BERT for Response Selection
- Speaker-Aware BERT for Multi-Turn Response Selection in Retrieval-Based Chatbots
- Curriculum Learning Strategies for IR: An Empirical Study on Conversation Response Ranking (ECIR2020)
- MuTual: A Dataset for Multi-Turn Dialogue Reasoning (ACL2020)
- DialBERT: A Hierarchical Pre-Trained Model for Conversation Disentanglement
- Generalized Conditioned Dialogue Generation Based on Pre-trained Language Model
- BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data (ACL2021)
- Interactive Teaching for Conversational AI (NeurIPS2020 WS)
- BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding
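Many of the open-domain QA entries above (e.g. Dense Passage Retrieval, RocketQA) share one core step: questions and passages are encoded into a shared vector space, and passages are ranked by inner product with the question vector. A minimal sketch of that ranking step, with hand-written toy vectors standing in for the two BERT encoders' outputs:

```python
# Toy stand-ins for dense encoder outputs (real DPR-style systems use
# two trained BERT encoders; these vectors are illustrative only).
passage_embs = [
    [1.0, 0.0, 0.0],   # passage 0
    [0.0, 1.0, 0.0],   # passage 1
    [0.0, 0.0, 1.0],   # passage 2
]
question_emb = [0.1, 0.2, 0.9]  # closest in direction to passage 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Dense retrieval step: score every passage by inner product, take the top one.
scores = [dot(p, question_emb) for p in passage_embs]
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # -> 2
```

In the real systems this argmax becomes an approximate nearest-neighbor search over millions of pre-computed passage vectors.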
Slot filling and Intent Detection
- A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding (EMNLP2019)
- BERT for Joint Intent Classification and Slot Filling
- A Co-Interactive Transformer for Joint Slot Filling and Intent Detection (ICASSP2021)
- Few-shot Intent Classification and Slot Filling with Retrieved Examples (NAACL2021)
- Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model
- A Comparison of Deep Learning Methods for Language Understanding (Interspeech2019)
- Data Augmentation for Spoken Language Understanding via Pretrained Models
- Few-Shot Intent Detection via Contrastive Pre-Training and Fine-Tuning (EMNLP2021)
- STIL -- Simultaneous Slot Filling, Translation, Intent Classification, and Language Identification: Initial Results using mBART on MultiATIS++ (AACL-IJCNLP2020) [github]
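The joint models in this section (e.g. "BERT for Joint Intent Classification and Slot Filling") attach two heads to one encoder: a sentence-level intent classifier on the [CLS] position and a per-token slot tagger. A sketch of the decoding step, with hand-written logits standing in for encoder outputs (the label sets and numbers are invented for illustration):

```python
# Invented label inventories and toy logits standing in for a BERT encoder.
INTENTS = ["book_flight", "get_weather"]
SLOTS = ["O", "B-city", "B-date"]

tokens = ["flight", "to", "boston", "tomorrow"]
intent_logits = [2.3, 0.1]      # from the [CLS] vector: one score per intent
slot_logits = [                 # one row per token: one score per slot label
    [1.5, 0.2, 0.1],
    [2.0, 0.3, 0.2],
    [0.1, 2.2, 0.4],
    [0.2, 0.3, 1.9],
]

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

# Joint decoding: one intent for the sentence, one slot label per token.
intent = INTENTS[argmax(intent_logits)]
slots = [SLOTS[argmax(row)] for row in slot_logits]
print(intent, list(zip(tokens, slots)))
```

Training optimizes the sum of the two cross-entropy losses, which is what makes the model "joint".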
Analysis
- Fine-grained Information Status Classification Using Discourse Context-Aware Self-Attention
- Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision (ACL2019)
- BERT-based Lexical Substitution (ACL2019)
- Assessing BERT’s Syntactic Abilities
- Investigating Novel Verb Learning in BERT: Selectional Preference Classes and Alternation-Based Syntactic Generalization (EMNLP2020 WS)
- Does BERT agree? Evaluating knowledge of structure dependence through agreement relations
- Simple BERT Models for Relation Extraction and Semantic Role Labeling
- Bridging the Gap in Multilingual Semantic Role Labeling: a Language-Agnostic Approach (COLING2020)
- LIMIT-BERT : Linguistically Informed Multi-Task BERT (EMNLP2020 Findings)
- Joint Semantic Analysis with Document-Level Cross-Task Coherence Rewards
- A Simple BERT-Based Approach for Lexical Simplification
- BERT-Based Arabic Social Media Author Profiling
- Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media
- Evaluating the Factual Consistency of Abstractive Text Summarization
- Generating Fact Checking Explanations (ACL2020)
- NegBERT: A Transfer Learning Approach for Negation Detection and Scope Resolution
- xSLUE: A Benchmark and Analysis Platform for Cross-Style Language Understanding and Evaluation
- TabFact: A Large-scale Dataset for Table-based Fact Verification (ICLR2020)
- Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents
- A Focused Study to Compare Arabic Pre-training Models on Newswire IE Tasks
- LAMBERT: Layout-Aware (Language) Modeling for information extraction (ICDAR2021)
- Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings (ECIR2020) [github]
- Keyphrase Extraction with Span-based Feature Representations
- Keyphrase Prediction With Pre-trained Language Model
- Self-Supervised Contextual Keyword and Keyphrase Retrieval with Self-Labelling [github]
- Joint Keyphrase Chunking and Salience Ranking with BERT
- Generalizing Natural Language Analysis through Span-relation Representations (ACL2020) [github]
- What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
- tBERT: Topic Models and BERT Joining Forces for Semantic Similarity Detection (ACL2020)
- Domain Adaptation with BERT-based Domain Classification and Data Selection (EMNLP2019 WS)
- PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models (TACL2020)
- Unsupervised Out-of-Domain Detection via Pre-trained Transformers (ACL2021) [github]
- Knowledge Distillation for BERT Unsupervised Domain Adaptation
- Sensitive Data Detection and Classification in Spanish Clinical Text: Experiments with BERT (LREC2020)
- Does BERT Pretrained on Clinical Notes Reveal Sensitive Data? (NAACL2021)
- On the Importance of Word and Sentence Representation Learning in Implicit Discourse Relation Classification (IJCAI2020)
- Adapting BERT to Implicit Discourse Relation Classification with a Focus on Discourse Connectives (LREC2020)
- Labeling Explicit Discourse Relations using Pre-trained Language Models (TSD2020)
- Causal-BERT : Language models for causality detection between events expressed in text
- BERT4SO: Neural Sentence Ordering by Fine-tuning BERT
- Document-Level Event Argument Extraction by Conditional Generation (NAACL2021)
- Cross-lingual Zero- and Few-shot Hate Speech Detection Utilising Frozen Transformer Language Models and AXEL
- Same Side Stance Classification Task: Facilitating Argument Stance Classification by Fine-tuning a BERT Model
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection
- KEIS@JUST at SemEval-2020 Task 12: Identifying Multilingual Offensive Tweets Using Weighted Ensemble and Fine-Tuned BERT
- ALBERT-BiLSTM for Sequential Metaphor Detection (ACL2020 WS)
- MelBERT: Metaphor Detection via Contextualized Late Interaction using Metaphorical Identification Theories (NAACL2021)
- A BERT-based Dual Embedding Model for Chinese Idiom Prediction (COLING2020)
- Should You Fine-Tune BERT for Automated Essay Scoring? (ACL2020 WS)
- KILT: a Benchmark for Knowledge Intensive Language Tasks (NAACL2021) [github]
- IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding (AACL-IJCNLP2020)
- MedFilter: Improving Extraction of Task-relevant Utterances through Integration of Discourse Structure and Ontological Knowledge (EMNLP2020)
- ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces (AAAI2021)
- UserBERT: Self-supervised User Representation Learning
- UserBERT: Contrastive User Model Pre-training
- Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning (COLING2020)
- Automatic punctuation restoration with BERT models
Word segmentation, parsing, NER
- BERT Meets Chinese Word Segmentation
- Unified Multi-Criteria Chinese Word Segmentation with BERT
- RethinkCWS: Is Chinese Word Segmentation a Solved Task? (EMNLP2020) [github]
- Enhancing Chinese Word Segmentation via Pseudo Labels for Practicability (ACL2021 Findings)
- Joint Persian Word Segmentation Correction and Zero-Width Non-Joiner Recognition Using BERT
- Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning
- Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT (FLAIRS-33)
- Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing
- fastHan: A BERT-based Joint Many-Task Toolkit for Chinese NLP
- Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing -- A Tale of Two Parsers Revisited (EMNLP2019)
- Is POS Tagging Necessary or Even Helpful for Neural Dependency Parsing?
- Parsing as Pretraining (AAAI2020)
- Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing
- Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement
- StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling
- pyBART: Evidence-based Syntactic Transformations for IE [github]
- Named Entity Recognition -- Is there a glass ceiling? (CoNLL2019)
- A Unified MRC Framework for Named Entity Recognition
- Biomedical named entity recognition using BERT in the machine reading comprehension framework
- Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models
- Robust Named Entity Recognition with Truecasing Pretraining (AAAI2020)
- LTP: A New Active Learning Strategy for Bert-CRF Based Named Entity Recognition
- Named Entity Recognition as Dependency Parsing (ACL2020)
- Exploring Cross-sentence Contexts for Named Entity Recognition with BERT
- CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI2021) [github]
- Embeddings of Label Components for Sequence Labeling: A Case Study of Fine-grained Named Entity Recognition (ACL2020 SRW)
- BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision (KDD2020) [github]
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve
- Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language (ACL2020)
- To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging (EMNLP2020)
- Example-Based Named Entity Recognition
- FLERT: Document-Level Features for Named Entity Recognition
- Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition
- What's in a Name? Are BERT Named Entity Representations just as Good for any other Name? (ACL2020 WS)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition (EMNLP2020) [github]
- Entity Enhanced BERT Pre-training for Chinese NER (EMNLP2020)
- Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter (ACL2021)
- FLAT: Chinese NER Using Flat-Lattice Transformer (ACL2020)
- BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition
- MT-BioNER: Multi-task Learning for Biomedical Named Entity Recognition using Deep Bidirectional Transformers
- Knowledge Guided Named Entity Recognition for BioMedical Text
- Cross-Lingual Named Entity Recognition Using Parallel Corpus: A New Approach Using XLM-RoBERTa Alignment
- Portuguese Named Entity Recognition using BERT-CRF
- Towards Lingua Franca Named Entity Recognition with BERT
- Larger-Context Tagging: When and Why Does It Work? (NAACL2021)
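Most BERT-based NER systems listed above emit one BIO tag per token, which must then be decoded into entity spans. A small decoder for that standard output format (the tag sequence below is a made-up example):

```python
def bio_to_spans(tags):
    """Decode a BIO tag sequence into (label, start, end) spans, end exclusive."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag == "O" or tag.startswith("B-"):
            if label is not None:
                spans.append((label, start, i))
                label = None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
    return spans

# "Angela Merkel visited Paris"
print(bio_to_spans(["B-PER", "I-PER", "O", "B-LOC"]))  # -> [('PER', 0, 2), ('LOC', 3, 4)]
```

This sketch ignores malformed sequences (an "I-" tag with no preceding "B-"), which real evaluation scripts handle explicitly.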
Pronoun/coreference resolution
- A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution
- Resolving Gendered Ambiguous Pronouns with BERT (ACL2019 WS)
- Anonymized BERT: An Augmentation Approach to the Gendered Pronoun Resolution Challenge (ACL2019 WS)
- Gendered Pronoun Resolution using BERT and an extractive question answering formulation (ACL2019 WS)
- MSnet: A BERT-based Network for Gendered Pronoun Resolution (ACL2019 WS)
- Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation
- Fill the GAP: Exploiting BERT for Pronoun Resolution (ACL2019 WS)
- On GAP Coreference Resolution Shared Task: Insights from the 3rd Place Solution (ACL2019 WS)
- Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution (ACL2019 WS)
- Unsupervised Pronoun Resolution via Masked Noun-Phrase Prediction (ACL2021)
- BERT Masked Language Modeling for Co-reference Resolution (ACL2019 WS)
- Coreference Resolution with Entity Equalization (ACL2019)
- BERT for Coreference Resolution: Baselines and Analysis (EMNLP2019) [github]
- WikiCREM: A Large Unsupervised Corpus for Coreference Resolution (EMNLP2019)
- CD2CR: Co-reference Resolution Across Documents and Domains (EACL2021)
- Ellipsis Resolution as Question Answering: An Evaluation (EACL2021)
- Coreference Resolution as Query-based Span Prediction
- Coreferential Reasoning Learning for Language Representation (EMNLP2020)
- Revisiting Memory-Efficient Incremental Coreference Resolution
- Revealing the Myth of Higher-Order Inference in Coreference Resolution (EMNLP2020)
- Coreference Resolution without Span Representations (ACL2021)
- Neural Mention Detection (LREC2020)
- ZPR2: Joint Zero Pronoun Recovery and Resolution using Multi-Task Learning and BERT (ACL2020)
- An Empirical Study of Contextual Data Augmentation for Japanese Zero Anaphora Resolution (COLING2020)
- BERT-based Cohesion Analysis of Japanese Texts (COLING2020)
- Joint Coreference Resolution and Character Linking for Multiparty Conversation
- Sequence to Sequence Coreference Resolution (COLING2020 WS)
- Within-Document Event Coreference with BERT-Based Contextualized Representations
- Multi-task Learning Based Neural Bridging Reference Resolution
- Bridging Anaphora Resolution as Question Answering (ACL2020)
- Fine-grained Information Status Classification Using Discourse Context-Aware BERT (COLING2020)
Word sense disambiguation
- Language Models and Word Sense Disambiguation: An Overview and Analysis
- GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge (EMNLP2019)
- Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences (EMNLP2020 Findings)
- Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations (EMNLP2019)
- Using BERT for Word Sense Disambiguation
- Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation (ACL2019)
- Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings (KONVENS2019)
- An Accurate Model for Predicting the (Graded) Effect of Context in Word Similarity Based on Bert
- PolyLM: Learning about Polysemy through Language Modeling (EACL2021)
- CluBERT: A Cluster-Based Approach for Learning Sense Distributions in Multiple Languages (ACL2020)
- Cross-lingual Word Sense Disambiguation using mBERT Embeddings with Syntactic Dependencies
- VCDM: Leveraging Variational Bi-encoding and Deep Contextualized Word Representations for Improved Definition Modeling (EMNLP2020)
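GlossBERT's key move is to recast word sense disambiguation as sentence-pair classification: each candidate sense's dictionary gloss is paired with the context sentence, and a BERT pair classifier scores each pair. A sketch of that input construction, with an invented two-sense gloss inventory (not WordNet's actual entries) and an illustrative pair format:

```python
# Invented gloss inventory for illustration; GlossBERT uses WordNet glosses.
GLOSSES = {
    "bank": [
        "a financial institution that accepts deposits",
        "sloping land beside a body of water",
    ],
}

def context_gloss_pairs(sentence, target):
    """Build one (context, gloss) pair per candidate sense of the target word.
    A BERT sentence-pair classifier then scores each pair; the top-scoring
    gloss is the predicted sense."""
    return [(sentence, f"{target} : {gloss}") for gloss in GLOSSES[target]]

pairs = context_gloss_pairs("He sat on the bank of the river.", "bank")
print(len(pairs))  # -> 2
```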
Sentiment analysis
- Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence (NAACL2019)
- BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis (NAACL2019)
- Exploiting BERT for End-to-End Aspect-based Sentiment Analysis (EMNLP2019 WS)
- Improving BERT Performance for Aspect-Based Sentiment Analysis
- Context-Guided BERT for Targeted Aspect-Based Sentiment Analysis
- Understanding Pre-trained BERT for Aspect-based Sentiment Analysis (COLING2020)
- Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa (NAACL2021)
- Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification (LREC2020)
- An Investigation of Transfer Learning-Based Sentiment Analysis in Japanese (ACL2019)
- "Mask and Infill" : Applying Masked Language Model to Sentiment Transfer
- Adversarial Training for Aspect-Based Sentiment Analysis with BERT
- Adversarial and Domain-Aware BERT for Cross-Domain Sentiment Analysis (ACL2020)
- Utilizing BERT Intermediate Layers for Aspect Based Sentiment Analysis and Natural Language Inference
- DomBERT: Domain-oriented Language Model for Aspect-based Sentiment Analysis
- YASO: A New Benchmark for Targeted Sentiment Analysis
- SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics (ACL2020)
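The first entry in this section (Sun et al., NAACL2019) turns aspect-based sentiment analysis into BERT sentence-pair classification by generating an auxiliary sentence per aspect. A sketch of that construction; the question template here is illustrative, not the paper's verbatim wording:

```python
def auxiliary_sentences(target, aspects):
    """Build one auxiliary question per (target, aspect) pair, so that
    aspect-based sentiment becomes (review, question) pair classification."""
    return [f"what do you think of the {a} of {target} ?" for a in aspects]

review = "The laptop is fast but the battery drains quickly."
aux = auxiliary_sentences("the laptop", ["performance", "battery"])
pairs = [(review, q) for q in aux]
print(pairs[1][1])
```

Each pair is then fed to a standard BERT pair classifier whose label is the sentiment toward that aspect.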
Relation extraction
- Matching the Blanks: Distributional Similarity for Relation Learning (ACL2019)
- BERT-Based Multi-Head Selection for Joint Entity-Relation Extraction (NLPCC2019)
- Enriching Pre-trained Language Model with Entity Information for Relation Classification
- Span-based Joint Entity and Relation Extraction with Transformer Pre-training
- Fine-tune Bert for DocRED with Two-step Process
- Relation Extraction as Two-way Span-Prediction
- Entity, Relation, and Event Extraction with Contextualized Span Representations (EMNLP2019)
- Fine-tuning BERT for Joint Entity and Relation Extraction in Chinese Medical Text
- Downstream Model Design of Pre-trained Language Model for Relation Extraction Task
- Efficient long-distance relation extraction with DG-SpanBERT
- Global-to-Local Neural Networks for Document-Level Relation Extraction (EMNLP2020)
- DARE: Data Augmented Relation Extraction with GPT-2
- Distantly-Supervised Neural Relation Extraction with Side Information using BERT (IJCNN2020)
- Improving Distantly-Supervised Relation Extraction through BERT-based Label & Instance Embeddings
- An End-to-end Model for Entity-level Relation Extraction using Multi-instance Learning (EACL2021)
- ZS-BERT: Towards Zero-Shot Relation Extraction with Attribute Representation Learning (NAACL2021) [github]
- AdaPrompt: Adaptive Prompt-based Finetuning for Relation Extraction
- Dialogue-Based Relation Extraction (ACL2020)
- An Embarrassingly Simple Model for Dialogue Relation Extraction
- A Novel Cascade Binary Tagging Framework for Relational Triple Extraction (ACL2020) [github]
- ExpBERT: Representation Engineering with Natural Language Explanations (ACL2020) [github]
- AutoRC: Improving BERT Based Relation Classification Models via Architecture Search
- Investigation of BERT Model on Biomedical Relation Extraction Based on Revised Fine-tuning Mechanism
- Experiments on transfer learning architectures for biomedical relation extraction
- Improving BERT Model Using Contrastive Learning for Biomedical Relation Extraction (BioNLP2021)
- Cross-Lingual Relation Extraction with Transformers
- Improving Scholarly Knowledge Representation: Evaluating BERT-based Models for Scientific Relation Classification
- Robustly Pre-trained Neural Model for Direct Temporal Relation Extraction
- A BERT-based One-Pass Multi-Task Model for Clinical Temporal Relation Extraction (ACL2020 WS)
- Exploring Contextualized Neural Language Models for Temporal Dependency Parsing
- Temporal Reasoning on Implicit Events from Distant Supervision
- IMoJIE: Iterative Memory-Based Joint Open Information Extraction (ACL2020)
- OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction (EMNLP2020) [github]
- Multi2OIE: Multilingual Open Information Extraction Based on Multi-Head Attention with BERT (EMNLP2020 Findings)
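A recurring input trick in this section (used by "Enriching Pre-trained Language Model with Entity Information for Relation Classification") is to wrap the two entities in special marker tokens before encoding, so the model knows which pair to classify: '$' around the head entity and '#' around the tail. A sketch of that preprocessing step:

```python
def mark_entities(tokens, head_span, tail_span):
    """Insert '$' markers around the head entity and '#' around the tail.
    Spans are (start, end) token indices, end exclusive."""
    out = []
    for i, tok in enumerate(tokens):
        if i == head_span[0]:
            out.append("$")
        if i == tail_span[0]:
            out.append("#")
        out.append(tok)
        if i == head_span[1] - 1:
            out.append("$")
        if i == tail_span[1] - 1:
            out.append("#")
    return out

tokens = "Bill Gates founded Microsoft".split()
print(" ".join(mark_entities(tokens, (0, 2), (3, 4))))
# -> $ Bill Gates $ founded # Microsoft #
```

The marked sequence is then encoded by BERT, and the marker (or entity) representations feed the relation classifier.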
Knowledge base
- KG-BERT: BERT for Knowledge Graph Completion
- How Context Affects Language Models' Factual Predictions (AKBC2020)
- Inducing Relational Knowledge from BERT (AAAI2020)
- Latent Relation Language Models (AAAI2020)
- Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model (ICLR2020)
- Scalable Zero-shot Entity Linking with Dense Entity Retrieval (EMNLP2020) [github]
- Zero-shot Entity Linking with Efficient Long Range Sequence Modeling (EMNLP2020 Findings)
- Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking (CoNLL2019)
- Improving Entity Linking by Modeling Latent Entity Type Information (AAAI2020)
- Global Entity Disambiguation with Pretrained Contextualized Embeddings of Words and Entities
- YELM: End-to-End Contextualized Entity Linking
- Empirical Evaluation of Pretraining Strategies for Supervised Entity Linking (AKBC2020)
- LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention (EMNLP2020) [github]
- Linking Entities to Unseen Knowledge Bases with Arbitrary Schemas
- CHOLAN: A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata (EACL2021)
- PEL-BERT: A Joint Model for Protocol Entity Linking
- End-to-end Biomedical Entity Linking with Span-based Dictionary Matching
- Efficient One-Pass End-to-End Entity Linking for Questions (EMNLP2020) [github]
- Cross-Lingual Transfer in Zero-Shot Cross-Language Entity Linking
- Entity Linking in 100 Languages (EMNLP2020) [github]
- COMETA: A Corpus for Medical Entity Linking in the Social Media (EMNLP2020) [github]
- How Can We Know What Language Models Know? (TACL2020) [github]
- How to Query Language Models?
- Deep Entity Matching with Pre-Trained Language Models
- Ultra-Fine Entity Typing with Weak Supervision from a Masked Language Model (ACL2021)
- Constructing Taxonomies from Pretrained Language Models (NAACL2021)
- Language Models are Open Knowledge Graphs
- Can Generative Pre-trained Language Models Serve as Knowledge Bases for Closed-book QA? (ACL2021)
- DualTKB: A Dual Learning Bridge between Text and Knowledge Base (EMNLP2020) [github]
- Zero-shot Slot Filling with DPR and RAG
- How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds [github]
- MLMLM: Link Prediction with Mean Likelihood Masked Language Model
- Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases
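KG-BERT, the first entry above, scores knowledge-graph triples by flattening each (head, relation, tail) triple into a single [SEP]-delimited token sequence and feeding it to a standard BERT sequence classifier. A sketch of that serialization (the example triple is illustrative):

```python
def serialize_triple(head, relation, tail):
    """Flatten a KG triple into one BERT input sequence; a binary
    classifier over the [CLS] output then judges the triple's plausibility."""
    return f"[CLS] {head} [SEP] {relation} [SEP] {tail} [SEP]"

seq = serialize_triple("Steve Jobs", "founded", "Apple Inc.")
print(seq)
```

Because the entities and relation are plain text, the same model transfers to unseen triples, which is what enables link prediction and triple classification.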
Text classification
- Deep Learning Based Text Classification: A Comprehensive Review
- A Text Classification Survey: From Shallow to Deep Learning
- How to Fine-Tune BERT for Text Classification?
- X-BERT: eXtreme Multi-label Text Classification with BERT
- An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels (EMNLP2020)
- Taming Pretrained Transformers for Extreme Multi-label Text Classification (KDD2020)
- Layer-wise Guided Training for BERT: Learning Incrementally Refined Document Representations (EMNLP2020 WS)
- DocBERT: BERT for Document Classification
- Enriching BERT with Knowledge Graph Embeddings for Document Classification
- Classification and Clustering of Arguments with Contextualized Word Embeddings (ACL2019)
- BERT for Evidence Retrieval and Claim Verification
- Stacked DeBERT: All Attention in Incomplete Data for Text Classification
- Cost-Sensitive BERT for Generalisable Sentence Classification with Imbalanced Data
- BAE: BERT-based Adversarial Examples for Text Classification (EMNLP2020)
- FireBERT: Hardening BERT-based classifiers against adversarial attack [github]
- GAN-BERT: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples (ACL2020)
- Description Based Text Classification with Reinforcement Learning
- VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification
- Zero-shot Text Classification via Reinforced Self-training (ACL2020)
- On Data Augmentation for Extreme Multi-label Classification
- Noisy Channel Language Model Prompting for Few-Shot Text Classification
- Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning (NAACL2021)
- Towards Evaluating the Robustness of Chinese BERT Classifiers
- COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter [github]
- Large Scale Legal Text Classification Using Transformer Models
- BBAEG: Towards BERT-based Biomedical Adversarial Example Generation for Text Classification (NAACL2021)
- A Comparison of LSTM and BERT for Small Corpus
WSC, WNLI, NLI
- Exploring Unsupervised Pretraining and Sentence Structure Modelling for Winograd Schema Challenge
- A Surprisingly Robust Trick for the Winograd Schema Challenge
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale (AAAI2020)
- TTTTTackling WinoGrande Schemas
- WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge (ACL2020)
- The Sensitivity of Language Models and Humans to Winograd Schema Perturbations (ACL2020)
- Precise Task Formalization Matters in Winograd Schema Evaluations (EMNLP2020)
- Tackling Domain-Specific Winograd Schemas with Knowledge-Based Reasoning and Machine Learning
- A Review of Winograd Schema Challenge Datasets and Approaches
- Improving Natural Language Inference with a Pretrained Parser
- Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition
- DocNLI: A Large-scale Dataset for Document-level Natural Language Inference (ACL2021 Findings)
- Adversarial NLI: A New Benchmark for Natural Language Understanding
- Adversarial Analysis of Natural Language Inference Systems (ICSC2020)
- ANLIzing the Adversarial Natural Language Inference Dataset
- Syntactic Data Augmentation Increases Robustness to Inference Heuristics (ACL2020)
- Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets (EMNLP2020 WS) [github]
- HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference (LREC2020)
- Use of Machine Translation to Obtain Labeled Datasets for Resource-Constrained Languages (EMNLP2020) [github]
- FarsTail: A Persian Natural Language Inference Dataset
- Evaluating BERT for natural language inference: A case study on the CommitmentBank (EMNLP2019)
- Do Neural Models Learn Systematicity of Monotonicity Inference in Natural Language? (ACL2020)
- Abductive Commonsense Reasoning (ICLR2020)
- Entailment as Few-Shot Learner
- Collecting Entailment Data for Pretraining: New Protocols and Negative Results
- WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation (EMNLP2022 Findings) [github]
- Mining Knowledge for Natural Language Inference from Wikipedia Categories (EMNLP2020 Findings)
Commonsense
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge (NAACL2019)
- Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention
- HellaSwag: Can a Machine Really Finish Your Sentence? (ACL2019) [website]
- A Method for Building a Commonsense Inference Dataset Based on Basic Events (EMNLP2020) [website]
- Story Ending Prediction by Transferable BERT (IJCAI2019)
- Explain Yourself! Leveraging Language Models for Commonsense Reasoning (ACL2019)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning (ACL2020)
- Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models
- Informing Unsupervised Pretraining with External Linguistic Knowledge
- Commonsense Knowledge + BERT for Level 2 Reading Comprehension Ability Test
- BIG MOOD: Relating Transformers to Explicit Commonsense Knowledge
- Commonsense Knowledge Mining from Pretrained Models (EMNLP2019)
- KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning (EMNLP2019)
- Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations (EMNLP2019 WS)
- Do Massively Pretrained Language Models Make Better Storytellers? (CoNLL2019)
- PIQA: Reasoning about Physical Commonsense in Natural Language (AAAI2020)
- Evaluating Commonsense in Pre-trained Language Models (AAAI2020)
- Why Do Masked Neural Language Models Still Need Common Sense Knowledge?
- Does BERT Solve Commonsense Task via Commonsense Knowledge?
- Unsupervised Commonsense Question Answering with Self-Talk (EMNLP2020)
- Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering (AAAI2021)
- G-DAUG: Generative Data Augmentation for Commonsense Reasoning
- Contrastive Self-Supervised Learning for Commonsense Reasoning (ACL2020)
- Differentiable Open-Ended Commonsense Reasoning
- Adversarial Training for Commonsense Inference (ACL2020 WS)
- Do Fine-tuned Commonsense Language Models Really Generalize?
- Do Language Models Perform Generalizable Commonsense Inference? (ACL2021 Findings)
- Improving Zero Shot Learning Baselines with Commonsense Knowledge
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [github]
- Do Neural Language Representations Learn Physical Commonsense? (CogSci2019)
Extractive summarization
- HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization (ACL2019)
- Deleter: Leveraging BERT to Perform Unsupervised Successive Text Compression
- Discourse-Aware Neural Extractive Text Summarization (ACL2020) [github]
- AREDSUM: Adaptive Redundancy-Aware Iterative Sentence Ranking for Extractive Document Summarization
- Fact-level Extractive Summarization with Hierarchical Graph Mask on BERT (COLING2020)
- Do We Really Need That Many Parameters In Transformer For Extractive Summarization? Discourse Can Help ! (EMNLP2020 WS)
- Multi-Document Summarization with Determinantal Point Processes and Contextualized Representations (EMNLP2019 WS)
- Continual BERT: Continual Learning for Adaptive Extractive Summarization of COVID-19 Literature
Grammatical error correction
- Multi-headed Architecture Based on BERT for Grammatical Errors Correction (ACL2019 WS)
- Towards Minimal Supervision BERT-based Grammar Error Correction
- Learning to combine Grammatical Error Corrections (EMNLP2019 WS)
- LM-Critic: Language Models for Unsupervised Grammatical Error Correction (EMNLP2021) [github]
- Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction (ACL2020)
- Chinese Grammatical Correction Using BERT-based Pre-trained Model (AACL-IJCNLP2020)
- Spelling Error Correction with Soft-Masked BERT (ACL2020)
IR
- BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models [github]
- Pretrained Transformers for Text Ranking: BERT and Beyond
- Passage Re-ranking with BERT
- Investigating the Successes and Failures of BERT for Passage Re-Ranking
- Understanding the Behaviors of BERT in Ranking
- Document Expansion by Query Prediction
- Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval (ACL2021)
- CEDR: Contextualized Embeddings for Document Ranking (SIGIR2019)
- Deeper Text Understanding for IR with Contextual Neural Language Modeling (SIGIR2019)
- FAQ Retrieval using Query-Question Similarity and BERT-Based Query-Answer Relevance (SIGIR2019)
- An Analysis of BERT FAQ Retrieval Models for COVID-19 Infobot
- COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval
- Unsupervised FAQ Retrieval with Question Generation and BERT (ACL2020)
- Multi-Stage Document Ranking with BERT
- Learning-to-Rank with BERT in TF-Ranking
- Transformer-Based Language Models for Similar Text Retrieval and Ranking
- DeText: A Deep Text Ranking Framework with BERT
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT (SIGIR2020)
- RepBERT: Contextualized Text Embeddings for First-Stage Retrieval [github]
- Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
- Multi-Perspective Semantic Information Retrieval
- CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos (SIGIR2022)
- Expansion via Prediction of Importance with Contextualization (SIGIR2020)
- BERT-QE: Contextualized Query Expansion for Document Re-ranking (EMNLP2020 Findings)
- Beyond [CLS] through Ranking by Generation (EMNLP2020)
- Efficient Document Re-Ranking for Transformers by Precomputing Term Representations (SIGIR2020)
- Training Curricula for Open Domain Answer Re-Ranking (SIGIR2020)
- Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling
- Boosted Dense Retriever
- ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self On-the-fly Distillation for Dense Passage Retrieval
- Document Ranking with a Pretrained Sequence-to-Sequence Model
- A Neural Corpus Indexer for Document Retrieval
- COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List (NAACL2021)
- Guided Transformer: Leveraging Multiple External Sources for Representation Learning in Conversational Search (SIGIR2020)
- Fine-tune BERT for E-commerce Non-Default Search Ranking
- IR-BERT: Leveraging BERT for Semantic Search in Background Linking for News Articles
- ProphetNet-Ads: A Looking Ahead Strategy for Generative Retrieval Models in Sponsored Search Engine (NLPCC2020)
- Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned (ACL2020 WS)
- SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search (EMNLP2020)
- Neural Duplicate Question Detection without Labeled Training Data (EMNLP2019)
- Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection
- Effective Transfer Learning for Identifying Similar Questions: Matching User Questions to COVID-19 FAQs
- Cross-lingual Information Retrieval with BERT
- Cross-lingual Retrieval for Iterative Self-Supervised Training (NeurIPS2020)
- Graph-based Multilingual Product Retrieval in E-Commerce Search (NAACL2021 Industry)
- Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning (ECIR2020)
- PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval (WSDM2021)
- B-PROP: Bootstrapped Pre-training with Representative Words Prediction for Ad-hoc Retrieval (SIGIR2021)
- Condenser: a Pre-training Architecture for Dense Retrieval (EMNLP2021)
- Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation (ACL2022)
- Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval (EMNLP2021 WS) [github]
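Many of the dense-retrieval papers above (e.g. RepBERT, ANCE, Condenser) share the same first-stage scoring rule: embed the query and each document with a bi-encoder, then rank documents by inner product. A minimal stand-in sketch with toy vectors (illustrative only; no BERT is involved, and the vectors here are hypothetical placeholders for real encoder outputs):

```python
def rank(query_vec, doc_vecs, top_k=2):
    """First-stage dense retrieval: score each document by its dot product
    with the query embedding and return the top-k document indices."""
    scores = [(sum(q * d for q, d in zip(query_vec, dv)), i)
              for i, dv in enumerate(doc_vecs)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_k]]

# Toy 2-d embeddings: doc 0 is most similar to the query, doc 2 second.
print(rank([1.0, 0.0], [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]))  # [0, 2]
```

In production systems the top-k candidates from this stage are typically re-scored by a slower BERT cross-encoder, as in the multi-stage ranking papers listed above.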
Generation
- Pretrained Language Models for Text Generation: A Survey (IJCAI2021 Survey Track)
- A Survey of Pretrained Language Models Based Text Generation
- GLGE: A New General Language Generation Evaluation Benchmark [github]
- BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model (NAACL2019 WS)
- Pretraining-Based Natural Language Generation for Text Summarization
- Text Summarization with Pretrained Encoders (EMNLP2019) [github (original)] [github (huggingface)]
- Multi-stage Pretraining for Abstractive Summarization
- PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
- Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models
- GSum: A General Framework for Guided Neural Abstractive Summarization (NAACL2021) [github]
- STEP: Sequence-to-Sequence Transformer Pre-training for Document Summarization
- TLDR: Extreme Summarization of Scientific Documents [github]
- Product Title Generation for Conversational Systems using BERT
- WSL-DS: Weakly Supervised Learning with Distant Supervision for Query Focused Multi-Document Abstractive Summarization (COLING2020)
- Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation
- Abstractive Query Focused Summarization with Query-Free Resources
- Abstractive Summarization of Spoken and Written Instructions with BERT
- Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization (ACL2021)
- Coreference-Aware Dialogue Summarization (SIGDIAL2021)
- XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages (ACL2021 Findings) [github]
- BERT Fine-tuning For Arabic Text Summarization (ICLR2020 WS)
- Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2
- Mixed-Lingual Pre-training for Cross-lingual Summarization (AACL-IJCNLP2020)
- PoinT-5: Pointer Network and T-5 based Financial Narrative Summarisation (COLING2020 WS)
- MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML2019) [github], [github]
- JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation (LREC2020)
- Unified Language Model Pre-training for Natural Language Understanding and Generation [github] (NeurIPS2019)
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [github]
- Dual Inference for Improving Language Understanding and Generation (EMNLP2020 Findings)
- All NLP Tasks Are Generation Tasks: A General Pretraining Framework
- ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training (EMNLP2020 Findings) [github]
- ProphetNet-X: Large-Scale Pre-training Models for English, Chinese, Multi-lingual, Dialog, and Code Generation
- Towards Making the Most of BERT in Neural Machine Translation
- Improving Neural Machine Translation with Pre-trained Representation
- BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation (EMNLP2021)
- On the use of BERT for Neural Machine Translation (EMNLP2019 WS)
- Incorporating BERT into Neural Machine Translation (ICLR2020)
- Recycling a Pre-trained BERT Encoder for Neural Machine Translation
- Exploring Unsupervised Pretraining Objectives for Machine Translation (ACL2021 Findings)
- Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT (EMNLP2020)
- Language Models are Good Translators
- Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
- Mask-Predict: Parallel Decoding of Conditional Masked Language Models (EMNLP2019)
- PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation (EMNLP2020)
- ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
- Non-Autoregressive Text Generation with Pre-trained Language Models (EACL2021)
- Cross-Lingual Natural Language Generation via Pre-Training (AAAI2020) [github]
- PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable (ACL2020)
- A Tailored Pre-Training Model for Task-Oriented Dialog Generation
- Pretrained Language Models for Dialogue Generation with Multiple Input Sources (EMNLP2020 Findings)
- Knowledge-Grounded Dialogue Generation with Pre-trained Language Models (EMNLP2020)
- Are Pre-trained Language Models Knowledgeable to Ground Open Domain Dialogues?
- Open-Domain Dialogue Generation Based on Pre-trained Language Models
- LaMDA: Language Models for Dialog Applications
- Retrieval-Augmented Transformer-XL for Close-Domain Dialog Generation
- Internet-Augmented Dialogue Generation
- DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances (AAAI2021)
- CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection
- QURIOUS: Question Generation Pretraining for Text Generation
- Few-Shot NLG with Pre-Trained Language Model (ACL2020)
- Text-to-Text Pre-Training for Data-to-Text Tasks
- KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation (EMNLP2020)
- Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference (INLG2020)
- Large Scale Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training
- Structure-Grounded Pretraining for Text-to-SQL
- Data Agnostic RoBERTa-based Natural Language to SQL Query Generation
- ToTTo: A Controlled Table-To-Text Generation Dataset (EMNLP2020) [github]
- Exploring Fluent Query Reformulations with Text-to-Text Transformers and Reinforcement Learning (AAAI2021 WS)
- A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation (TACL2020) [github]
- MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models (EMNLP2020)
- Facts2Story: Controlling Text Generation by Key Facts
- CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning [github] [website] (EMNLP2020 Findings)
- An Enhanced Knowledge Injection Model for Commonsense Generation (COLING2020)
- Retrieval Enhanced Model for Commonsense Generation (ACL2021 Findings)
- Lexically-constrained Text Generation through Commonsense Knowledge Extraction and Injection (AAAI2021WS)
- Pre-training Text-to-Text Transformers for Concept-centric Common Sense
- Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph (EMNLP2020)
- KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning
- Autoregressive Entity Retrieval (ICLR2021) [github]
- Multilingual Autoregressive Entity Linking
- EIGEN: Event Influence GENeration using Pre-trained Language Models
- proScript: Partially Ordered Scripts Generation via Pre-trained Language Models
- Goal-Oriented Script Construction (INLG2021)
- Contrastive Triple Extraction with Generative Transformer (AAAI2021)
- GeDi: Generative Discriminator Guided Sequence Generation
- Generating similes effortlessly like a Pro: A Style Transfer Approach for Simile Generation (EMNLP2020)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (JMLR2020) [github]
- mT5: A massively multilingual pre-trained text-to-text transformer (NAACL2021) [github]
- nmT5 -- Is parallel data still relevant for pre-training massively multilingual language models? (ACL2021)
- mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
- WT5?! Training Text-to-Text Models to Explain their Predictions
- NT5?! Training T5 to Perform Numerical Reasoning [github]
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (ACL2020)
- The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
- GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
- Finetuned Language Models Are Zero-Shot Learners [blog]
- Multitask Prompted Training Enables Zero-Shot Task Generalization
- Multilingual Denoising Pre-training for Neural Machine Translation
- Best Practices for Data-Efficient Modeling in NLG: How to Train Production-Ready Neural Models with Less Data (COLING2020)
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- Unsupervised Pre-training for Natural Language Generation: A Literature Review
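Several pre-training objectives above revolve around reconstructing removed spans; PEGASUS, for instance, masks out the "most informative" sentences and uses them as the pseudo-summary target. A toy sketch of that gap-sentence selection, using plain word overlap as a stand-in for the ROUGE-1 scoring described in the paper (assumption: this simplification, not the paper's exact metric):

```python
def select_gap_sentences(sentences, k=1):
    """PEGASUS-style gap-sentence selection: pick the k sentences whose
    lexical overlap with the rest of the document is highest; during
    pre-training these are masked and the model learns to generate them."""
    def overlap(i):
        s = set(sentences[i].lower().split())
        rest = {w for j, t in enumerate(sentences) if j != i
                for w in t.lower().split()}
        return len(s & rest) / max(len(s), 1)
    ranked = sorted(range(len(sentences)), key=overlap, reverse=True)
    return sorted(ranked[:k])

docs = ["the cat sat", "the cat ran fast", "dogs bark"]
print(select_gap_sentences(docs, k=1))  # [0]
```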
Quality evaluator
- BERTScore: Evaluating Text Generation with BERT (ICLR2020)
- BERTTune: Fine-Tuning Neural Machine Translation with BERTScore (ACL2021)
- Machine Translation Evaluation with BERT Regressor
- TransQuest: Translation Quality Estimation with Cross-lingual Transformers (COLING2020)
- SumQE: a BERT-based Summary Quality Estimation Model (EMNLP2019)
- MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance (EMNLP2019) [github]
- BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward
- Language Model Augmented Relevance Score (ACL2021)
- BLEURT: Learning Robust Metrics for Text Generation (ACL2020)
- BARTScore: Evaluating Generated Text as Text Generation [github]
- Masked Language Model Scoring (ACL2020)
- Simple-QE: Better Automatic Quality Estimation for Text Simplification
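The core idea behind BERTScore, listed above, is simple enough to sketch: greedily match each token of one sequence to its most similar token in the other by embedding cosine similarity, then combine precision and recall into an F1. A self-contained toy version with hand-made 2-d "embeddings" standing in for real BERT token vectors:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def bertscore_f1(cand_emb, ref_emb):
    """Greedy-matching F1 over token embeddings (BERTScore-style)."""
    # Recall: each reference token matches its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in cand_emb) for r in ref_emb) / len(ref_emb)
    # Precision: each candidate token matches its most similar reference token.
    precision = sum(max(cosine(c, r) for r in ref_emb) for c in cand_emb) / len(cand_emb)
    return 2 * precision * recall / (precision + recall)

# Identical toy sequences score 1.0.
ref = [(1.0, 0.0), (0.0, 1.0)]
print(round(bertscore_f1(ref, ref), 4))  # 1.0
```

The paper additionally applies inverse-document-frequency weighting and a baseline rescaling, both omitted here.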
Modification (multi-task, masking strategy, etc.)
- Multi-Task Deep Neural Networks for Natural Language Understanding (ACL2019)
- The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
- BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning (ICML2019)
- Measuring Massive Multitask Language Understanding (ICLR2021) [github]
- Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks (ACL2021)
- Pre-training Text Representations as Meta Learning
- Unifying Question Answering and Text Classification via Span Extraction
- MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization (ACL2020)
- ERNIE: Enhanced Language Representation with Informative Entities (ACL2019)
- ERNIE: Enhanced Representation through Knowledge Integration
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (AAAI2020)
- ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
- ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding
- XLNet: Generalized Autoregressive Pretraining for Language Understanding (NeurIPS2019) [github]
- MPNet: Masked and Permuted Pre-training for Language Understanding
- Pre-Training with Whole Word Masking for Chinese BERT
- SpanBERT: Improving Pre-training by Representing and Predicting Spans (TACL2020) [github]
- ConvBERT: Improving BERT with Span-based Dynamic Convolution
- Frustratingly Simple Pretraining Alternatives to Masked Language Modeling (EMNLP2021) [github]
- TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning (NAACL2022)
- ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations (EMNLP2020 Findings)
- ZEN 2.0: Continue Training and Adaption for N-gram Enhanced Text Encoders
- MVP-BERT: Redesigning Vocabularies for Chinese BERT and Multi-Vocab Pretraining
- Adversarial Training for Large Neural Language Models
- BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks (ACL2021)
- Train No Evil: Selective Masking for Task-guided Pre-training
- Position Masking for Language Models
- Masking as an Efficient Alternative to Finetuning for Pretrained Language Models (EMNLP2020)
- Variance-reduced Language Pretraining via a Mask Proposal Network
- Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation (EMNLP2020)
- Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model
- Contextual Representation Learning beyond Masked Language Modeling (ACL2022)
- Curriculum learning for language modeling
- Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training
- Focusing More on Conflicts with Mis-Predictions Helps Language Pre-Training
- Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference (EACL2021) [github]
- It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners (NAACL2021) [github]
- Making Pre-trained Language Models Better Few-shot Learners (ACL2021) [github]
- CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
- Lifelong Learning of Few-shot Learners across NLP Tasks
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks (ACL2020)
- Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora (NAACL2022)
- Towards Continual Knowledge Learning of Language Models (ICLR2022)
- An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training [github]
- To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks (ACL2020)
- Revisiting Few-sample BERT Fine-tuning
- Blank Language Models
- Enabling Language Models to Fill in the Blanks (ACL2020)
- Efficient Training of BERT by Progressively Stacking (ICML2019) [github]
- RoBERTa: A Robustly Optimized BERT Pretraining Approach [github]
- On Losses for Modern Language Models (EMNLP2020) [github]
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (ICLR2020)
- Rethinking Embedding Coupling in Pre-trained Language Models (ICLR2021)
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (ICLR2020) [github] [blog]
- Training ELECTRA Augmented with Multi-word Selection (ACL2021 Findings)
- Learning to Sample Replacements for ELECTRA Pre-Training (ACL2021 Findings)
- SCRIPT: Self-Critic PreTraining of Transformers (NAACL2021)
- Pre-Training Transformers as Energy-Based Cloze Models (EMNLP2020) [github]
- MC-BERT: Efficient Language Pre-Training via a Meta Controller
- FreeLB: Enhanced Adversarial Training for Language Understanding (ICLR2020)
- KERMIT: Generative Insertion-Based Modeling for Sequences
- CALM: Continuous Adaptive Learning for Language Modeling
- SegaBERT: Pre-training of Segment-aware BERT for Language Understanding
- DisSent: Sentence Representation Learning from Explicit Discourse Relations (ACL2019)
- Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models (ACL2020)
- CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling (EMNLP2020)
- CausalBERT: Injecting Causal Knowledge Into Pre-trained Models with Minimal Supervision
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (ICLR2020)
- Structural Pre-training for Dialogue Comprehension (ACL2021)
- Retrofitting Structure-aware Transformer Language Model for End Tasks (EMNLP2020)
- Syntax-Enhanced Pre-trained Model
- Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding
- Do Syntax Trees Help Pre-trained Transformers Extract Information?
- SenseBERT: Driving Some Sense into BERT
- Semantics-aware BERT for Language Understanding (AAAI2020)
- GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method
- K-BERT: Enabling Language Representation with Knowledge Graph
- Knowledge Enhanced Contextual Word Representations (EMNLP2019)
- Knowledge-Aware Language Model Pretraining
- K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
- JAKET: Joint Pre-training of Knowledge Graph and Language Understanding
- E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT (EMNLP2020)
- KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
- Entities as Experts: Sparse Memory Access with Entity Supervision (EMNLP2020)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning (EMNLP2020)
- Contextualized Representations Using Textual Encyclopedic Knowledge
- CoLAKE: Contextualized Language and Knowledge Embedding (COLING2020)
- KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding
- K-XLNet: A General Method for Combining Explicit Knowledge with Language Model Pretraining
- Combining pre-trained language models and structured knowledge
- Coarse-to-Fine Pre-training for Named Entity Recognition (EMNLP2020)
- E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks (COLING2020 WS)
- REALM: Retrieval-Augmented Language Model Pre-Training (ICML2020) [github]
- Simple and Efficient ways to Improve REALM
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (NeurIPS2020)
- Fine-tune the Entire RAG Architecture (including DPR retriever) for Question-Answering
- Joint Retrieval and Generation Training for Grounded Text Generation
- Retrieval Augmentation Reduces Hallucination in Conversation
- On-The-Fly Information Retrieval Augmentation for Language Models
- Current Limitations of Language Models: What You Need is Retrieval
- Improving language models by retrieving from trillions of tokens [blog] [blog]
- Taking Notes on the Fly Helps BERT Pre-training
- Pre-training via Paraphrasing
- SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis (ACL2020)
- Improving Event Duration Prediction via Time-aware Pre-training (EMNLP2020 Findings)
- Knowledge-Aware Procedural Text Understanding with Multi-Stage Training
- Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring (ICLR2020)
- Rethinking Positional Encoding in Language Pre-training
- Improve Transformer Models with Better Relative Position Embeddings (EMNLP2020 Findings)
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- Position Information in Transformers: An Overview
- BoostingBERT: Integrating Multi-Class Boosting into BERT for NLP Tasks
- BURT: BERT-inspired Universal Representation from Twin Structure
- Universal Text Representation from BERT: An Empirical Study
- Symmetric Regularization based BERT for Pair-wise Semantic Reasoning (SIGIR2020)
- Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Document Matching
- Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling (ACL2021)
- Transfer Fine-Tuning: A BERT Case Study (EMNLP2019)
- Improving Pre-Trained Multilingual Models with Vocabulary Expansion (CoNLL2019)
- BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance (ACL2020)
- A Mixture of h−1 Heads is Better than h Heads (ACL2020)
- SesameBERT: Attention for Anywhere
- Multi-Head Attention: Collaborate Instead of Concatenate
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention [github]
- Deepening Hidden Representations from Pre-trained Language Models
- On the Transformer Growth for Progressive BERT Training
- Improving BERT with Self-Supervised Attention
- Guiding Attention for Self-Supervised Learning with Transformers (EMNLP2020 Findings)
- Improving Disfluency Detection by Self-Training a Self-Attentive Model
- Self-training Improves Pre-training for Natural Language Understanding [github]
- CERT: Contrastive Self-supervised Learning for Language Understanding
- Robust Transfer Learning with Pretrained Language Models through Adapters (ACL2021)
- ReadOnce Transformers: Reusable Representations of Text for Transformers (ACL2021)
- LV-BERT: Exploiting Layer Variety for BERT (ACL2021 Findings) [github]
- Large Product Key Memory for Pretrained Language Models (EMNLP2020 Findings)
- Enhancing Pre-trained Language Model with Lexical Simplification
- Contextual BERT: Conditioning the Language Model Using a Global State (COLING2020 WS)
- SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization (ACL2020)
- Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning (EMNLP2021) [github]
- Token Dropping for Efficient BERT Pretraining (ACL2022) [github]
- Pay Attention to MLPs
- Are Pre-trained Convolutions Better than Pre-trained Transformers? (ACL2021)
- Pre-Training a Language Model Without Human Language
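Several entries above (e.g. RoFormer) concern position encoding. A minimal pure-Python sketch of rotary position embeddings — an illustration of the idea, not any paper's reference code — shows why attention scores become a function of the relative offset between positions:

```python
import math

def rotary_embed(vec, pos, base=10000.0):
    """Apply rotary position embedding (RoFormer-style) to one vector.

    Each dimension pair (2i, 2i+1) is rotated by angle pos * base^(-2i/d),
    so position differences appear as phase differences in dot products.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query/key vectors (made up for illustration).
q = [1.0, 0.0, 1.0, 0.0]
k = [1.0, 0.0, 1.0, 0.0]
# The q.k score depends only on the relative offset between positions:
s1 = dot(rotary_embed(q, 3), rotary_embed(k, 1))  # offset 2
s2 = dot(rotary_embed(q, 7), rotary_embed(k, 5))  # offset 2
print(abs(s1 - s2) < 1e-9)
```

Because a rotation applied to both vectors cancels up to the angle difference, absolute positions drop out and only relative distance remains.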
Tokenization
- Training Multilingual Pre-trained Language Model with Byte-level Subwords
- Byte Pair Encoding is Suboptimal for Language Model Pretraining (EMNLP2020 Findings)
- CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation (TACL2022) [github]
- ByT5: Towards a token-free future with pre-trained byte-to-byte models (TACL2022) [github]
- Multi-view Subword Regularization (NAACL2021)
- Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation (ACL2021)
- An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks (AACL-IJCNLP2020)
- AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization
- LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization (ACL2021 Findings)
- Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models (NAACL2021)
- CharBERT: Character-aware Pre-trained Language Model (COLING2020) [github]
- CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters (COLING2020)
- Charformer: Fast Character Transformers via Gradient-based Subword Tokenization [github]
- Fast WordPiece Tokenization (EMNLP2021)
- MaxMatch-Dropout: Subword Regularization for WordPiece (COLING2022)
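Several of the tokenization papers above compare against BERT's WordPiece scheme, which segments a word by greedy longest-match-first lookup ("Fast WordPiece Tokenization" accelerates exactly this procedure with a trie). A toy sketch with a made-up vocabulary:

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword segmentation (WordPiece-style).

    Non-initial pieces carry the '##' continuation prefix; if no
    segmentation exists, the whole word maps to the unknown token.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        cur = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation marker
            if piece in vocab:
                cur = piece  # longest match found
                break
            end -= 1
        if cur is None:
            return [unk]
        pieces.append(cur)
        start = end
    return pieces

vocab = {"un", "##aff", "##able", "##a"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
```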
Prompt
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts (EMNLP2020) [github]
- Calibrate Before Use: Improving Few-Shot Performance of Language Models
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
- GPT Understands, Too [github]
- How Many Data Points is a Prompt Worth? (NAACL2021) [website]
- Learning How to Ask: Querying LMs with Mixtures of Soft Prompts (NAACL2021)
- Meta-tuning Language Models to Answer Prompts Better
- Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
- The Power of Scale for Parameter-Efficient Prompt Tuning (EMNLP2021)
- Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
- PPT: Pre-trained Prompt Tuning for Few-shot Learning
- True Few-Shot Learning with Language Models
- Few-shot Sequence Learning with Transformers (NeurIPS2020 WS)
- PTR: Prompt Tuning with Rules for Text Classification
- Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification
- Discrete and Soft Prompting for Multilingual Models (EMNLP2021)
- Reframing Instructional Prompts to GPTk's Language
- Multimodal Few-Shot Learning with Frozen Language Models
- FLEX: Unifying Evaluation for Few-Shot NLP
- Do Prompt-Based Models Really Understand the Meaning of their Prompts?
- OpenPrompt: An Open-source Framework for Prompt-learning (ACL2022 Demo)
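Many of the prompt papers above (e.g. PTR, Knowledgeable Prompt-tuning) build on the pattern-verbalizer setup: a cloze template turns classification into masked-token prediction, and a verbalizer maps label words back to labels. A minimal sketch — the template, label words, and scores below are hypothetical stand-ins for a real MLM's `[MASK]` distribution:

```python
def build_prompt(template, text):
    """Fill a cloze template; an MLM would be asked to fill [MASK]."""
    return template.format(text=text)

def prompt_classify(text, template, verbalizer, mask_scores):
    """Pick the label whose verbalizer word scores highest at [MASK].

    `mask_scores` stands in for a real model's token distribution at
    the [MASK] position; here it is just a hand-written dict.
    """
    prompt = build_prompt(template, text)
    best = max(verbalizer, key=lambda label: mask_scores[verbalizer[label]])
    return prompt, best

template = "{text} It was [MASK]."
verbalizer = {"positive": "great", "negative": "terrible"}
toy_scores = {"great": 0.71, "terrible": 0.04}  # hypothetical MLM output
prompt, label = prompt_classify("A gripping film.", template, verbalizer, toy_scores)
print(prompt)  # A gripping film. It was [MASK].
print(label)   # positive
```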
Sentence embedding
- Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (EMNLP2019)
- Parameter-free Sentence Embedding via Orthogonal Basis (EMNLP2019)
- SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models
- On the Sentence Embeddings from Pre-trained Language Models (EMNLP2020)
- Semantic Re-tuning with Contrastive Tension (ICLR2021)
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations (ACL2021)
- ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer (ACL2021)
- CLEAR: Contrastive Learning for Sentence Representation
- SimCSE: Simple Contrastive Learning of Sentence Embeddings (EMNLP2021) [github]
- ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding
- Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders (EMNLP2021) [github]
- TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning (EMNLP2021 Findings)
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations [github]
- Whitening Sentence Representations for Better Semantics and Faster Retrieval [github]
- Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks (NAACL2021)
- DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings (NAACL2022) [code]
- Unsupervised Sentence Representation via Contrastive Learning with Mixing Negatives (AAAI2022) [github]
- Sentence Embeddings by Ensemble Distillation
- EASE: Entity-Aware Contrastive Learning of Sentence Embedding (NAACL2022)
- Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
- Dual-View Distilled BERT for Sentence Embedding (SIGIR2021)
- DefSent: Sentence Embeddings using Definition Sentences (ACL2021)
- Paraphrastic Representations at Scale [github]
- Learning Dense Representations of Phrases at Scale (ACL2021) [github]
- Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration (EMNLP2021)
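Contrastive methods in this section (SimCSE, ConSERT, DeCLUTR, and others) optimize an in-batch InfoNCE objective: each anchor's positive is its matching row, and the rest of the batch serves as negatives. A self-contained sketch with toy 2-d vectors standing in for real sentence embeddings:

```python
import math

def cos(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def info_nce(anchors, positives, temp=0.05):
    """In-batch contrastive loss (SimCSE-style): for each anchor,
    a softmax over cosine similarities to every positive in the
    batch, with the matching row as the correct class."""
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cos(a, p) / temp for p in positives]
        log_z = math.log(sum(math.exp(l) for l in logits))
        loss += log_z - logits[i]
    return loss / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]     # matched pairs point the same way
shuffled = [[0.1, 0.9], [0.9, 0.1]]    # pairs deliberately mismatched
print(info_nce(anchors, aligned) < info_nce(anchors, shuffled))
```

Aligned pairs yield a near-zero loss; mismatched pairs are heavily penalized, which is what pushes paraphrases together and unrelated sentences apart during training.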
Transformer variants
- Efficient Transformers: A Survey
- Adaptive Attention Span in Transformers (ACL2019)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (ACL2019) [github]
- Generating Long Sequences with Sparse Transformers
- Do Transformers Need Deep Long-Range Memory? (ACL2020)
- DA-Transformer: Distance-aware Transformer (NAACL2021)
- Adaptively Sparse Transformers (EMNLP2019)
- Compressive Transformers for Long-Range Sequence Modelling
- The Evolved Transformer (ICML2019)
- Reformer: The Efficient Transformer (ICLR2020) [github]
- GRET: Global Representation Enhanced Transformer (AAAI2020)
- GMAT: Global Memory Augmentation for Transformers
- Memory Transformer
- Transformer on a Diet [github]
- A Tensorized Transformer for Language Modeling (NeurIPS2019)
- DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling (ICLR2020) [github]
- DeLighT: Very Deep and Light-weight Transformer [github]
- Lite Transformer with Long-Short Range Attention (ICLR2020) [github]
- Efficient Content-Based Sparse Attention with Routing Transformers
- BP-Transformer: Modelling Long-Range Context via Binary Partitioning
- Longformer: The Long-Document Transformer [github]
- Big Bird: Transformers for Longer Sequences
- Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting (AAAI2021)
- Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention (AAAI2021) [github]
- Improving Transformer Models by Reordering their Sublayers (ACL2020)
- Highway Transformer: Self-Gating Enhanced Self-Attentive Networks
- Mask Attention Networks: Rethinking and Strengthen Transformer (NAACL2021)
- Synthesizer: Rethinking Self-Attention in Transformer Models
- Query-Key Normalization for Transformers (EMNLP2020 Findings)
- Rethinking Attention with Performers (ICLR2021)
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
- HAT: Hardware-Aware Transformers for Efficient Natural Language Processing (ACL2020) [github]
- Linformer: Self-Attention with Linear Complexity
- What's Hidden in a One-layer Randomly Weighted Transformer? (EMNLP2021)
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- Understanding the Difficulty of Training Transformers (EMNLP2020)
- Towards Fully 8-bit Integer Inference for the Transformer Model (IJCAI2020)
- Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation
- Long Range Arena: A Benchmark for Efficient Transformers
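Several of the long-sequence variants above (Longformer, Big Bird, BP-Transformer) replace full self-attention with a sliding-window pattern plus a handful of global tokens. A minimal boolean-mask sketch of that pattern — illustrative only, not any model's actual masking code:

```python
def sliding_window_mask(n, window, global_idx=()):
    """Allowed-attention mask for sliding-window sparse attention.

    Each token attends to neighbours within +/- `window` positions;
    tokens in `global_idx` (e.g. [CLS]) attend to, and are attended
    by, every position. Returns an n x n matrix of booleans.
    """
    allow = [[abs(i - j) <= window for j in range(n)] for i in range(n)]
    for g in global_idx:
        for j in range(n):
            allow[g][j] = True  # global token attends everywhere
            allow[j][g] = True  # and everything attends to it
    return allow

mask = sliding_window_mask(8, window=1, global_idx=(0,))
# The local band costs O(n * window) instead of O(n^2):
print(sum(sum(row) for row in mask))  # 34 allowed pairs, not 64
```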
Probe
- A Structural Probe for Finding Syntax in Word Representations (NAACL2019)
- When Bert Forgets How To POS: Amnesic Probing of Linguistic Properties and MLM Predictions
- Finding Universal Grammatical Relations in Multilingual BERT (ACL2020)
- Probing Multilingual BERT for Genetic and Typological Signals (COLING2020)
- Linguistic Knowledge and Transferability of Contextual Representations (NAACL2019) [github]
- Probing What Different NLP Tasks Teach Machines about Function Word Comprehension (*SEM2019)
- BERT Rediscovers the Classical NLP Pipeline (ACL2019)
- A Closer Look at How Fine-tuning Changes BERT (ACL2022)
- Mediators in Determining what Processing BERT Performs First (NAACL2021)
- Probing Neural Network Comprehension of Natural Language Arguments (ACL2019)
- Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations (EMNLP2019 WS)
- What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
- Quantity doesn't buy quality syntax with neural language models (EMNLP2019)
- Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction (ICLR2020)
- Discourse Probing of Pretrained Language Models (NAACL2021)
- oLMpics -- On what Language Model Pre-training Captures
- Do Neural Language Models Show Preferences for Syntactic Formalisms? (ACL2020)
- Probing for Predicate Argument Structures in Pretrained Language Models (ACL2022)
- Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT (ACL2020)
- Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work? (ACL2020)
- Probing Linguistic Systematicity (ACL2020)
- A Matter of Framing: The Impact of Linguistic Formalism on Probing Results
- A Cross-Task Analysis of Text Span Representations (ACL2020 WS)
- When Do You Need Billions of Words of Pretraining Data? [github]
- Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis
- Language Models as Knowledge Bases? (EMNLP2019) [github]
- BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA
- How Much Knowledge Can You Pack Into the Parameters of a Language Model? (EMNLP2020)
- Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries (EACL2021)
- Factual Probing Is [MASK]: Learning vs. Learning to Recall (NAACL2021) [github]
- Knowledge Neurons in Pretrained Transformers
- DirectProbe: Studying Representations without Classifiers (NAACL2021)
- The Language Model Understood the Prompt was Ambiguous: Probing Syntactic Uncertainty Through Generation (EMNLP2021 WS)
- X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models (EMNLP2020)
- Probing BERT in Hyperbolic Spaces (ICLR2021)
- Probing Across Time: What Does RoBERTa Know and When?
- Do NLP Models Know Numbers? Probing Numeracy in Embeddings (EMNLP2019)
- Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models [github] [website]
- Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly (ACL2020)
- How is BERT surprised? Layerwise detection of linguistic anomalies (ACL2021)
- Exploring the Role of BERT Token Representations to Explain Sentence Probing Results
- What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge
- A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension
- Can BERT Reason? Logically Equivalent Probes for Evaluating the Inference Capabilities of Language Models
- Probing Task-Oriented Dialogue Representation from Language Models (EMNLP2020)
- Probing for Bridging Inference in Transformer Language Models
- BERTering RAMS: What and How Much does BERT Already Know About Event Arguments? -- A Study on the RAMS Dataset (EMNLP2020 WS)
- CxGBERT: BERT meets Construction Grammar (COLING2020) [github]
- BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? (ACL2021)
Inside BERT
- What does BERT learn about the structure of language? (ACL2019)
- Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned (ACL2019) [github]
- Multi-head or Single-head? An Empirical Comparison for Transformer Training
- Open Sesame: Getting Inside BERT's Linguistic Knowledge (ACL2019 WS)
- Analyzing the Structure of Attention in a Transformer Language Model (ACL2019 WS)
- What Does BERT Look At? An Analysis of BERT's Attention (ACL2019 WS)
- Do Attention Heads in BERT Track Syntactic Dependencies?
- Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains (ACL2019 WS)
- Inducing Syntactic Trees from BERT Representations (ACL2019 WS)
- A Multiscale Visualization of Attention in the Transformer Model (ACL2019 Demo)
- Visualizing and Measuring the Geometry of BERT
- How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings (EMNLP2019)
- Are Sixteen Heads Really Better than One? (NeurIPS2019)
- On the Validity of Self-Attention as Explanation in Transformer Models
- Visualizing and Understanding the Effectiveness of BERT (EMNLP2019)
- Attention Interpretability Across NLP Tasks
- Revealing the Dark Secrets of BERT (EMNLP2019)
- Analyzing Redundancy in Pretrained Transformer Models (EMNLP2020)
- What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models
- Attention Module is Not Only a Weight: Analyzing Transformers with Vector Norms (ACL2020 SRW)
- Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP2021)
- Quantifying Attention Flow in Transformers
- Telling BERT's full story: from Local Attention to Global Aggregation (EACL2021)
- How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention
- Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks (ACL2021)
- What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding (EMNLP2020)
- Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs (EMNLP2019)
- Are Pretrained Language Models Symbolic Reasoners Over Knowledge? (CoNLL2020)
- Rethinking the Value of Transformer Components (COLING2020)
- Transformer Feed-Forward Layers Are Key-Value Memories
- Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
- Investigating Transferability in Pretrained Language Models
- What Happens To BERT Embeddings During Fine-tuning?
- Analyzing Individual Neurons in Pre-trained Language Models (EMNLP2020)
- How fine can fine-tuning be? Learning efficient language models (AISTATS2020)
- The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives (EMNLP2019)
- A Primer in BERTology: What we know about how BERT works (TACL2020)
- Pretrained Language Model Embryology: The Birth of ALBERT (EMNLP2020) [github]
- Evaluating Saliency Methods for Neural Language Models (NAACL2021)
- Investigating Gender Bias in BERT
- Measuring and Reducing Gendered Correlations in Pre-trained Models [website]
- Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias (COLING2020 WS)
- Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models (EACL2021)
- CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models (EMNLP2020)
- Unmasking the Mask -- Evaluating Social Biases in Masked Language Models
- BERT Knows Punta Cana is not just beautiful, it's gorgeous: Ranking Scalar Adjectives with Contextualised Representations (EMNLP2020)
- Does Chinese BERT Encode Word Structure? (COLING2020) [github]
- How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations (CIKM2019)
- Whatcha lookin' at? DeepLIFTing BERT's Attention in Question Answering
- What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?
- What do Models Learn from Question Answering Datasets?
- Towards Interpreting BERT for Reading Comprehension Based QA (EMNLP2020)
- Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA (EMNLP2020)
- How does BERT’s attention change when you fine-tune? An analysis methodology and a case study in negation scope (ACL2020)
- Calibration of Pre-trained Transformers
- When BERT Plays the Lottery, All Tickets Are Winning (EMNLP2020)
- The Lottery Ticket Hypothesis for Pre-trained BERT Networks
- What Context Features Can Transformer Language Models Use? (ACL2021)
- exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models [github]
- The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models [github]
- What Does BERT with Vision Look At? (ACL2020)
- Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models (ECCV2020)
- Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers (TACL2021)
- What Vision-Language Models ‘See’ when they See Scenes
Multi-lingual
- A Primer on Pretrained Multilingual Language Models
- Multilingual Constituency Parsing with Self-Attention and Pre-Training (ACL2019)
- Cross-lingual Language Model Pretraining (NeurIPS2019) [github]
- XLM-E: Cross-lingual Language Model Pre-training via ELECTRA
- XLM-K: Improving Cross-Lingual Language Model Pre-Training with Multilingual Knowledge
- 75 Languages, 1 Model: Parsing Universal Dependencies Universally (EMNLP2019) [github]
- Zero-shot Dependency Parsing with Pre-trained Multilingual Sentence Representations (EMNLP2019 WS)
- Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank (EMNLP2020 Findings)
- Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT (EMNLP2019)
- How multilingual is Multilingual BERT? (ACL2019)
- How Language-Neutral is Multilingual BERT?
- How to Adapt Your Pretrained Multilingual Model to 1600 Languages (ACL2021)
- Load What You Need: Smaller Versions of Multilingual BERT (EMNLP2020) [github]
- Is Multilingual BERT Fluent in Language Generation?
- ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation (ACL2021 Findings)
- Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks (EMNLP2019)
- BERT is Not an Interlingua and the Bias of Tokenization (EMNLP2019 WS)
- Cross-Lingual Ability of Multilingual BERT: An Empirical Study (ICLR2020)
- Multilingual Alignment of Contextual Word Representations (ICLR2020)
- Emerging Cross-lingual Structure in Pretrained Language Models (ACL2020)
- On the Cross-lingual Transferability of Monolingual Representations
- Unsupervised Cross-lingual Representation Learning at Scale (ACL2020)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding
- Cross-lingual Alignment Methods for Multilingual BERT: A Comparative Study (EMNLP2020 Findings)
- Emerging Cross-lingual Structure in Pretrained Language Models
- Can Monolingual Pretrained Models Help Cross-Lingual Classification?
- A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT
- Fully Unsupervised Crosslingual Semantic Textual Similarity Metric Based on BERT for Identifying Parallel Data (CoNLL2019)
- What the [MASK]? Making Sense of Language-Specific BERT Models
- XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization (ICML2020)
- XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation (EMNLP2021)
- XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation
- A Systematic Analysis of Morphological Content in BERT Models for Multiple Languages
- Extending Multilingual BERT to Low-Resource Languages
- Learning Better Universal Representations from Pre-trained Contextualized Language Models
- Universal Dependencies according to BERT: both more specific and more general
- A Call for More Rigor in Unsupervised Cross-lingual Learning (ACL2020)
- Identifying Necessary Elements for BERT's Multilinguality (EMNLP2020)
- MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer
- From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers
- Language Models are Few-shot Multilingual Learners
- First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT (EACL2021)
- Multilingual BERT Post-Pretraining Alignment (NAACL2021)
- XeroAlign: Zero-Shot Cross-lingual Transformer Alignment (ACL2021 Findings)
- Syntax-augmented Multilingual BERT for Cross-lingual Transfer (ACL2021)
- Language Representation in Multilingual BERT and its applications to improve Cross-lingual Generalization
- VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation
- On the Language Neutrality of Pre-trained Multilingual Representations
- Are All Languages Created Equal in Multilingual BERT? (ACL2020 WS)
- When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models
- Adapting Monolingual Models: Data can be Scarce when Language Similarity is High (ACL2021 Findings)
- Language-agnostic BERT Sentence Embedding
- Universal Sentence Representation Learning with Conditional Masked Language Model
- WikiBERT models: deep transfer learning for many languages
- Inducing Language-Agnostic Multilingual Representations
- To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding? (COLING2020)
- It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT (EMNLP2020 WS)
- XLM-T: A Multilingual Language Model Toolkit for Twitter
- A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios
- Translation Artifacts in Cross-lingual Transfer Learning (EMNLP2020)
- Identifying Cultural Differences through Multi-Lingual Wikipedia
- A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT (EMNLP2020)
- BERT for Monolingual and Cross-Lingual Reverse Dictionary (EMNLP2020 Findings)
- Bilingual Text Extraction as Reading Comprehension
- Evaluating Multilingual BERT for Estonian
- How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models (ACL2021) [github]
- Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training (EMNLP2021)
- BERTologiCoMix: How does Code-Mixing interact with Multilingual BERT? (EACL2021 WS)
Other than English models
- CamemBERT: a Tasty French Language Model (ACL2020)
- On the importance of pre-training data volume for compact language models (EMNLP2020)
- FlauBERT: Unsupervised Language Model Pre-training for French (LREC2020)
- Multilingual is not enough: BERT for Finnish
- BERTje: A Dutch BERT Model
- RobBERT: a Dutch RoBERTa-based Language Model
- Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language
- RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark (EMNLP2020)
- AraBERT: Transformer-based Model for Arabic Language Understanding
- ALUE: Arabic Language Understanding Evaluation (EACL2021 WS) [website]
- ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic (ACL2021) [github]
- Pre-Training BERT on Arabic Tweets: Practical Considerations
- PhoBERT: Pre-trained language models for Vietnamese
- Give your Text Representation Models some Love: the Case for Basque (LREC2020)
- ParsBERT: Transformer-based Model for Persian Language Understanding
- Leveraging ParsBERT and Pretrained mT5 for Persian Abstractive Text Summarization (CSICC2021)
- Pre-training Polish Transformer-based Language Models at Scale
- Playing with Words at the National Library of Sweden -- Making a Swedish BERT
- KR-BERT: A Small-Scale Korean-Specific Language Model
- KoreALBERT: Pretraining a Lite BERT Model for Korean Language Understanding (ICPR2020)
- What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers (EMNLP2021)
- KLUE: Korean Language Understanding Evaluation
- WangchanBERTa: Pretraining transformer-based Thai Language Models
- FinEst BERT and CroSloEngual BERT: less is more in multilingual models (TSD2020)
- GREEK-BERT: The Greeks visiting Sesame Street (SETN2020)
- The birth of Romanian BERT (EMNLP2020 Findings)
- German's Next Language Model (COLING2020 Industry Track)
- GottBERT: a pure German Language Model
- EstBERT: A Pretrained Language-Specific BERT for Estonian
- Czert -- Czech BERT-like Model for Language Representation
- RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model (TSD2021)
- Bertinho: Galician BERT Representations
- Pretraining and Fine-Tuning Strategies for Sentiment Analysis of Latvian Tweets
- PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data
- IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages (EMNLP2020 Findings)
- Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages (NeurIPS2020 WS)
- IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP (COLING2020)
- IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization (EMNLP2021)
- IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation (EMNLP2021)
- AfroMT: Pretraining Strategies and Reproducible Benchmarks for Translation of 8 African Languages (EMNLP2021)
- KinyaBERT: a Morphology-aware Kinyarwanda Language Model (ACL2022)
- BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
- NEZHA: Neural Contextualized Representation for Chinese Language Understanding
- Revisiting Pre-Trained Models for Chinese Natural Language Processing (EMNLP2020 Findings)
- ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information (ACL2021) [github]
- Intrinsic Knowledge Evaluation on Chinese Language Models
- CPM: A Large-scale Generative Chinese Pre-trained Language Model [github]
- PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
- CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model
- CLUE: A Chinese Language Understanding Evaluation Benchmark
- CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark
- FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark
- AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation
- UER: An Open-Source Toolkit for Pre-training Models (EMNLP2019 Demo) [github]
Domain specific
- AMMU -- A Survey of Transformer-based Biomedical Pretrained Language Models
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining
- Self-Alignment Pretraining for Biomedical Entity Representations (NAACL2021) [github]
- Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking (ACL2021) [github]
- Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets (ACL2019 WS)
- BERT-based Ranking for Biomedical Entity Normalization
- PubMedQA: A Dataset for Biomedical Research Question Answering (EMNLP2019)
- Pre-trained Language Model for Biomedical Question Answering
- How to Pre-Train Your Model? Comparison of Different Pre-Training Models for Biomedical Question Answering
- On Adversarial Examples for Biomedical NLP Tasks
- An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining (ACL2020 WS)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [github]
- Improving Biomedical Pretrained Language Models with Knowledge (BioNLP2021)
- BioMegatron: Larger Biomedical Domain Language Model (EMNLP2020) [website]
- Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art (EMNLP2020 WS)
- A pre-training technique to localize medical BERT and enhance BioBERT [github]
- exBERT: Extending Pre-trained Models with Domain-specific Vocabulary Under Constrained Training Resources (EMNLP2020 Findings) [github]
- BERTology Meets Biology: Interpreting Attention in Protein Language Models (ICLR2021)
- ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks (AIME2020)
- Publicly Available Clinical BERT Embeddings (NAACL2019 WS)
- UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus (NAACL2021)
- MT-Clinical BERT: Scaling Clinical Information Extraction with Multitask Learning
- A clinical specific BERT developed with huge size of Japanese clinical narrative
- Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset (ACL2020) [github]
- Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources
- Classifying Long Clinical Documents with Pre-trained Transformers
- Detecting Adverse Drug Reactions from Twitter through Domain-Specific Preprocessing and BERT Ensembling
- Progress Notes Classification and Keyword Extraction using Attention-based Deep Learning Models with BERT
- BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining
- Prediction of ICD Codes with Clinical BERT Embeddings and Text Augmentation with Label Balancing using MIMIC-III
- Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition (EMNLP2020)
- CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT (EMNLP2020)
- Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage (MLHC2020)
- Med-BERT: pre-trained contextualized embeddings on large-scale structured electronic health records for disease prediction [github]
- SciBERT: Pretrained Contextualized Embeddings for Scientific Text (EMNLP2019) [github]
- SPECTER: Document-level Representation Learning using Citation-informed Transformers (ACL2020) [github]
- OAG-BERT: Pre-train Heterogeneous Entity-augmented Academic Language Models [github]
- PatentBERT: Patent Classification with Fine-Tuning a pre-trained BERT Model
- FinBERT: A Pretrained Language Model for Financial Communications
- LEGAL-BERT: The Muppets straight out of Law School (EMNLP2020 Findings)
- Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents
- E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce
- BERT Goes Shopping: Comparing Distributional Models for Product Representations
- NewsBERT: Distilling Pre-trained Language Model for Intelligent News Application
- Code and Named Entity Recognition in StackOverflow (ACL2020) [github]
- BERTweet: A pre-trained language model for English Tweets (EMNLP2020 Demo)
- TweetBERT: A Pretrained Language Representation Model for Twitter Text Analysis
- A Million Tweets Are Worth a Few Points: Tuning Transformers for Customer Service Tasks
- Analyzing COVID-19 Tweets with Transformer-based Language Models
- Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media (EMNLP2020 Findings)
Multi-modal
- A Survey on Visual Transformer
- Transformers in Vision: A Survey
- Vision-Language Pre-training: Basics, Recent Advances, and Future Trends
- VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV2019)
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS2019)
- VisualBERT: A Simple and Performant Baseline for Vision and Language
- Selfie: Self-supervised Pretraining for Image Embedding
- ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (ICLR2022)
- Align before Fuse: Vision and Language Representation Learning with Momentum Distillation (NeurIPS2021) [github]
- Contrastive Bidirectional Transformer for Temporal Representation Learning
- M-BERT: Injecting Multimodal Information in the BERT Structure
- Integrating Multimodal Information in Large Pretrained Transformers
- LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP2019)
- Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions (NAACL2021)
- X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers (EMNLP2020)
- Adaptive Transformers for Learning Multimodal Representations (ACL2020SRW) [github]
- GEM: A General Evaluation Benchmark for Multimodal Tasks (ACL2021 Findings) [github]
- Fusion of Detected Objects in Text for Visual Question Answering (EMNLP2019)
- VisualMRC: Machine Reading Comprehension on Document Images (AAAI2021)
- LambdaNetworks: Modeling long-range Interactions without Attention [github]
- BERT representations for Video Question Answering (WACV2020)
- Self-supervised pre-training and contrastive representation learning for multiple-choice video QA (AAAI2021)
- UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning (ACL2021)
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation [github]
- Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training
- Contrastive Visual-Linguistic Pretraining
- What is More Likely to Happen Next? Video-and-Language Future Event Prediction (EMNLP2020)
- VisualGPT: Data-efficient Image Captioning by Balancing Visual Input and Linguistic Knowledge from Pretraining
- XGPT: Cross-modal Generative Pre-Training for Image Captioning
- Scaling Up Vision-Language Pre-training for Image Captioning
- Injecting Semantic Concepts into End-to-End Image Captioning (CVPR2022)
- Unified Vision-Language Pre-Training for Image Captioning and VQA (AAAI2020) [github]
- TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
- An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA (AAAI2022)
- Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer
- VisualCOMET: Reasoning about the Dynamic Context of a Still Image (ECCV2020) [website]
- Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
- VD-BERT: A Unified Vision and Dialog Transformer with BERT (EMNLP2020)
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations (ICLR2020)
- Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
- UNITER: Learning UNiversal Image-TExt Representations
- ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
- Supervised Multimodal Bitransformers for Classifying Images and Text
- InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining
- Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs (TACL2021)
- SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels
- LiT: Zero-Shot Transfer with Locked-image Text Tuning (CVPR2022)
- WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training
- Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training (NeurIPS2021)
- E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning (ACL2021)
- UNIMO-2: End-to-End Unified Vision-Language Grounded Learning (ACL2022)
- Grounded Language-Image Pre-training [github]
- VLMO: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts [github]
- VinVL: Revisiting Visual Representations in Vision-Language Models
- An Empirical Study of Training End-to-End Vision-and-Language Transformers (CVPR2022) [github]
- Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling
- UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
- Florence: A New Foundation Model for Computer Vision
- Large-Scale Adversarial Training for Vision-and-Language Representation Learning (NeurIPS2020)
- Flamingo: a Visual Language Model for Few-Shot Learning
- OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models [github]
- Do DALL-E and Flamingo Understand Each Other?
- Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
- Unifying Vision-and-Language Tasks via Text Generation
- Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network (AAAI2021)
- ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
- KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning
- A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
- Self-Supervised learning with cross-modal transformers for emotion recognition (SLT2020)
- Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision (EMNLP2020)
- 12-in-1: Multi-Task Vision and Language Representation Learning
- Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models (NAACL2021)
- M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training (CVPR2021)
- UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
- CM3: A Causal Masked Multimodal Model of the Internet
- Retrieval-Augmented Multimodal Language Modeling
- Cycle Text-To-Image GAN with BERT
- Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
- Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
- VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
- DeVLBert: Learning Deconfounded Visio-Linguistic Representations (ACMMM2020)
- A Recurrent Vision-and-Language BERT for Navigation
- BERT Can See Out of the Box: On the Cross-modal Transferability of Text Representations
- Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning (CVPR2021)
- Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers (EMNLP2021)
- Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
- Understanding Advertisements with BERT (ACL2020)
- BERTERS: Multimodal Representation Learning for Expert Recommendation System with Transformer
- FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval (SIGIR2020)
- Kaleido-BERT: Vision-Language Pre-training on Fashion Domain (CVPR2021)
- LayoutLM: Pre-training of Text and Layout for Document Image Understanding (KDD2020) [github]
- LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding (ACL2021)
- LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
- Unifying Vision, Text, and Layout for Universal Document Processing
- LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding
- BROS: A Pre-trained Language Model for Understanding Texts in Document
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
- LayoutReader: Pre-training of Text and Layout for Reading Order Detection (EMNLP2021)
- BERT for Large-scale Video Segment Classification with Test-time Augmentation (ICCV2019WS)
- Is Space-Time Attention All You Need for Video Understanding?
- lamBERT: Language and Action Learning Using Multimodal BERT
- Generative Pretraining from Pixels [github] [website]
- Visual Transformers: Token-based Image Representation and Processing for Computer Vision
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ICLR2021)
- BEiT: BERT Pre-Training of Image Transformers
- Zero-Shot Text-to-Image Generation [github] [website]
- Hierarchical Text-Conditional Image Generation with CLIP Latents [website]
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [website]
- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
- Learning Transferable Visual Models From Natural Language Supervision [github] [website]
- How Much Can CLIP Benefit Vision-and-Language Tasks?
- EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling
- e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
- Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
- Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation (ACL2022)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
- Training Vision Transformers for Image Retrieval
- LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval (NAACL2021)
- Colorization Transformer (ICLR2021) [github]
- A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer [website]
- Multimodal Pretraining for Dense Video Captioning (AACL-IJCNLP2020)
- Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling (CVPR2021) [github]
- VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding (ACL2021 Findings)
- VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding (EMNLP2021)
- BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling
- A Generalist Agent [website]
- SpeechBERT: Cross-Modal Pre-trained Language Model for End-to-end Spoken Question Answering
- An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering
- vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
- Effectiveness of self-supervised pre-training for speech recognition
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
- Applying wav2vec2.0 to Speech Recognition in various low-resource languages
- Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
- Speech Recognition by Simply Fine-tuning BERT (ICASSP2021)
- Understanding Semantics from Speech Through Pre-training
- Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks
- Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision (ICML2020 WS)
- Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining
- ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding
- End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features
- Speech-language Pre-training for End-to-end Spoken Language Understanding
- Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding (Interspeech2020)
- AudioCLIP: Extending CLIP to Image, Text and Audio
- Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation
- Unsupervised Cross-lingual Representation Learning for Speech Recognition
- Curriculum Pre-training for End-to-End Speech Translation (ACL2020)
- MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation
- Multilingual Speech Translation with Efficient Finetuning of Pretrained Models (ACL2021)
- Multilingual Byte2Speech Text-To-Speech Models Are Few-shot Spoken Language Learners
- Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models
- To BERT or Not To BERT: Comparing Speech and Language-based Approaches for Alzheimer's Disease Detection (Interspeech2020)
- BERT for Joint Multichannel Speech Dereverberation with Spatial-aware Tasks
Model compression
- Compression of Deep Learning Models for Text: A Survey
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
- Patient Knowledge Distillation for BERT Model Compression (EMNLP2019)
- Small and Practical BERT Models for Sequence Labeling (EMNLP2019)
- TinyBERT: Distilling BERT for Natural Language Understanding [github]
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (NeurIPS2019 WS) [github]
- Contrastive Distillation on Intermediate Representations for Language Model Compression (EMNLP2020)
- Knowledge Distillation from Internal Representations (AAAI2020)
- Reinforced Multi-Teacher Selection for Knowledge Distillation (AAAI2021)
- ALP-KD: Attention-Based Layer Projection for Knowledge Distillation (AAAI2021)
- Dynamic Knowledge Distillation for Pre-trained Language Models (EMNLP2021)
- Distilling Linguistic Context for Language Model Compression (EMNLP2021)
- Improving Task-Agnostic BERT Distillation with Layer Mapping Search
- PoWER-BERT: Accelerating BERT inference for Classification Tasks
- WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
- Extremely Small BERT Models from Mixed-Vocabulary Training (EACL2021)
- BERT-of-Theseus: Compressing BERT by Progressive Module Replacing (EMNLP2020)
- Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning (ACL2020 SRW)
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
- Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation
- Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
- Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
- Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
- MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices (ACL2020)
- Distilling Knowledge from Pre-trained Language Models via Text Smoothing
- DynaBERT: Dynamic BERT with Adaptive Width and Depth
- Reducing Transformer Depth on Demand with Structured Dropout
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference (ACL2020)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [github] [github]
- Accelerating BERT Inference for Sequence Labeling via Early-Exit (ACL2021)
- ELBERT: Fast ALBERT with Confidence-Window Based Early Exit
- RomeBERT: Robust Training of Multi-Exit BERT
- TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference (NAACL2021)
- FastBERT: a Self-distilling BERT with Adaptive Inference Time (ACL2020)
- Distilling Large Language Models into Tiny and Effective Students using pQRNN
- Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation
- LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression (COLING2020)
- Poor Man's BERT: Smaller and Faster Transformer Models
- schuBERT: Optimizing Elements of BERT (ACL2020)
- BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance (EMNLP2020) [github]
- One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers (ACL2021 Findings)
- From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression (AAAI2022)
- TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER (ACL2020)
- XtremeDistil: Multi-stage Distillation for Massive Multilingual Models (ACL2020)
- Robustly Optimized and Distilled Training for Natural Language Understanding
- Structured Pruning of Large Language Models
- Movement Pruning: Adaptive Sparsity by Fine-Tuning [github]
- Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning (EMNLP2020 Findings)
- Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior (EMNLP2020 Findings)
- Parameter-Efficient Transfer Learning with Diff Pruning
- FastFormers: Highly Efficient Transformer Models for Natural Language Understanding (EMNLP2020 WS) [github]
- AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models (ACL2021) [github]
- Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains (ACL2021 Findings)
- Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data
- AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
- SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
- Optimizing Transformers with Approximate Computing for Faster, Smaller and more Accurate NLP Models
- An Approximation Algorithm for Optimal Subarchitecture Extraction [github]
- Structured Pruning of a BERT-based Question Answering Model
- DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering (ACL2020)
- Distilling Knowledge Learned in BERT for Text Generation (ACL2020)
- Distilling the Knowledge of BERT for Sequence-to-Sequence ASR (Interspeech2020)
- Pre-trained Summarization Distillation
- Understanding BERT Rankers Under Distillation (ICTIR2020)
- Simplified TinyBERT: Knowledge Distillation for Document Retrieval
- Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT (ACL2020 WS)
- TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing (ACL2020 Demo)
- TopicBERT for Energy Efficient Document Classification (EMNLP2020 Findings)
- MiniVLM: A Smaller and Faster Vision-Language Model
- Compressing Visual-linguistic Model via Knowledge Distillation
- Playing Lottery Tickets with Vision and Language
- Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
- Q8BERT: Quantized 8Bit BERT (NeurIPS2019 WS)
- Training with Quantization Noise for Extreme Model Compression (ICLR2021)
- Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing
- BinaryBERT: Pushing the Limit of BERT Quantization (ACL2021)
- I-BERT: Integer-only BERT Quantization
- ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques (AAAI2021)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT (EMNLP2020)
- EdgeBERT: Optimizing On-Chip Inference for Multi-Task NLP
- Optimizing Inference Performance of Transformers on CPUs
Large language model
- Language Models are Unsupervised Multitask Learners [github]
- Language Models are Few-Shot Learners (NeurIPS2020) [github]
- Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems
- OPT: Open Pre-trained Transformer Language Models [website]
- GPT-NeoX-20B: An Open-Source Autoregressive Language Model
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [blog]
- Training Compute-Optimal Large Language Models
- PaLM: Scaling Language Modeling with Pathways [blog]
- LLaMA: Open and Efficient Foundation Language Models
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling [github]
- PolyLM: An Open Source Polyglot Large Language Model
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model [blog]
- DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale [github]
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- ZeRO++: Extremely Efficient Collective Communication for Giant Model Training [blog]
- ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Reinforcement learning from human feedback
- Fine-Tuning Language Models from Human Preferences [github] [blog]
- Training language models to follow instructions with human feedback [github] [blog]
- WebGPT: Browser-assisted question-answering with human feedback [blog]
- Improving alignment of dialogue agents via targeted human judgements
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
- Training Language Models with Language Feedback (ACL2022 WS)
- Self-Instruct: Aligning Language Model with Self Generated Instructions [github]
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
- ChatGPT: A Meta-Analysis after 2.5 Months
Misc.
- Extracting Training Data from Large Language Models
- Generative Language Modeling for Automated Theorem Proving
- Do you have the right scissors? Tailoring Pre-trained Language Models via Monte-Carlo Methods (ACL2020)
- jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models [github]
- Cloze-driven Pretraining of Self-attention Networks
- Learning and Evaluating General Linguistic Intelligence
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (ACL2019 WS)
- Learning to Speak and Act in a Fantasy Text Adventure Game (EMNLP2019)
- A Two-Stage Masked LM Method for Term Set Expansion (ACL2020)
- Cold-start Active Learning through Self-supervised Language Modeling (EMNLP2020)
- Conditional BERT Contextual Augmentation
- Data Augmentation using Pre-trained Transformer Models (AACL-IJCNLP2020) [github]
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks (COLING2020)
- GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation
- Unsupervised Text Style Transfer with Padded Masked Language Models (EMNLP2020)
- Assessing Discourse Relations in Language Generation from Pre-trained Language Models
- Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (ICLR2020)
- Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes
- IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization (AAAI2021)
- Multi-node Bert-pretraining: Cost-efficient Approach
- How to Train BERT with an Academic Budget
- Amazon SageMaker Model Parallelism: A General and Flexible Framework for Large Model Training
- PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management [github]
- 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
- TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
- Efficient Large-Scale Language Model Training on GPU Clusters
- Scaling Laws for Neural Language Models
- Scaling Laws for Autoregressive Generative Modeling
- Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
- The Pile: An 800GB Dataset of Diverse Text for Language Modeling [website]
- Deduplicating Training Data Makes Language Models Better
- Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models (ICLR2020)
- A Mutual Information Maximization Perspective of Language Representation Learning (ICLR2020)
- Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment (AAAI2020)
- Weight Poisoning Attacks on Pre-trained Models (ACL2020)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT (EMNLP2020)
- BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks (ACL2021 Findings)
- Model Extraction and Adversarial Transferability, Your BERT is Vulnerable! (NAACL2021)
- Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT
- Robust Encodings: A Framework for Combating Adversarial Typos (ACL2020)
- On the Robustness of Language Encoders against Grammatical Errors (ACL2020)
- Evaluating the Robustness of Neural Language Models to Input Perturbations (EMNLP2021)
- Pretrained Transformers Improve Out-of-Distribution Robustness (ACL2020) [github]
- "You are grounded!": Latent Name Artifacts in Pre-trained Language Models (EMNLP2020)
- The Right Tool for the Job: Matching Model and Instance Complexities (ACL2020) [github]
- Unsupervised Domain Clusters in Pretrained Language Models (ACL2020)
- Thieves on Sesame Street! Model Extraction of BERT-based APIs (ICLR2020)
- Graph-Bert: Only Attention is Needed for Learning Graph Representations
- Graph-Aware Transformer: Is Attention All Graphs Need?
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages (EMNLP2020 Findings)
- Unsupervised Translation of Programming Languages
- Unified Pre-training for Program Understanding and Generation (NAACL2021)
- MathBERT: A Pre-Trained Model for Mathematical Formula Understanding
- Investigating Math Word Problems using Pretrained Multilingual Language Models
- Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning (ACL2021)
- Pre-train or Annotate? Domain Adaptation with a Constrained Budget (EMNLP2021)
- Item-based Collaborative Filtering with BERT (ACL2020 WS)
- RecoBERT: A Catalog Language Model for Text-Based Recommendations
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
- Extending Machine Language Models toward Human-Level Language Understanding
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data (ACL2020)
- Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level (ACL2021 Findings) [github]
- Glyce: Glyph-vectors for Chinese Character Representations
- Back to the Future -- Sequential Alignment of Text Representations
- Improving Cuneiform Language Identification with BERT (NAACL2019 WS)
- Generating Derivational Morphology with BERT
- BERT has a Moral Compass: Improvements of ethical and moral values of machines
- MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training (ACL2021 Findings)
- SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction (ACM-BCB2019)
- ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction
- BERT Learns (and Teaches) Chemistry
- Prediction of RNA-protein interactions using a nucleotide language model
- Sketch-BERT: Learning Sketch Bidirectional Encoder Representation from Transformers by Self-supervised Learning of Sketch Gestalt (CVPR2020)
- The Chess Transformer: Mastering Play using Generative Language Models
- The Go Transformer: Natural Language Modeling for Game Play
- On the comparability of Pre-trained Language Models
- Transformers: State-of-the-art Natural Language Processing
- The Cost of Training NLP Models: A Concise Overview