Keywords

Condenser; Pre-training for Retriever; Contrastive learning; Natural Question; Trivia QA; MS-MARCO

TL;DR

A model that performs additional pre-training on top of Condenser
Extracts spans from each document and uses them for contrastive learning
Achieves solid performance with a relatively simple training strategy
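
As a rough illustration of the span-extraction step, the sketch below samples two spans from each document so that spans from the same document can later serve as a positive pair. The function name `sample_span_pairs`, the `span_len` parameter, and whitespace tokenization are assumptions made for this example, not details taken from the paper.

```python
import random

def sample_span_pairs(docs, span_len=8, seed=0):
    """Return one (span_a, span_b) pair of token lists per document.

    Hypothetical helper: spans are contiguous windows with random starts.
    """
    rng = random.Random(seed)
    pairs = []
    for tokens in docs:
        # A document shorter than one span is used whole (start is always 0).
        max_start = max(len(tokens) - span_len, 0)
        starts = [rng.randint(0, max_start) for _ in range(2)]
        pairs.append(tuple(tokens[s:s + span_len] for s in starts))
    return pairs

docs = [
    "the condenser architecture condenses information into the cls vector".split(),
    "dense retrievers encode queries and passages into a shared embedding space".split(),
]
pairs = sample_span_pairs(docs, span_len=4)
```

Each entry of `pairs` holds two spans from the same document; spans drawn from other documents in the batch would act as negatives.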

Contribution

A method for training the model in an unsupervised manner
- No label information is needed during pre-training
- Pre-training is query-agnostic
Uses gradient caching to reduce the cost (mainly GPU memory) of contrastive training
- Large batch sizes are not needed
Proposes a completely hands-off pre-training method for dense retrieval
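
The gradient caching idea mentioned in the list above can be sketched on a toy problem. Gradient caching first encodes the batch chunk by chunk without keeping activations, computes the gradient of the full-batch loss with respect to the representations (the cache), and then revisits each chunk to push those cached gradients back into the encoder. The example below is a minimal sketch under strong assumptions: the "encoder" is a single scalar weight `w`, the loss is a softmax cross-entropy whose positive is item 0, and gradients are derived by hand. Real implementations rely on an autograd framework, and all names here are hypothetical.

```python
import math

def rep_grads(reps):
    """Gradient of L = logsumexp(reps) - reps[0] with respect to each rep."""
    m = max(reps)
    exps = [math.exp(r - m) for r in reps]
    z = sum(exps)
    return [e / z - (1.0 if i == 0 else 0.0) for i, e in enumerate(exps)]

def grad_cached(w, xs, chunk_size):
    """Gradient of the full-batch loss w.r.t. w, computed chunk by chunk."""
    # Pass 1: encode each chunk, keeping only the representations.
    reps = []
    for s in range(0, len(xs), chunk_size):
        reps.extend(w * x for x in xs[s:s + chunk_size])
    # Full-batch loss gradient w.r.t. the representations: this is the cache.
    cache = rep_grads(reps)
    # Pass 2: revisit each chunk and apply the chain rule (d rep_i / d w = x_i).
    grad_w = 0.0
    for s in range(0, len(xs), chunk_size):
        chunk = xs[s:s + chunk_size]
        grad_w += sum(g * x for g, x in zip(cache[s:s + chunk_size], chunk))
    return grad_w

def loss(w, xs):
    """Reference full-batch loss, used only to check the gradient numerically."""
    reps = [w * x for x in xs]
    m = max(reps)
    return m + math.log(sum(math.exp(r - m) for r in reps)) - reps[0]

xs = [0.5, -1.0, 2.0, 0.3, 1.5]
w = 0.7
g_chunked = grad_cached(w, xs, chunk_size=2)
g_full = grad_cached(w, xs, chunk_size=len(xs))
```

The chunked and full-batch gradients agree, which is the point of the technique: the loss still sees every item in the batch, while the encoder only has to process one chunk at a time.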

Abstract

Recent research demonstrates the effectiveness of using fine-tuned language models (LM) for dense retrieval. However, dense retrievers are hard to train, typically requiring heavily engineered fine-tuning pipelines to realize their full potential. In this paper, we identify and address two underlying problems of dense retrievers: i) fragility to training data noise and ii) requiring large batches to robustly learn the embedding space. We use the recently proposed Condenser pre-training architecture, which learns to condense information into the dense vector through LM pre-training. On top of it, we propose coCondenser, which adds an unsupervised corpus-level contrastive loss to warm up the passage embedding space. Retrieval experiments on MS-MARCO, Natural Question, and Trivia QA datasets show that coCondenser removes the need for heavy data engineering such as augmentation, synthesis, or filtering, as well as the need for large batch training. It shows comparable performance to RocketQA, a state-of-the-art, heavily engineered system, using simple small batch fine-tuning.
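
To make the "corpus-level contrastive loss" in the abstract more concrete, here is a minimal sketch of a span-level contrastive objective in the SimCLR style: for each span embedding, the other span from the same document is the positive, and every other span in the batch is a negative. The exact form used in the paper may differ; the dot-product similarity, the absence of a temperature, and the function names are assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def span_contrastive_loss(span_vecs):
    """span_vecs[i] = (vec_a, vec_b): embeddings of two spans from document i."""
    # Flatten to a list of (document index, vector) entries.
    flat = [(i, v) for i, pair in enumerate(span_vecs) for v in pair]
    total = 0.0
    for a, (doc_a, va) in enumerate(flat):
        logits, pos = [], None
        for b, (doc_b, vb) in enumerate(flat):
            if a == b:
                continue  # a span is never compared with itself
            if doc_a == doc_b:
                pos = len(logits)  # the sibling span is the positive
            logits.append(dot(va, vb))
        # Numerically stable log-sum-exp over the positive and all negatives.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[pos]
    return total / len(flat)

# Spans from the same document point the same way: low loss.
aligned = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
# Spans from the same document are orthogonal: higher loss.
mixed = [([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 0.0])]
loss_aligned = span_contrastive_loss(aligned)
loss_mixed = span_contrastive_loss(mixed)
```

In this toy setup `loss_aligned` is smaller than `loss_mixed`, reflecting that the objective pulls spans of the same document together and pushes spans of different documents apart.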

Paper link

https://arxiv.org/abs/2108.05540

Presentation link

https://drive.google.com/file/d/1sJzZ5_TvaIiSE4fG2YaEyrmo62N2PaH1/view?usp=sharing

Reference papers for presentation

Condenser: a Pre-training Architecture for Dense Retrieval [link]
RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering [link]
A Simple Framework for Contrastive Learning of Visual Representations [link]
SimCSE: Simple Contrastive Learning of Sentence Embeddings [link]
Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere [link]
How Contextual are Contextualized Word Representations? [link]

Video link

https://youtu.be/G7Yxz0_QkWk

[Week 7] Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval #8