PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation

https://arxiv.org/abs/2004.07159

Abstract

: This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus, specifically designed for generating new text conditioned on context.

The paper proposes a model that generates text by first reading and comprehending the input. PALM stands for Pre-training an Autoencoding and autoregressive Language Model, aimed at text generation based on reading comprehension of textual context.

: In short, it proposes a BART-style LM that makes use of the context

BART and MASS are similar, but their noising strategies differ

MASS : the part that is masked in the encoder is shown to the decoder, and the part that is not masked in the encoder is masked in the decoder

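If a single token is masked in the encoder, the decoder only has to predict that one token, so MASS becomes similar to BERT.
If every token is masked in the encoder, the decoder has to predict all of them, so MASS becomes similar to GPT2.

BART : the encoder input is masked and the decoder input is not (span masking) -> the model has to predict how many tokens are missing from a span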
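The two noising schemes above can be sketched on a toy token list. This is a minimal illustration, not code from either paper: the function names, the `[MASK]` and `<s>` symbols, and the choice of a single contiguous span are all assumptions made for the example.

```python
MASK = "[MASK]"  # assumed mask symbol, for illustration only


def mass_inputs(tokens, start, end):
    """MASS-style inputs for the masked span tokens[start:end]."""
    # Encoder: the span is hidden, everything else is visible.
    encoder_input = [MASK if start <= i < end else t for i, t in enumerate(tokens)]
    # Decoder: everything outside the span is masked; inside the span the
    # decoder sees the previous span token (shifted right by one position).
    decoder_input = [MASK] * len(tokens)
    for i in range(start + 1, end):
        decoder_input[i] = tokens[i - 1]
    targets = tokens[start:end]  # the decoder predicts only the span
    return encoder_input, decoder_input, targets


def bart_text_infilling(tokens, start, end):
    """BART-style text infilling for the span tokens[start:end]."""
    # Encoder: the whole span collapses into ONE mask token, so the model
    # cannot read the span length off the input.
    encoder_input = tokens[:start] + [MASK] + tokens[end:]
    # Decoder: sees the uncorrupted sequence (shifted right, "<s>" is an
    # assumed start symbol) and reconstructs all of it.
    decoder_input = ["<s>"] + tokens[:-1]
    targets = list(tokens)
    return encoder_input, decoder_input, targets


tokens = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]
mass_enc, mass_dec, mass_tgt = mass_inputs(tokens, 2, 6)
bart_enc, bart_dec, bart_tgt = bart_text_infilling(tokens, 2, 6)
```

With `start=0, end=len(tokens)` the MASS encoder input is fully masked and the decoder predicts every token left to right, which is the GPT-like extreme; with `end = start + 1` only one token is predicted, which is the BERT-like extreme.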

PALM uses a pointer-generator network

Final distribution = mixture of the extended vocabulary distribution and the copy distribution

extended vocabulary distribution -> Voc. dist
copy distribution -> Attention dist
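The mixture above can be sketched as follows. This assumes the usual pointer-generator form, where a scalar gate (called `p_gen` here, a name not used in these notes) weights the vocabulary distribution against the attention-based copy distribution; the exact parameterization in PALM may differ, and the function name and dict-based representation are illustrative only.

```python
def final_distribution(vocab_dist, attention, source_tokens, p_gen):
    """Mix a vocabulary distribution with a copy (attention) distribution.

    vocab_dist:    dict mapping token -> probability over the fixed vocabulary
    attention:     attention weights, one per source token (sums to 1)
    source_tokens: the source tokens the attention weights refer to
    p_gen:         assumed gate, probability of generating instead of copying
    """
    # Generation part: scale every vocabulary probability by p_gen.
    final = {tok: p_gen * p for tok, p in vocab_dist.items()}
    # Copy part: add (1 - p_gen) * attention mass onto each source token.
    # Source tokens missing from the vocabulary get an entry here, which is
    # what extends the vocabulary.
    for tok, weight in zip(source_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * weight
    return final


vocab_dist = {"the": 0.5, "cat": 0.5}
source_tokens = ["cat", "PALM"]  # "PALM" is out of vocabulary
attention = [0.25, 0.75]
mixed = final_distribution(vocab_dist, attention, source_tokens, p_gen=0.5)
```

In this toy case the out-of-vocabulary token "PALM" receives probability only through the copy term, which is the point of adding the pointer mechanism.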

During training, given a text made of A, B, C, D, the encoder is trained with MLM on A, B, and the decoder is trained with LM on C, D.

The difference from BART is that BART does not train MLM in the encoder, and its decoder does LM over the entire ABCD text
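The A, B / C, D split described above can be sketched as a data-preparation step. This is a rough illustration under assumptions: `palm_example`, the `split` argument, and the `mask_prob` default are hypothetical names and values chosen for the example, not the paper's actual settings.

```python
import random

MASK = "[MASK]"  # assumed mask symbol, for illustration only


def palm_example(tokens, split, mask_prob=0.15, seed=0):
    """Build one PALM-style training example from a token list.

    tokens[:split] goes to the encoder (MLM on the context),
    tokens[split:] is what the decoder learns to generate (LM).
    mask_prob and seed are hypothetical knobs for this sketch.
    """
    rng = random.Random(seed)
    context, continuation = tokens[:split], tokens[split:]
    encoder_input, mlm_targets = [], {}
    for i, tok in enumerate(context):
        if rng.random() < mask_prob:
            encoder_input.append(MASK)
            mlm_targets[i] = tok  # encoder must recover this token (MLM loss)
        else:
            encoder_input.append(tok)
    # Decoder targets are only the continuation, never the context itself.
    return encoder_input, mlm_targets, continuation


tokens = ["A", "B", "C", "D"]
enc_all, mlm_all, dec_all = palm_example(tokens, split=2, mask_prob=1.0)
enc_none, mlm_none, dec_none = palm_example(tokens, split=2, mask_prob=0.0)
```

A BART-style example would instead feed a corrupted version of all four segments to the encoder and ask the decoder to reconstruct A, B, C, D in full, with no separate MLM loss on the encoder side.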

After pre-training with PALM, a context or condition can be given to the encoder, and the decoder can generate text from the encoder's output vectors

-> applicable to language generation tasks, including generative QA, abstractive summarization, question generation, and conversational response generation.
