modulabs / beyondBERT

This repository collects the discussion notes from cohort 11.5 of beyondBERT.
MIT License

ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation #19

Closed seopbo closed 4 years ago

seopbo commented 4 years ago

What is this paper about? 👋

ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation (https://arxiv.org/abs/1907.05339)

Hainan Zhang, Yanyan Lan, Liang Pang, Jiafeng Guo and Xueqi Cheng
CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China
{zhanghainan, lanyanyan, pangliang, guojiafeng, cxq}@ict.ac.cn

Abstract (Summary) 🕵🏻‍♂️

Please tell us what we can learn from reading this paper! 🤔

Contents

  1. Datasets

    • JDC (the Chinese customer service dataset)
    • Ubuntu Dialogue Corpus (v2.0)

Number of dialogues per split:

| Dataset | Train | Dev | Test |
|---|---|---|---|
| JDC | 500,000 | 7,843 | 7,843 |
| Ubuntu | 3,980,000 | 10,000 | 10,000 |


  2. Model Architecture
  3. Encoder

3.1. Context Representation Encoder


Query: turns_{N} [N, batch, embed_size]
Key:   turns_{N} [N, batch, embed_size]
Value: turns_{N} [N, batch, embed_size]

-> turns_{N} [N, batch, embed_size]

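Below is a minimal PyTorch sketch of this turn-level self-attention step; the shapes, variable names, and the use of `nn.MultiheadAttention` are my own assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

# Assumed setup: each of the N context turns has already been encoded into a
# single vector, stacked sequence-first as [N, batch, embed_size].
N, batch, embed_size, num_heads = 5, 32, 512, 8
turns = torch.randn(N, batch, embed_size)

# Self-attention over turns: Query = Key = Value = turns.
self_attn = nn.MultiheadAttention(embed_size, num_heads)  # expects [seq, batch, dim]
context_repr, attn_weights = self_attn(query=turns, key=turns, value=turns)

print(context_repr.shape)  # [N, batch, embed_size] -> turns_{N}
print(attn_weights.shape)  # [batch, N, N]: how much each turn attends to the others
```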

3.2. Response Representation Encoder


[Train]

Query: resp_{M} [M, batch, embed_size]
Key:   resp_{M} [M, batch, embed_size]
Value: resp_{M} [M, batch, embed_size]

-> repr_{M} [M, batch, embed_size]

pred:  aha:) thank you [eos] ?
label: [sos] aha:) thanks [eos]

[Inference]

Query: resp_{t}   [1, batch, embed_size]
Key:   resp_{t-1} [t-1, batch, embed_size]
Value: resp_{t-1} [t-1, batch, embed_size]

-> resp_{t} [1, batch, embed_size]

input: [sos]              output: aha:)

input: [sos] aha:)        output: thanks

input: [sos] aha:) thanks output: [eos]

break (decoding stops once [eos] is generated)
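A hedged sketch of the train/inference difference in the response self-attention, under the same assumed shapes (again using `nn.MultiheadAttention` for illustration, not the authors' implementation):

```python
import torch
import torch.nn as nn

# resp: the embedded, right-shifted response ([sos] prepended), sequence-first.
M, batch, embed_size, num_heads = 7, 32, 512, 8
resp = torch.randn(M, batch, embed_size)
self_attn = nn.MultiheadAttention(embed_size, num_heads)

# [Train] one pass over the full response; a causal mask keeps position m from
# attending to later positions, so pred/label stay shifted by one token.
causal_mask = torch.triu(torch.ones(M, M), diagonal=1).bool()
repr_train, _ = self_attn(resp, resp, resp, attn_mask=causal_mask)  # [M, batch, embed_size]

# [Inference] one step at a time: the query is the current token resp_{t},
# the keys/values are the t-1 tokens generated so far.
t = 3
query = resp[t - 1:t]   # [1, batch, embed_size]
prev  = resp[:t - 1]    # [t-1, batch, embed_size]
repr_step, _ = self_attn(query, prev, prev)  # [1, batch, embed_size]
```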

3.3. Context-Response Attention Decoder

Train:

Query: resp_{m}  [m, batch, embed_size]
Key:   turns_{n} [n, batch, embed_size]
Value: turns_{n} [n, batch, embed_size]

Inference:

Query: resp_{t}  [1, batch, embed_size]
Key:   turns_{n} [n, batch, embed_size]
Value: turns_{n} [n, batch, embed_size]

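A sketch of the context-response cross-attention with assumed shapes; names like `turns_repr` and `resp_repr` are placeholders, not from the paper.

```python
import torch
import torch.nn as nn

n, m, batch, embed_size, num_heads = 5, 7, 32, 512, 8
turns_repr = torch.randn(n, batch, embed_size)  # output of the context encoder (3.1)
resp_repr  = torch.randn(m, batch, embed_size)  # output of the response encoder (3.2)

cross_attn = nn.MultiheadAttention(embed_size, num_heads)

# Train: every response position attends over all n context turns.
dec_train, ctx_weights = cross_attn(query=resp_repr, key=turns_repr, value=turns_repr)
# dec_train: [m, batch, embed_size]; ctx_weights: [batch, m, n] (which turns are relevant)

# Inference: only the current step resp_{t} is used as the query.
dec_step, _ = cross_attn(query=resp_repr[-1:], key=turns_repr, value=turns_repr)
# dec_step: [1, batch, embed_size]
```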

3.4. Log-Likelihood

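The objective is the usual autoregressive log-likelihood of the response given the multi-turn context; the formula below is my own reconstruction in standard seq2seq notation, not copied from the paper.

```latex
% y_1, ..., y_M: response tokens, C: multi-turn context, \theta: model parameters
\mathcal{L}(\theta) = \sum_{t=1}^{M} \log p_{\theta}\left(y_t \mid y_{<t},\, C\right)
```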

3.5. Predict Next Word

train: [batch, m, vocab_size]

inference: [batch, 1, vocab_size]

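A sketch of the output projection under the shapes above; the linear layer and the greedy argmax are assumptions for illustration.

```python
import torch
import torch.nn as nn

m, batch, embed_size, vocab_size = 7, 32, 512, 30000
dec_out = torch.randn(m, batch, embed_size)  # decoder output (train)

# Project every decoder state to vocabulary logits, then softmax per position.
proj = nn.Linear(embed_size, vocab_size)
logits = proj(dec_out).transpose(0, 1)       # [batch, m, vocab_size] (train)
probs = torch.softmax(logits, dim=-1)

# Inference keeps only the current step: [batch, 1, vocab_size]; the argmax
# (or a sampled token) becomes the next input word until [eos] is produced.
next_word = probs[:, -1].argmax(dim=-1)      # [batch]
```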
  4. Evaluation

4.1. Metrics

4.2. Experiments

4.2.1 Metric-based Evaluation


4.2.2 Human Evaluation


4.3. Analysis on Relevant Contexts

4.3.1 Quantitative Evaluation

[Ranking evaluation measures]


4.3.2 Case Study

  5. ETC
    • We run all the models on a Tesla K80 GPU card.
    • Augmentation / Attention
    • Retrieval / Generation

Are there any related articles or issues worth reading alongside this paper?

LSTM (DSTC7) -> Self-Attention: pretrained BERT (DSTC8)