modulabs / beyondBERT

This repository collects the discussion notes from cohort 11.5 of beyondBERT.
MIT License

ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation #19

Closed seopbo closed 4 years ago

seopbo commented 4 years ago

What is this paper about? 👋

ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation (https://arxiv.org/abs/1907.05339)

Hainan Zhang, Yanyan Lan, Liang Pang, Jiafeng Guo and Xueqi Cheng
CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China
{zhanghainan, lanyanyan, pangliang, guojiafeng, cxq}@ict.ac.cn

Abstract (Summary) 🕵🏻‍♂️

Please tell us what we can learn from reading this paper! 🤔

Contents

  1. Datasets

    • JDC (the Chinese customer service dataset)
    • Ubuntu Dialogue Corpus (v2.0)

Number of dialogues per split:

| Dataset | Train | Dev | Test |
|---|---|---|---|
| JDC | 500,000 | 7,843 | 7,843 |
| Ubuntu | 3,980,000 | 10,000 | 10,000 |


  2. Model Architecture
  3. Encoder

3.1. Context Representation Encoder


Query: turns_{N} [N, batch, embed_size]
Key:   turns_{N} [N, batch, embed_size]
Value: turns_{N} [N, batch, embed_size]

-> turns_{N} [N, batch, embed_size]

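Below is a minimal PyTorch sketch of this turn-level self-attention step; the shapes, variable names, and the use of `nn.MultiheadAttention` are my own assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

# Assumed setup: each of the N context turns has already been encoded into a
# single vector, stacked sequence-first as [N, batch, embed_size].
N, batch, embed_size, num_heads = 5, 32, 512, 8
turns = torch.randn(N, batch, embed_size)

# Self-attention over turns: Query = Key = Value = turns.
self_attn = nn.MultiheadAttention(embed_size, num_heads)  # expects [seq, batch, dim]
context_repr, attn_weights = self_attn(query=turns, key=turns, value=turns)

print(context_repr.shape)  # [N, batch, embed_size] -> turns_{N}
print(attn_weights.shape)  # [batch, N, N]: how much each turn attends to the others
```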

3.2. Response Representation Encoder


[Train]

Query: resp_{M} [M, batch, embed_size]
Key:   resp_{M} [M, batch, embed_size]
Value: resp_{M} [M, batch, embed_size]

-> repr_{M} [M, batch, embed_size]

pred:  aha:) thank you [eos] ?
label: [sos] aha:) thanks [eos]

[Inference]

Query: resp_{t}   [1, batch, embed_size]
Key:   resp_{t-1} [t-1, batch, embed_size]
Value: resp_{t-1} [t-1, batch, embed_size]

-> resp_{t} [1, batch, embed_size]

input: [sos]              output: aha:)

input: [sos] aha:)        output: thanks

input: [sos] aha:) thanks output: [eos]

break (decoding stops once [eos] is generated)
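A hedged sketch of the train/inference difference in the response self-attention, under the same assumed shapes (again using `nn.MultiheadAttention` for illustration, not the authors' implementation):

```python
import torch
import torch.nn as nn

# resp: the embedded, right-shifted response ([sos] prepended), sequence-first.
M, batch, embed_size, num_heads = 7, 32, 512, 8
resp = torch.randn(M, batch, embed_size)
self_attn = nn.MultiheadAttention(embed_size, num_heads)

# [Train] one pass over the full response; a causal mask keeps position m from
# attending to later positions, so pred/label stay shifted by one token.
causal_mask = torch.triu(torch.ones(M, M), diagonal=1).bool()
repr_train, _ = self_attn(resp, resp, resp, attn_mask=causal_mask)  # [M, batch, embed_size]

# [Inference] one step at a time: the query is the current token resp_{t},
# the keys/values are the t-1 tokens generated so far.
t = 3
query = resp[t - 1:t]   # [1, batch, embed_size]
prev  = resp[:t - 1]    # [t-1, batch, embed_size]
repr_step, _ = self_attn(query, prev, prev)  # [1, batch, embed_size]
```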

3.3. Context-Response Attention Decoder

Train:

Query: resp_{m}  [m, batch, embed_size]
Key:   turns_{n} [n, batch, embed_size]
Value: turns_{n} [n, batch, embed_size]

Inference:

Query: resp_{t}  [1, batch, embed_size]
Key:   turns_{n} [n, batch, embed_size]
Value: turns_{n} [n, batch, embed_size]

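A sketch of the context-response cross-attention with assumed shapes; names like `turns_repr` and `resp_repr` are placeholders, not from the paper.

```python
import torch
import torch.nn as nn

n, m, batch, embed_size, num_heads = 5, 7, 32, 512, 8
turns_repr = torch.randn(n, batch, embed_size)  # output of the context encoder (3.1)
resp_repr  = torch.randn(m, batch, embed_size)  # output of the response encoder (3.2)

cross_attn = nn.MultiheadAttention(embed_size, num_heads)

# Train: every response position attends over all n context turns.
dec_train, ctx_weights = cross_attn(query=resp_repr, key=turns_repr, value=turns_repr)
# dec_train: [m, batch, embed_size]; ctx_weights: [batch, m, n] (which turns are relevant)

# Inference: only the current step resp_{t} is used as the query.
dec_step, _ = cross_attn(query=resp_repr[-1:], key=turns_repr, value=turns_repr)
# dec_step: [1, batch, embed_size]
```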

3.4. Log-Likelihood

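The objective is the usual autoregressive log-likelihood of the response given the multi-turn context; the formula below is my own reconstruction in standard seq2seq notation, not copied from the paper.

```latex
% y_1, ..., y_M: response tokens, C: multi-turn context, \theta: model parameters
\mathcal{L}(\theta) = \sum_{t=1}^{M} \log p_{\theta}\left(y_t \mid y_{<t},\, C\right)
```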

3.5. Predict Next Word

train: [batch, m, vocab_size]

inference: [batch, 1, vocab_size]

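A sketch of the output projection under the shapes above; the linear layer and the greedy argmax are assumptions for illustration.

```python
import torch
import torch.nn as nn

m, batch, embed_size, vocab_size = 7, 32, 512, 30000
dec_out = torch.randn(m, batch, embed_size)  # decoder output (train)

# Project every decoder state to vocabulary logits, then softmax per position.
proj = nn.Linear(embed_size, vocab_size)
logits = proj(dec_out).transpose(0, 1)       # [batch, m, vocab_size] (train)
probs = torch.softmax(logits, dim=-1)

# Inference keeps only the current step: [batch, 1, vocab_size]; the argmax
# (or a sampled token) becomes the next input word until [eos] is produced.
next_word = probs[:, -1].argmax(dim=-1)      # [batch]
```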
  4. Evaluation

4.1. Metrics

4.2. Experiments

4.2.1 Metric-based Evaluation


4.2.2 Human Evaluation


4.3. Analysis on Relevant Contexts

4.3.1 Quantitative Evaluation

[Ranking evaluation measures]


4.3.2 Case Study

  5. ETC
    • We run all the models on a Tesla K80 GPU card.
    • Augmentation / Attention
    • Retrieval / Generation

Are there any related articles or issues worth reading alongside this paper?

LSTM (DSTC7) -> Self-Attention: pretrained BERT (DSTC8)