modulabs / beyondBERT

A repository that collects the discussion notes of beyondBERT (cohort 11.5).
MIT License

How multilingual is Multilingual BERT? #2

Closed seopbo closed 4 years ago

seopbo commented 4 years ago

Abstract (summary) 🕵🏻‍♂️

In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiments, showing that transfer is possible even to languages in different scripts, that transfer works best between typologically similar languages, that monolingual corpora can train models for code-switching, and that the model can find translation pairs. From these results, we can conclude that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs.

What is this paper about? 👋

Write down the conclusions of the paper.

  1. Multilingual BERT (M-BERT), trained as described below, shows good cross-lingual transfer ability on NER and POS tasks.
    • without a language identifier
    • trained on Wikipedia documents (104 languages)
    • w/ shared word piece vocab
  2. Zero-shot transfer did not work equally well for every language pair, so why do these differences arise?
    • Not because of vocab overlap between the fine-tuning language and the evaluation language
    • Rather, because of the typological features of the languages
      • There are many kinds of typological features (the paper only shows results for subject/object/verb order and adjective/noun order); of these, SVO order has the largest effect
    • Transfer is possible when the model has been pre-trained on the language being transferred to
    • Cross-lingual information is high in M-BERT's middle layers (8/12)

Write down the ideas of the paper. (Either a summary or a detailed write-up is fine.)

Main Question: What gives rise to M-BERT's zero-shot cross-lingual transferability?

0. Preliminaries

1. NER task

2. POS task

3. Code-switching (CS) and Transliterate (Tlit) task

1. Is M-BERT's cross-lingual transferability due to vocab overlap? ➡️ NO
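As a rough illustration of what "vocab overlap" means here, the sketch below computes a Jaccard-style overlap between the word pieces seen in the fine-tuning data and those seen in the evaluation data. The function name `wordpiece_overlap` and the toy word pieces are made up for illustration, and the exact definition used in the paper should be checked against its vocabulary-overlap section.

```python
def wordpiece_overlap(train_pieces, eval_pieces):
    """Jaccard overlap between two collections of word pieces.

    Assumed definition: |C_train & C_eval| / |C_train | C_eval|.
    """
    c_train, c_eval = set(train_pieces), set(eval_pieces)
    union = c_train | c_eval
    if not union:
        return 0.0  # no word pieces at all, so define the overlap as zero
    return len(c_train & c_eval) / len(union)


# Toy word pieces (hypothetical, not taken from the paper's data).
en_pieces = ["new", "york", "##er", "lon", "##don"]
de_pieces = ["new", "york", "ber", "##lin", "lon", "##don"]
ja_pieces = ["東", "京", "大", "阪"]

print(wordpiece_overlap(en_pieces, de_pieces))  # same script: some overlap
print(wordpiece_overlap(en_pieces, ja_pieces))  # different script: no overlap
```

If transfer depended only on memorized shared word pieces, accuracy should fall toward zero as this number does; the "NO" above reflects that M-BERT still transfers between languages whose overlap is essentially zero.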

2. Is M-BERT's cross-lingual transferability due to the typological features of the languages? ➡️ YES

3. Does M-BERT's cross-lingual transferability extend to CS or Tlit? ➡️ CS (YES) / Tlit (NO)

4. M-BERT์˜ feature space

(Personal) thoughts on the paper

1. Is the comparison with EN-BERT in the vocab overlap experiment fair?

2. Was SOV order really the important factor?

3. feature space

4. The paper's analysis

Other information about the paper

Venue

Authors

Reference

[1] this paper: (arXiv) https://arxiv.org/abs/1906.01502 / (acl) https://www.aclweb.org/anthology/P19-1493.pdf
[2] slide: http://www.dhgarrette.com/papers/pires_multilingual_bert_acl2019_slides.pdf (shared by @soeque1)
[3] youtube: https://www.youtube.com/watch?v=ZGZy_GrFkAY (shared by @soeque1)
[4] Selective Sharing for Multilingual Dependency Parsing
[5] BERTScore

soeque1 commented 4 years ago

The paper presents it as an advantage that Multilingual BERT (mBERT) is language-agnostic, i.e. it uses no information about the language. I wonder what pros and cons this design has compared with models that are given language information as input.

For example, other multilingual architectures such as mBART (https://arxiv.org/abs/2001.08210) and XLM (https://arxiv.org/abs/1901.07291) use a language token (special token) or a language embedding (token type ids).

=> My own answer: mBERT pro: the language does not have to be identified separately. Con: encoding may be fine, but at decoding time tokens from other languages seem likely to show up.
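To make the contrast concrete, here is a minimal sketch of how an input vector could be built with and without a language embedding. It is a toy under stated assumptions, not the actual mBERT or XLM code: `embed`, the tiny lookup tables, and the language ids are all hypothetical, and real models also apply layer normalization and, in BERT's case, a sentence A/B segment embedding that is left out here.

```python
def embed(token_ids, tok_emb, pos_emb, lang_emb=None, lang_id=None):
    """Sum token and position embeddings, plus a language embedding if given."""
    vectors = []
    for pos, tid in enumerate(token_ids):
        vec = [t + p for t, p in zip(tok_emb[tid], pos_emb[pos])]
        if lang_emb is not None:
            # Language-aware variant: every position gets the same language vector.
            vec = [v + l for v, l in zip(vec, lang_emb[lang_id])]
        vectors.append(vec)
    return vectors


# Toy 2-dimensional tables (hypothetical values).
tok_emb = {0: [1.0, 0.0], 1: [0.0, 1.0]}
pos_emb = [[0.5, 0.5], [0.25, 0.25]]
lang_emb = {"en": [1.0, 0.0], "ko": [0.0, 1.0]}

agnostic = embed([0, 1], tok_emb, pos_emb)                 # language-agnostic input
with_lang = embed([0, 1], tok_emb, pos_emb, lang_emb, "ko")  # language-aware input
print(agnostic)
print(with_lang)
```

In the language-agnostic variant a code-switched sentence needs no special handling, whereas the language-aware variant has to pick a `lang_id` for the sequence (or per token) up front.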

wonhocho commented 4 years ago

Am I right in thinking that the use case for a model trained on multiple languages is a service that has to support many languages (in a situation where not every language has enough data)? Is there any benefit to multilingual training even when only one specific language is supported?

DataLama commented 4 years ago

๋ฐ์ดํ„ฐ๊ฐ€ ๋ถ€์กฑํ•œ ์†Œ์ˆ˜์–ธ์–ด ๋‚ด์˜ NLP ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š”๋ฐ, single language model์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค Multilingual Bert๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ๋” ๋„์›€์ด ๋ ๊นŒ์š”?

bj1123 commented 4 years ago

These questions are not directly related to the paper.

  1. Which multilingual language models also take information about the input language into account? Given the reported finding that language features have a significant effect on performance, I suspect that using an input language marker would lead to a meaningful performance gain.
  2. Are there cases where a multilingual language model has been applied to unsupervised tasks (e.g. MT)?
warnikchow commented 4 years ago

In 3.2, on the right side of Table 4, the authors use English, Bulgarian, and Japanese as an example of some cross-script transfer being less accurate than other pairs, and offer typological features (SVO or Adj/Noun order) as one of the reasons. But for this comparison to be sound, I think they should have used cases where only one of word order and script differs (e.g., en-zh / zh-ja?).

  1. Would en-zh transfer better than en-ja, or zh-ja better than en-ja?
  2. Also, the fact that ja, unlike en and bg, is agglutinative was not taken into account. Given how BERT tokenizes, could the agglutinative nature have had an effect?
  3. Finally, if word order is one of the reasons, would a multilingual XLNet (M-XLNet) show better cross-lingual transfer than M-BERT?
Beomi commented 4 years ago

For the MT trained with English BERT, vocab overlap has a larger effect and performance drops relative to mBERT. I am curious 1) whether this ultimately comes down to the total amount of data BERT has seen, 2) whether mBERT would still perform better if the amount of data were capped to be equal, and 3) how an EngBERT trained on a lot of English data would compare with an mBERT trained on a (relatively small) amount of data.

Also, tokens are distinguished at the subword level.. For example, when training on data that mixes English and Korean (our own issue comments, for a start..), can we say the overlap ratio is high? -> In that case, how much would positional (word order) info help BERT "learn better"?

hanjiyoon01 commented 4 years ago

(I posted this with my company account, so I deleted it and am posting again T_T Sorry to those who got two notifications.)

1) Is the reason vocab overlap did not have much effect that, in any case, it is the relationships among co-occurring tokens that shape the embeddings?

2) The WALS features (https://wals.info/), the criteria used to classify language types, are very interesting, but are these properties really reflected properly in the embeddings? I don't know what tendencies word pieces show for Japanese, but Japanese has as many as three tokenization units: one is a minimal unit close to the morpheme, one is a unit that also covers grammatical collocations, and one is the phrase unit (文節, bunsetsu). Depending on which of these three units the subwords are closest to, the way grammatical categories are divided would differ considerably. In Korean as well, adjectives behave quite differently from English adjectives, so I wonder how this is reflected in the embeddings. In particular, in UD, the POS evaluation data, Korean is annotated at the eojeol level, while Japanese was at the bunsetsu level and was then changed to be SUW-based, so I am also curious how this was handled.

Taekyoon commented 4 years ago
  1. In Figure 3 (Accuracy of nearest neighbor translation for EN-DE, EN-RU, and HI-UR), match accuracy drops toward the last layers. Why is that?
oglee815 commented 4 years ago

When comparing cross-lingual model transfer performance, the authors say that performance is good between typologically similar languages. That is surely part of the reason, but I think the fact that the amount of pre-training data differed wildly from language to language could also be a factor.

Huffon commented 4 years ago
  1. The code-switching problem is interesting, and it seems like it would be an important issue for NMT as well. For those doing translation research, how do you approach problems like code-switching? Come to think of it, this question itself is turning into an example of CS..

  2. The finding that transfer works better between typologically similar languages raised a question for me. I wondered whether pre-training with MLM helps a lot with representation learning but does not learn much syntactic knowledge such as sentence structure. I'd like to hear what others think!

oglee815 commented 4 years ago

The paper presents it as an advantage that Multilingual BERT (mBERT) is language-agnostic, i.e. it uses no information about the language. I wonder what pros and cons this design has compared with models that are given language information as input.

For example, other multilingual architectures such as mBART (https://arxiv.org/abs/2001.08210) and XLM (https://arxiv.org/abs/1901.07291) use a language token (special token) or a language embedding (token type ids).

=> My own answer: mBERT pro: the language does not have to be identified separately. Con: encoding may be fine, but at decoding time tokens from other languages seem likely to show up.

Apparently, in a recent paper by the same authors, the language embedding was removed to make code-switching easier: https://arxiv.org/abs/1911.02116

kh-kim commented 4 years ago
  • The code-switching problem is interesting, and it seems like it would be an important issue for NMT as well. For those doing translation research, how do you approach problems like code-switching? Come to think of it, this question itself is turning into an example of CS..

Thinking about it from the decoder side, wouldn't the language model ultimately have to see a code-switched corpus in order to produce code-switched output, from a maximum likelihood point of view? Or perhaps there is a way to impose additional constraints..

seopbo commented 4 years ago
  1. The code-switching problem is interesting, and it seems like it would be an important issue for NMT as well. For those doing translation research, how do you approach problems like code-switching? Come to think of it, this question itself is turning into an example of CS..
  2. The finding that transfer works better between typologically similar languages raised a question for me. I wondered whether pre-training with MLM helps a lot with representation learning but does not learn much syntactic knowledge such as sentence structure. I'd like to hear what others think!

We ran short on time, so I will write this up later.

seopbo commented 4 years ago

In 3.2, on the right side of Table 4, the authors use English, Bulgarian, and Japanese as an example of some cross-script transfer being less accurate than other pairs, and offer typological features (SVO or Adj/Noun order) as one of the reasons. But for this comparison to be sound, I think they should have used cases where only one of word order and script differs (e.g., en-zh / zh-ja?).

  1. Would en-zh transfer better than en-ja, or zh-ja better than en-ja?
  2. Also, the fact that ja, unlike en and bg, is agglutinative was not taken into account. Given how BERT tokenizes, could the agglutinative nature have had an effect?
  3. Finally, if word order is one of the reasons, would a multilingual XLNet (M-XLNet) show better cross-lingual transfer than M-BERT?

We ran short on time, so I will write this up separately.

inmoonlight commented 4 years ago

@wonhocho

Am I right in thinking that the use case for a model trained on multiple languages is a service that has to support many languages (in a situation where not every language has enough data)? Is there any benefit to multilingual training even when only one specific language is supported?

My personal opinion is that, for a specific language, a model trained on that language is better, so M-BERT's advantage diminishes in a setting where only one language is supported!

inmoonlight commented 4 years ago

@DataLama

๋ฐ์ดํ„ฐ๊ฐ€ ๋ถ€์กฑํ•œ ์†Œ์ˆ˜์–ธ์–ด ๋‚ด์˜ NLP ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š”๋ฐ, single language model์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค Multilingual Bert๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ๋” ๋„์›€์ด ๋ ๊นŒ์š”?

์œ„์˜ ์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต๊ณผ ์—ฐ๊ฒฐ๋  ์ˆ˜ ์žˆ์„ ๊ฒƒ ๊ฐ™์•„์š”! ๋„ค low-resource language ์˜ ๊ฒฝ์šฐ M-BERT ๊ฐ€ ๋” ์ข‹์„ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค ใ…Žใ…Ž

inmoonlight commented 4 years ago

@bj1123

These questions are not directly related to the paper. Which multilingual language models also take information about the input language into account? Given the reported finding that language features have a significant effect on performance, I suspect that using an input language marker would lead to a meaningful performance gain. Are there cases where a multilingual language model has been applied to unsupervised tasks (e.g. MT)?

If we take information about the input language to mean something like a token id, then according to the paper Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, a token id is used at decoding time to translate!
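As a small illustration of the language-token idea: as far as I recall from that paper, an artificial token such as `<2es>` is prepended to the source sentence to tell the shared model which target language to produce. The helper name `add_target_token` below is hypothetical, and the sketch only shows the input formatting, not the translation model itself.

```python
def add_target_token(source_sentence, target_lang):
    """Prepend an artificial target-language token, e.g. '<2es>' for Spanish."""
    return f"<2{target_lang}> {source_sentence}"


# The same source sentence routed to two different target languages.
print(add_target_token("How are you?", "es"))
print(add_target_token("How are you?", "ja"))
```

Because the token is just another entry in the shared vocabulary, the model architecture does not change; the token only conditions what the decoder generates.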

inmoonlight commented 4 years ago

@Taekyoon

In Figure 3 (Accuracy of nearest neighbor translation for EN-DE, EN-RU, and HI-UR), match accuracy drops toward the last layers. Why is that?

According to this paper, it seems to be because the last layers have to generate language-specific tokens, so their representations reflect more of the features of each language. Voita's paper The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives appears to reach a similar conclusion.
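For reference, the nearest neighbor translation accuracy plotted in Figure 3 can be sketched roughly as follows: given one sentence vector per sentence for each side of a parallel corpus (taken from a single layer), compute the average offset from source to target, shift every source vector by that offset, and count how often the nearest target vector is the true translation. This is a minimal sketch on toy 2-d vectors; the function name is hypothetical, and Euclidean distance is an assumption on my part.

```python
import math


def nn_translation_accuracy(src_vecs, tgt_vecs):
    """Fraction of source vectors whose shifted nearest target is the paired one."""
    n, dim = len(src_vecs), len(src_vecs[0])
    # Average "translation vector" from source sentences to their targets.
    offset = [
        sum(t[d] - s[d] for s, t in zip(src_vecs, tgt_vecs)) / n
        for d in range(dim)
    ]
    hits = 0
    for i, s in enumerate(src_vecs):
        shifted = [s[d] + offset[d] for d in range(dim)]
        nearest = min(range(n), key=lambda j: math.dist(shifted, tgt_vecs[j]))
        hits += nearest == i
    return hits / n


# Toy data: targets are the sources shifted by a constant language offset.
src = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
tgt_aligned = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0]]
tgt_misaligned = [tgt_aligned[1], tgt_aligned[2], tgt_aligned[0]]

print(nn_translation_accuracy(src, tgt_aligned))
print(nn_translation_accuracy(src, tgt_misaligned))
```

Under this view, a drop in accuracy at the last layers means that a single constant offset no longer maps one language's sentence vectors onto the other's, which fits the explanation that those layers carry more language-specific information.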

inmoonlight commented 4 years ago

@oglee815

When comparing cross-lingual model transfer performance, the authors say that performance is good between typologically similar languages. That is surely part of the reason, but I think the fact that the amount of pre-training data differed wildly from language to language could also be a factor.

I agree! @Beomi seems to have had a similar question.