huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Add XLM-V #21330

Closed mrm8488 closed 1 year ago

mrm8488 commented 1 year ago

Model description

XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models

Large multilingual language models typically rely on a single vocabulary shared across 100+ languages. As these models have increased in parameter count and depth, vocabulary size has remained largely unchanged. This vocabulary bottleneck limits the representational capabilities of multilingual models like XLM-R. In this paper, we introduce a new approach for scaling to very large multilingual vocabularies by de-emphasizing token sharing between languages with little lexical overlap and assigning vocabulary capacity to achieve sufficient coverage for each individual language. Tokenizations using our vocabulary are typically more semantically meaningful and shorter compared to XLM-R. Leveraging this improved vocabulary, we train XLM-V, a multilingual language model with a one million token vocabulary. XLM-V outperforms XLM-R on every task we tested on ranging from natural language inference (XNLI), question answering (MLQA, XQuAD, TyDiQA), and named entity recognition (WikiAnn) to low-resource tasks (Americas NLI, MasakhaNER).

Should work as XLM-RoBERTa

Open source status

Provide useful links for the implementation

No response

jalajk24 commented 1 year ago

Can I work on this issue? And can you point me to where I should learn more about this?

stefan-it commented 1 year ago

Some more info:

According to this tweet, the weights can be found here:

https://dl.fbaipublicfiles.com/fairseq/xlmv/xlmv.base.tar.gz

stefan-it commented 1 year ago

Hi guys,

I adapted the RoBERTa conversion script, and the model conversion was successful:

https://gist.github.com/stefan-it/def0e13c872e992aa54dff2768ec5da4

It outputs:

```
torch.Size([1, 11, 901629]) torch.Size([1, 11, 901629])
max_absolute_diff = 7.62939453125e-06
Do both models output the same tensors? 🔥
Saving model to /media/stefan/89914e9b-0644-4f79-8e65-a8c5245df168/xlmv/exported-working
Configuration saved in /media/stefan/89914e9b-0644-4f79-8e65-a8c5245df168/xlmv/exported-working/config.json
Model weights saved in /media/stefan/89914e9b-0644-4f79-8e65-a8c5245df168/xlmv/exported-working/pytorch_model.bin
```
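The `max_absolute_diff` check above compares the logits of the original fairseq model and the converted HF model on the same input ids. A minimal sketch of that kind of sanity check, with small dummy tensors standing in for the two models' outputs (the tolerance and the 🔥 message follow the style of the RoBERTa conversion script):

```python
import torch

# Dummy logits standing in for the fairseq output and the converted HF output;
# in the real conversion script, both models are run on the same input ids.
their_output = torch.randn(1, 11, 90)
our_output = their_output + 1e-6 * torch.randn_like(their_output)

print(their_output.shape, our_output.shape)
max_absolute_diff = torch.max(torch.abs(their_output - our_output)).item()
print(f"max_absolute_diff = {max_absolute_diff}")

# A small tolerance accounts for float round-off between the two frameworks.
success = torch.allclose(their_output, our_output, atol=1e-3)
print("Do both models output the same tensors?", "🔥" if success else "💩")
```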
stefan-it commented 1 year ago

@jalajk24, sorry, I overlooked your comment.

Here's an explanation of what I did so far:

The next steps would be on the tokenizer part:

mrm8488 commented 1 year ago

Cool @stefan-it! So, maybe we can create a model card and push the model (and tokenizer) to the Hub (under the Meta AI org). WDYT?

stefan-it commented 1 year ago

@mrm8488 Sounds good! I will perform some tokenizer experiments and then I can upload the model -> maybe @patrickvonplaten can invite me to the Meta AI organization on the model hub (for a short time period), when the model is ready to be... tested on downstream tasks :hugs:

patrickvonplaten commented 1 year ago

Hey @stefan-it,

For sure! Invited you :-)

stefan-it commented 1 year ago

Thanks @patrickvonplaten !

I wrote a script that compares the XLM-V tokenizer and the HF tokenizer (which is basically an XLMRobertaTokenizer using the provided sentencepiece.bpe.model):

https://gist.github.com/stefan-it/14295d37880bfb6329fe1db9d3e6a14c
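The script (full version in the gist) boils down to encoding every sentence with both tokenizers and flagging any difference in the id sequences. A framework-free sketch, where the `encode` callables stand in for the original fairseq XLM-V encoder and the HF `XLMRobertaTokenizer` (the toy encoders below are purely illustrative):

```python
def compare_tokenizers(sentences, encode_a, encode_b, name_a="XLM-V", name_b="HF"):
    """Encode each sentence with both tokenizers and report id mismatches."""
    mismatches = []
    for sentence in sentences:
        ids_a, ids_b = encode_a(sentence), encode_b(sentence)
        if ids_a != ids_b:
            mismatches.append((sentence, ids_a, ids_b))
            print(f"Mismatch for sentence:\n{sentence}")
            print(f"{name_a} ids: {ids_a}")
            print(f"{name_b}    ids: {ids_b}")
            print("-" * 90)
    return mismatches

# Toy encoders for illustration; in the real script these would be the fairseq
# model's encode() and XLMRobertaTokenizer.encode() over WikiANN sentences.
toy_a = lambda s: [0] + [len(w) for w in s.split()] + [2]
toy_b = lambda s: [0] + [len(w) for w in s.split()] + [6, 2]
compare_tokenizers(["a bb ccc"], toy_a, toy_b)
```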

It uses the WikiANN NER dataset, which covers 176 languages, tokenizes each training sentence, and compares the output of the original XLM-V tokenizer with the HF one. Some differences can be seen in the gist mentioned above, e.g.:

Mismatch for ar sentence:
أبى أيوب الأنصارى .‌
XLM-V ids: [0, 6, 482745, 6, 529250, 478338, 382485, 6, 5, 2]
HF    ids: [0, 6, 482745, 6, 529250, 478338, 382485, 6, 5, 6, 2]
------------------------------------------------------------------------------------------
Mismatch for az sentence:
O , nəinki Çexiyada , eləcə də bütün dün­­­yada antifaşist ədəbiyyatının ən görkəmli nümayəndələrindən bi­­ri­dir .
XLM-V ids: [0, 122, 6, 4, 78808, 2376, 4377, 25427, 6, 4, 17739, 523, 1174, 14374, 214304, 162, 4193, 3386, 1358, 1105, 1221, 89755, 345, 1825, 63822, 19671, 8914, 280, 214304, 499, 162, 381, 6, 5, 2]
HF    ids: [0, 122, 6, 4, 78808, 2376, 4377, 25427, 6, 4, 17739, 523, 1174, 14374, 162, 214304, 4193, 3386, 1358, 1105, 1221, 89755, 345, 1825, 63822, 19671, 8914, 280, 214304, 499, 162, 381, 6, 5, 2]
------------------------------------------------------------------------------------------
Mismatch for az sentence:
Filmin bəstəkarı Roberto Rossellininin qardaşı Renzo Rossellinidir .
XLM-V ids: [0, 70066, 93154, 309, 77404, 862785, 1639, 43, 49187, 872558, 862785, 43, 14803, 6, 5, 2]
HF    ids: [0, 70066, 93154, 309, 77404, 862785, 43, 1639, 49187, 872558, 862785, 43, 14803, 6, 5, 2]
------------------------------------------------------------------------------------------
Mismatch for be sentence:
некаторыя аленяводы з верхняй Калымы ўжо качавалі на чукоцкіх землях .
XLM-V ids: [0, 212747, 187222, 187276, 231515, 186902, 245172, 186910, 191873, 187211, 186906, 190574, 202645, 197768, 186882, 190562, 187180, 217232, 212793, 6, 5, 2]
HF    ids: [0, 212747, 187222, 187276, 231515, 186902, 245172, 186910, 191873, 187211, 186906, 190574, 217400, 192302, 186882, 190562, 187180, 217232, 212793, 6, 5, 2]
------------------------------------------------------------------------------------------
Mismatch for bn sentence:
আব্রাআম দ্য মোয়াভ্র্‌
XLM-V ids: [0, 450078, 447452, 391401, 383767, 442939, 388008, 392002, 500283, 388127, 2]
HF    ids: [0, 450078, 447452, 391401, 383767, 442939, 388008, 392002, 500283, 388127, 6, 2]
------------------------------------------------------------------------------------------
Mismatch for ckb sentence:
شەڕی ناوخۆییی لیبیا ( ٢٠١١ )
XLM-V ids: [0, 448384, 3, 382407, 424947, 383163, 395213, 390588, 382407, 481417, 18, 430460, 396007, 1057, 2]
HF    ids: [0, 448384, 3, 382407, 424947, 383163, 395213, 382407, 390588, 481417, 18, 430460, 396007, 1057, 2]
------------------------------------------------------------------------------------------
Mismatch for el sentence:
το λιμάνι του Μαρσασλόκκκ ήταν Φοινικική αποικία .
XLM-V ids: [0, 51, 33074, 54, 20175, 4103, 2207, 21516, 180155, 2263, 702, 1764, 179092, 1457, 127312, 1100, 6, 5, 2]
HF    ids: [0, 51, 33074, 54, 20175, 4103, 2207, 21516, 2263, 180155, 702, 1764, 179092, 1457, 127312, 1100, 6, 5, 2]
------------------------------------------------------------------------------------------
Mismatch for eu sentence:
Þjóðólfur úr Hvini‎
XLM-V ids: [0, 576603, 584875, 704, 7755, 272, 110340, 2]
HF    ids: [0, 576603, 584875, 704, 7755, 272, 110340, 6, 2]
------------------------------------------------------------------------------------------
Mismatch for fi sentence:
ohjaus British Wind Energy Association‎
XLM-V ids: [0, 18196, 82236, 60938, 48570, 71969, 2]
HF    ids: [0, 18196, 82236, 60938, 48570, 71969, 6, 2]
------------------------------------------------------------------------------------------
Mismatch for fr sentence:
***************************** '' Charles de Bourbon-Siciles ''
XLM-V ids: [0, 541, 519880, 736484, 519880, 3426, 17736, 59, 648141, 13, 238, 676633, 11, 3426, 2]
HF    ids: [0, 541, 736484, 519880, 519880, 3426, 17736, 59, 648141, 13, 238, 676633, 11, 3426, 2]
------------------------------------------------------------------------------------------
Mismatch for hr sentence:
*KKK Varteks ( Varaždin )
XLM-V ids: [0, 541, 13108, 379, 2056, 11962, 18, 794202, 1057, 2]
HF    ids: [0, 541, 379, 13108, 2056, 11962, 18, 794202, 1057, 2]
------------------------------------------------------------------------------------------
Mismatch for ja sentence:
漳 州 訛 り 、 ' ' ' 泉 ' ' ' は 泉 州 訛 り を 表 す ) ] ] ‎
XLM-V ids: [0, 6, 381875, 6, 284214, 6, 371882, 6, 283722, 6, 283381, 536, 536, 536, 6, 287298, 536, 536, 536, 6, 283385, 6, 287298, 6, 284214, 6, 371882, 6, 283722, 6, 283391, 6, 284061, 6, 284248, 1057, 6305, 6305, 2]
HF    ids: [0, 6, 381875, 6, 284214, 6, 371882, 6, 283722, 6, 283381, 536, 536, 536, 6, 287298, 536, 536, 536, 6, 283385, 6, 287298, 6, 284214, 6, 371882, 6, 283722, 6, 283391, 6, 284061, 6, 284248, 1057, 6305, 6305, 6, 2]
------------------------------------------------------------------------------------------
Mismatch for km sentence:
' '' ក្រម​ង៉ុយ '' '​គឺជា​កវី​ម្នាក់​ដែល​មិន​សរសេរ​នូវ​កំណាព្យកាព្យឃ្លោង​ដែល​លោក​ច្រៀង​នោះ​ ឡើយ ។ ស្នាដៃ​របស់​លោក​ដែល​គង់វង្ស​មកដល់​សព្វថ្ងៃនេះ​កើតមានឡើង​ដោយ​ការអញ្ជើញ​ ភ្នំពេញ ហើយ​ធ្វើ​ការកត់ត្រា​ទុក ។​
XLM-V ids: [0, 536, 3426, 6, 436488, 414054, 470537, 406071, 3426, 536, 417648, 388584, 417615, 398401, 383964, 386188, 484094, 413545, 430365, 392709, 443000, 401931, 443000, 513438, 424986, 383964, 383825, 6, 470313, 392431, 445340, 383824, 6, 527700, 384224, 383825, 383964, 6, 486458, 486640, 6, 454853, 6, 504066, 459752, 423127, 386428, 410408, 385471, 383363, 510944, 394566, 386849, 388469, 383363, 384712, 398013, 438262, 423820, 383824, 2]
HF    ids: [0, 536, 3426, 6, 436488, 414054, 470537, 406071, 3426, 536, 417648, 388584, 417615, 398401, 383964, 386188, 484094, 413545, 430365, 392709, 443000, 401931, 443000, 513438, 424986, 383964, 383825, 6, 470313, 392431, 445340, 383824, 6, 527700, 384224, 383825, 383964, 6, 486458, 486640, 6, 454853, 6, 504066, 459752, 423127, 386428, 410408, 385471, 383363, 510944, 394566, 386849, 388469, 383363, 384712, 398013, 438262, 423820, 383824, 6, 2]
------------------------------------------------------------------------------------------
Mismatch for ko sentence:
북쪽으로는 사바 구 , 서쪽으로는 소피아 구 , 남서쪽으로는 알라오트라망고로 구 , 남쪽으로는 아치나나나 구와 접한다 .
XLM-V ids: [0, 460610, 402460, 383267, 384648, 384084, 6, 4, 464357, 402460, 383973, 408125, 384084, 6, 4, 384737, 497040, 402460, 384068, 382873, 383469, 420080, 387243, 382503, 382498, 384084, 6, 4, 445962, 402460, 383309, 383375, 459065, 382738, 384084, 382541, 390528, 383229, 6, 5, 2]
HF    ids: [0, 460610, 402460, 383267, 384648, 384084, 6, 4, 464357, 402460, 383973, 408125, 384084, 6, 4, 384737, 497040, 402460, 384068, 382873, 383469, 420080, 387243, 382503, 382498, 384084, 6, 4, 445962, 402460, 383309, 383375, 382738, 459065, 384084, 382541, 390528, 383229, 6, 5, 2]
------------------------------------------------------------------------------------------
Mismatch for lv sentence:
Eiropas autoceļš E77‎
XLM-V ids: [0, 3477, 121549, 619, 181, 6697, 2]
HF    ids: [0, 3477, 121549, 619, 181, 6697, 6, 2]
------------------------------------------------------------------------------------------
Mismatch for mk sentence:
Поретко , на пример во делови од Пиринска Македонија и Егејска Македонија некои од горните женски облеки – ‘’’саите’’’ се кроеле од домашно ткаено платно во сина боја .
XLM-V ids: [0, 186970, 192733, 187180, 6, 4, 186882, 188182, 186930, 201221, 186939, 221926, 187217, 187685, 186883, 248608, 211453, 187685, 193651, 186939, 240530, 198728, 186987, 187184, 186991, 39, 14464, 42, 187373, 186961, 11099, 42, 186894, 203637, 197766, 186939, 210461, 6, 189541, 188031, 212555, 186930, 194795, 199817, 6, 5, 2]
HF    ids: [0, 186970, 192733, 187180, 6, 4, 186882, 188182, 186930, 201221, 186939, 221926, 187217, 187685, 186883, 248608, 211453, 187685, 193651, 186939, 240530, 198728, 186987, 187184, 186991, 39, 14464, 42, 187373, 186961, 42, 11099, 186894, 203637, 197766, 186939, 210461, 6, 189541, 188031, 212555, 186930, 194795, 199817, 6, 5, 2]
------------------------------------------------------------------------------------------
Mismatch for ml sentence:
അനു എലിസബത്ത് ജോസ്‌
XLM-V ids: [0, 397569, 385011, 528343, 388795, 385776, 481383, 2]
HF    ids: [0, 397569, 385011, 528343, 388795, 385776, 481383, 6, 2]
------------------------------------------------------------------------------------------
Mismatch for ms sentence:
███ Sidang Kemuncak Asia Timur
XLM-V ids: [0, 6, 369908, 377468, 593458, 3944, 664695, 8451, 551742, 2]
HF    ids: [0, 6, 377468, 369908, 593458, 3944, 664695, 8451, 551742, 2]
------------------------------------------------------------------------------------------
Mismatch for no sentence:
De siste tre semestre var han i Grenoble i Frankrike , der mye av fritiden ble tilbrakt i Les2alpes og LaGrave .
XLM-V ids: [0, 447, 550187, 17752, 611647, 246, 25684, 28, 657552, 28, 557692, 6, 4, 2860, 549299, 15446, 617530, 117029, 664714, 28, 17112, 430, 460, 10083, 6995, 1079, 29815, 383, 6, 5, 2]
HF    ids: [0, 447, 550187, 17752, 611647, 246, 25684, 28, 657552, 28, 557692, 6, 4, 2860, 549299, 15446, 617530, 117029, 664714, 28, 17112, 430, 460, 10083, 6995, 1079, 597, 573563, 6, 5, 2]
------------------------------------------------------------------------------------------
Mismatch for or sentence:
ଲେଉଟାଣି ଜୋହାନ୍ ଅଗଷ୍ଟସ ଆର୍ଫୱେଡ଼ସନ୍‌
XLM-V ids: [0, 6, 387665, 391689, 393963, 403921, 393333, 392380, 395060, 388377, 522433, 387310, 6, 476299, 398439, 432754, 392919, 424507, 2]
HF    ids: [0, 6, 387665, 391689, 393963, 403921, 393333, 392380, 395060, 388377, 522433, 387310, 6, 476299, 398439, 432754, 392919, 424507, 6, 2]
------------------------------------------------------------------------------------------
Mismatch for sh sentence:
Kefej ( kralj Tegeje ) ‎
XLM-V ids: [0, 3944, 12705, 18, 793761, 96767, 382, 1057, 2]
HF    ids: [0, 3944, 12705, 18, 793761, 96767, 382, 1057, 6, 2]
------------------------------------------------------------------------------------------
Mismatch for sl sentence:
__________10__________ Eugenio Siena Alfa Romeo
XLM-V ids: [0, 272238, 1741, 666448, 12002, 848378, 836660, 26591, 72466, 2]
HF    ids: [0, 272238, 1741, 12002, 666448, 848378, 836660, 26591, 72466, 2]
------------------------------------------------------------------------------------------
Mismatch for sr sentence:
Прерасподела доходка , Економски факултет Београд USJF - Preraspodela dohotka.ppt‎
XLM-V ids: [0, 188107, 189047, 187172, 192298, 190169, 186948, 6, 4, 228329, 186887, 192995, 190449, 15373, 662660, 20, 1182, 120, 793095, 567795, 656994, 90130, 5, 457258, 2]
HF    ids: [0, 188107, 189047, 187172, 192298, 190169, 186948, 6, 4, 228329, 186887, 192995, 190449, 15373, 662660, 20, 1182, 120, 793095, 567795, 656994, 90130, 5, 457258, 6, 2]
------------------------------------------------------------------------------------------
Mismatch for te sentence:
దారిమార్పు ఇండియన్‌ ఇన్‌స్టిట్యూట్‌ ఆఫ్‌ టెక్నాలజీ మద్రాస్‌
XLM-V ids: [0, 436137, 464065, 387183, 460474, 400919, 520935, 493353, 384438, 397587, 466836, 385426, 480198, 383019, 2]
HF    ids: [0, 436137, 464065, 387183, 460474, 400919, 520935, 493353, 384438, 397587, 466836, 385426, 480198, 383019, 6, 2]
------------------------------------------------------------------------------------------
Mismatch for ur sentence:
جاوید شیخ -‎ ‎جاوید ‏‎ ‎
XLM-V ids: [0, 408290, 389645, 20, 408290, 2]
HF    ids: [0, 408290, 389645, 20, 408290, 6, 2]
------------------------------------------------------------------------------------------
Mismatch for uz sentence:
Dastlab Oltin Oʻrdattt asosiy siyosiy markazi hisoblangan .
XLM-V ids: [0, 61568, 14, 3181, 586435, 43, 122, 1476, 47569, 211172, 14, 15966, 43523, 22564, 42030, 7050, 6, 5, 2]
HF    ids: [0, 61568, 14, 3181, 586435, 43, 122, 1476, 47569, 14, 211172, 15966, 43523, 22564, 42030, 7050, 6, 5, 2]
------------------------------------------------------------------------------------------
Mismatch for zh-yue sentence:
R E D I R E C T # 巴 菲 特 ‎
XLM-V ids: [0, 266, 181, 205, 168, 266, 181, 232, 157, 524, 335519, 6, 286994, 6, 283738, 2]
HF    ids: [0, 266, 181, 205, 168, 266, 181, 232, 157, 524, 335519, 6, 286994, 6, 283738, 6, 2]
------------------------------------------------------------------------------------------
stefan-it commented 1 year ago

Can we tolerate these mismatches? :thinking:
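Skimming the examples above, the mismatches seem to fall into two patterns: an extra id `6` on the HF side (likely a bare `▁` piece emitted around invisible characters such as zero-width joiners) and the same multiset of ids in a different order. A quick classification helper along those lines (a sketch for triaging, not part of the original scripts):

```python
from collections import Counter

def classify_mismatch(xlmv_ids, hf_ids):
    """Rough triage of a tokenizer id mismatch (sketch)."""
    if Counter(xlmv_ids) == Counter(hf_ids):
        return "same tokens, different order"
    diff = Counter(hf_ids) - Counter(xlmv_ids)
    if set(diff) == {6}:  # id 6 is the extra piece in the examples (likely bare '▁')
        return "extra '▁' token on the HF side"
    return "other"

# Two pairs taken from the comparison output above (ar and hr):
print(classify_mismatch(
    [0, 6, 482745, 6, 529250, 478338, 382485, 6, 5, 2],
    [0, 6, 482745, 6, 529250, 478338, 382485, 6, 5, 6, 2]))   # extra '▁'
print(classify_mismatch(
    [0, 541, 13108, 379, 2056, 11962, 18, 794202, 1057, 2],
    [0, 541, 379, 13108, 2056, 11962, 18, 794202, 1057, 2]))  # reordered
```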

stefan-it commented 1 year ago

Model is up now on the model hub:

https://huggingface.co/stefan-it/xlm-v-base

-> I would like to conduct some experiments on downstream tasks (mainly NER) to measure performance.

Maybe @mrm8488 also wants to fine-tune models, so that we can try to reproduce some of the paper results :)

After some experiments I can transfer the model to the Meta AI organization. The MLM performance is really good, so the model should work:

```python
In [3]: unmasker("Paris is the <mask> of France.")
Out[3]:
[{'score': 0.9286897778511047,
  'token': 133852,
  'token_str': 'capital',
  'sequence': 'Paris is the capital of France.'},
 {'score': 0.018073994666337967,
  'token': 46562,
  'token_str': 'Capital',
  'sequence': 'Paris is the Capital of France.'},
 {'score': 0.013238662853837013,
  'token': 8696,
  'token_str': 'centre',
  'sequence': 'Paris is the centre of France.'},
 {'score': 0.010450296103954315,
  'token': 550136,
  'token_str': 'heart',
  'sequence': 'Paris is the heart of France.'},
 {'score': 0.005028395913541317,
  'token': 60041,
  'token_str': 'center',
  'sequence': 'Paris is the center of France.'}]
```
mrm8488 commented 1 year ago

Thank you so much @stefan-it. Ofc, I will try to reproduce some of the reported results.

stefan-it commented 1 year ago

I've replicated the MasakhaNER v1 results from the paper:

I fine-tuned 5 models (with different seeds) on the English WikiANN (Rahimi split) and evaluated them on MasakhaNER v1. Note: DATE entities do not exist in WikiANN, so they were replaced with O for the zero-shot evaluation. I averaged the F1-score over the 5 models to get the final score. Models were fine-tuned with a sequence length of 512 (the paper uses 128; I noticed this only after the fine-tuning experiments), but the other hyper-parameters are the same as in the XLM-V paper: batch size 32, learning rate 2e-05 and 10 epochs.
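The DATE remapping mentioned above amounts to turning `B-DATE`/`I-DATE` tags into `O` before scoring, since WikiANN only knows `PER`, `ORG` and `LOC`. A minimal sketch (the helper name is hypothetical; the tag names follow the MasakhaNER scheme):

```python
WIKIANN_TYPES = {"PER", "ORG", "LOC"}  # entity types present in WikiANN

def drop_unknown_labels(labels, known_types=WIKIANN_TYPES):
    """Replace tags whose entity type the source dataset lacks with 'O'."""
    cleaned = []
    for label in labels:
        if label == "O" or label.split("-", 1)[1] in known_types:
            cleaned.append(label)
        else:
            cleaned.append("O")  # e.g. B-DATE / I-DATE -> O
    return cleaned

print(drop_unknown_labels(["B-PER", "I-PER", "O", "B-DATE", "I-DATE", "B-LOC"]))
# -> ['B-PER', 'I-PER', 'O', 'O', 'O', 'B-LOC']
```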

Putting it all together (see Table 11 in XLM-V paper):

| Model | amh | hau | ibo | kin | lug | luo | pcm | swa | wol | yor | Avg. |
|-------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|
| XLM-R (Paper) | 25.1 | 43.5 | 11.6 | 9.4 | 9.5 | 8.4 | 36.8 | 48.9 | 5.3 | 10.0 | 20.9 |
| XLM-R (Reproduced) | 27.1 | 42.4 | 14.2 | 12.4 | 14.3 | 10.0 | 40.6 | 50.2 | 6.3 | 11.5 | 22.9 |
| XLM-V (Paper) | 20.6 | 35.9 | 45.9 | 25.0 | 48.7 | 10.4 | 38.2 | 44.0 | 16.7 | 35.8 | 32.1 |
| XLM-V (Reproduced) | 25.3 | 45.7 | 55.6 | 33.2 | 56.1 | 16.5 | 40.7 | 50.8 | 26.3 | 47.2 | 39.7 |

The performance diff on MasakhaNER v1 between XLM-R and XLM-V in the paper is 11.2%. The reproduced experiments give a performance diff of 16.8%.

So I think these experiments show that the model is working and that it achieves great results on MasakhaNER v1!

I will set up a repository for all these results and conduct more experiments on WikiANN (the second NER downstream task mentioned in the paper).

@patrickvonplaten Do you think the model is then ready to be moved to the Meta AI org? I've also written an initial model card.

stefan-it commented 1 year ago

Here's the comparison on WikiANN zero-shot (see Table 10 in the XLM-V paper):

| Model | ro | gu | pa | lt | az | uk | pl | qu | hu | fi | et | tr | kk | zh | my | yo | sw |
|-------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| XLM-R (Paper) | 73.5 | 62.9 | 53.6 | 72.7 | 61.0 | 72.4 | 77.5 | 60.4 | 75.8 | 74.4 | 71.2 | 75.4 | 42.2 | 25.3 | 48.9 | 33.6 | 66.3 |
| XLM-R (Reproduced) | 73.8 | 65.5 | 50.6 | 74.3 | 64.0 | 76.5 | 78.4 | 60.8 | 77.7 | 75.9 | 73.0 | 76.4 | 45.2 | 29.8 | 52.3 | 37.6 | 67.0 |
| XLM-V (Paper) | 73.8 | 66.4 | 48.7 | 75.6 | 66.7 | 65.7 | 79.5 | 70.0 | 79.5 | 78.7 | 75.0 | 77.3 | 50.4 | 30.2 | 61.5 | 54.2 | 72.4 |
| XLM-V (Reproduced) | 77.2 | 65.4 | 53.6 | 74.9 | 66.0 | 69.4 | 79.8 | 66.9 | 79.0 | 77.9 | 76.2 | 76.8 | 48.5 | 28.1 | 58.4 | 62.6 | 71.6 |

| Model | th | ko | ka | ja | ru | bg | es | pt | it | fr | fa | ur | mr | hi | bn | el | de |
|-------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| XLM-R (Paper) | 5.2 | 49.4 | 65.4 | 21.0 | 63.1 | 76.1 | 70.2 | 77.0 | 76.9 | 76.5 | 44.6 | 51.4 | 61.5 | 67.2 | 69.0 | 73.8 | 74.4 |
| XLM-R (Reproduced) | 4.7 | 49.4 | 67.5 | 21.9 | 65.2 | 77.5 | 76.7 | 79.0 | 77.7 | 77.9 | 49.0 | 55.1 | 61.3 | 67.8 | 69.6 | 74.1 | 75.4 |
| XLM-V (Paper) | 3.3 | 53.0 | 69.5 | 22.4 | 68.1 | 79.8 | 74.5 | 80.5 | 78.7 | 77.6 | 50.6 | 48.9 | 59.8 | 67.3 | 72.6 | 76.7 | 76.8 |
| XLM-V (Reproduced) | 2.6 | 51.6 | 71.2 | 20.6 | 67.8 | 79.4 | 76.2 | 79.9 | 79.5 | 77.5 | 51.7 | 51.5 | 61.9 | 69.2 | 73.2 | 75.9 | 77.1 |

| Model | en | nl | af | te | ta | ml | eu | tl | ms | jv | id | vi | he | ar | Avg. |
|-------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|------|
| XLM-R (Paper) | 83.0 | 80.0 | 75.8 | 49.2 | 56.3 | 61.9 | 57.2 | 69.8 | 68.3 | 59.4 | 48.6 | 67.7 | 53.2 | 43.8 | 61.3 |
| XLM-R (Reproduced) | 83.4 | 80.8 | 75.8 | 49.3 | 56.8 | 62.2 | 59.1 | 72.2 | 62.3 | 58.3 | 50.0 | 67.9 | 52.6 | 47.8 | 62.6 |
| XLM-V (Paper) | 83.4 | 81.4 | 78.3 | 51.8 | 54.9 | 63.1 | 67.1 | 75.6 | 70.0 | 67.5 | 52.6 | 67.1 | 60.1 | 45.8 | 64.7 |
| XLM-V (Reproduced) | 84.1 | 81.3 | 78.9 | 50.9 | 55.9 | 63.0 | 65.7 | 75.9 | 70.8 | 64.8 | 53.9 | 69.6 | 61.1 | 47.2 | 65.0 |

Diff. between XLM-V and XLM-R in the paper: (64.7 - 61.3) = 3.4%. Diff. between reproduced XLM-V and XLM-R: (65.0 - 62.6) = 2.4%.

Same conclusion: the converted/integrated XLM-V works great :hugs:

mrm8488 commented 1 year ago

Great job @stefan-it !!! 🔥

stefan-it commented 1 year ago

Thanks @mrm8488 !

Btw, the repo is up here: https://github.com/stefan-it/xlm-v-experiments :)

NielsRogge commented 1 year ago

Thanks a lot for your contribution @stefan-it 🙏

Just transferred the checkpoint to the appropriate organization: https://huggingface.co/facebook/xlm-v-base

However, I feel like it could be beneficial to have a separate model_doc for XLM-V (similar to how we did this for T5v1.1 etc.).

Do you mind opening a PR for that?

NielsRogge commented 1 year ago

Thanks! Closing this issue as the model is now available: https://huggingface.co/docs/transformers/main/en/model_doc/xlm-v.

patrickvonplaten commented 1 year ago

Amazing work @stefan-it - thanks a lot!

mrm8488 commented 1 year ago

Amazing @stefan-it. Should I add some fine-tuning metrics @patrickvonplaten, as done for other models? I fine-tuned it on XNLI: https://huggingface.co/mrm8488/xlm-v-base-finetuned-xglue-xnli