masakhane-io / masakhane-reading-group

Agile reading group that works

[21/05/2020] 5:15PM GMT+1 : MultiFiT: Efficient Multi-lingual Language Model Fine-tuning #3

Closed keleog closed 4 years ago

keleog commented 4 years ago

Link: https://www.aclweb.org/anthology/D19-1572.pdf

Short Description:

The authors propose Multi-lingual language model Fine-Tuning (MultiFiT) to enable people to train and fine-tune language models efficiently on their own languages, particularly low-resourced ones. They also propose a zero-shot method that uses an existing pretrained cross-lingual model to bootstrap training when no labels are available in the target language.
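MultiFiT follows the ULMFiT recipe (fine-tune a pretrained LM on target-language text, then fine-tune a classifier on top of it) and is built on fastai, using a QRNN with subword tokenization. A rough sketch of that two-stage workflow using stock fastai and its AWD-LSTM instead (the file names, column names, and hyperparameters are illustrative assumptions, not the paper's or the repo's exact setup):

```python
from fastai.text.all import *
import pandas as pd

# Hypothetical CSVs: unlabelled target-language text for LM fine-tuning,
# and a small labelled subset for the downstream classifier.
lm_df   = pd.read_csv("target_lang_corpus.csv")   # column: "text"
clas_df = pd.read_csv("labelled_subset.csv")      # columns: "text", "label"

# 1. Fine-tune a language model on target-language text.
dls_lm = TextDataLoaders.from_df(lm_df, text_col="text", is_lm=True)
learn_lm = language_model_learner(dls_lm, AWD_LSTM, metrics=[accuracy, Perplexity()])
learn_lm.fine_tune(4, 2e-3)
learn_lm.save_encoder("target_lang_encoder")

# 2. Fine-tune a classifier on top of the adapted encoder, reusing the LM
#    vocabulary and unfreezing gradually, as in the ULMFiT recipe.
dls_clas = TextDataLoaders.from_df(clas_df, text_col="text", label_col="label",
                                   text_vocab=dls_lm.vocab)
learn_clas = text_classifier_learner(dls_clas, AWD_LSTM, metrics=accuracy)
learn_clas.load_encoder("target_lang_encoder")
learn_clas.fit_one_cycle(1, 2e-2)
learn_clas.freeze_to(-2)
learn_clas.fit_one_cycle(1, slice(1e-2 / (2.6**4), 1e-2))
learn_clas.unfreeze()
learn_clas.fit_one_cycle(2, slice(1e-3 / (2.6**4), 1e-3))
```

The gradual unfreezing and discriminative learning rates are the parts MultiFiT keeps from ULMFiT; the paper's efficiency gains come from swapping in QRNN layers and subword tokenization.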

hadyelsahar commented 4 years ago

emnlp2019 presentation: https://www.youtube.com/watch?v=KiwwCUosw7E

keleog commented 4 years ago

Hello
 From hady elsahar to Everyone: (05:15 PM) 
Hi all :) hope you are doing well
kelechi can you share your screen?
 From Me to Everyone: (05:21 PM) 
Sure
 From hady elsahar to Everyone: (05:21 PM) 
thanks thanks :)
 From Me to Everyone: (05:35 PM) 
https://github.com/n-waves/multifit
 From hady elsahar to Everyone: (05:35 PM) 
https://github.com/n-waves/multifit
it's in fastai I think
Welcome Thierno:)
you are welcome without reading :)
 From Me to Everyone: (05:38 PM) 
https://arxiv.org/pdf/2005.09093.pdf
 From hady elsahar to Everyone: (05:42 PM) 
ruder talks about negative transfer here in this blog post https://ruder.io/multi-task/
 From Me to Everyone: (05:42 PM) 
Anyone with questions can post them here btw. Would be nice if you asked the questions verbally though. :)
My question - in what scenarios are LSTMs better than Transformers?
 From Me to Everyone: (05:43 PM) 
This is interesting in that regard btw - https://twitter.com/srush_nlp/status/1245825437240102913
 From Jamiil Toure ALI to Everyone: (05:43 PM) 
i was having a problem with some of the terminology .. what is zero shot ? what is cross lingual model ?
 From Me to Everyone: (05:43 PM) 
Zero shot - meaning making predictions on a “domain” you did not train on
Domain - e.g. labels, languages, etc.
 From Jamiil Toure ALI to Everyone: (05:44 PM) 
thanks @kelechi
how about cross lingual ?
 From Me to Everyone: (05:45 PM) 
Cross-lingual - shared amongst languages. Typically used to refer to a model that works well across different languages. It is often conflated with multi-lingual, though, so it can be confusing.
 From Me to Everyone: (05:46 PM) 
E.g. cross-lingual word vectors are vector spaces shared among languages, so from a source-language word you can query for similar target-language words, all in the same vector space.
Maybe someone can explain better. [A short code sketch of this is included after the chat log.]
 From Ignatius Ezeani to Everyone: (05:50 PM) 
On 'multi-lingual' and 'cross-lingual' see footnote 5 highlighted in the paper on the shared screen for some explanations.
 From hady elsahar to Everyone: (05:52 PM) 
very good question
 From Jamiil Toure ALI to Everyone: (05:54 PM) 
thanks @kelechi and @ignatius for the answers.
 From hady elsahar to Everyone: (05:56 PM) 
peters 2018 b paper: https://arxiv.org/pdf/1808.08949v1.pdf
 From Brian Muhia to Everyone: (05:58 PM) 
I was incorrect, they do discuss Transformers later.
 From hady elsahar to Everyone: (05:59 PM) 
agree with kelechi + giving them the same hyperparameter tuning budget
 From Allen Akinkunle to Everyone: (06:00 PM) 
Thank you for organising this, Kelechi. And thank you everyone for chipping in. I learnt a lot. I have to jump off now
 From Brian Muhia to Everyone: (06:00 PM) 
weight pruning as a test of hyperparam usage?
 From hady elsahar to Everyone: (06:01 PM) 
weight pruning has a somewhat similar objective to distillation; some pruned models actually get better results than the original model, partly because of the regularization effect
 From hady elsahar to Everyone: (06:04 PM) 
thanks kelechi for the "Are All Languages Created Equal in Multilingual BERT?" pointer, want to read it now
what's the name of it again, Brian?
 From Brian Muhia to Everyone: (06:05 PM) 
https://arxiv.org/abs/1911.11423
 From hady elsahar to Everyone: (06:07 PM) 
imo DL models are like mobile phones: some are popular but people pick whatever suits their needs
I do agree
I have to go guys, another meeting in 5 mins
was lovely chatting to you all :) as always
see you next week
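On the cross-lingual word vectors point in the chat above: a minimal sketch of querying a shared (aligned) embedding space with plain NumPy. The aligned matrices and vocabulary lists are assumed to exist already (e.g. something like MUSE-aligned fastText vectors); none of this comes from the MultiFiT paper itself.

```python
import numpy as np

def nearest_target_words(word, src_words, src_vecs, tgt_words, tgt_vecs, k=5):
    """Return the k target-language words closest (by cosine similarity)
    to a source-language word, assuming both embedding matrices live in
    the same aligned vector space.

    src_words / tgt_words: lists of vocabulary items
    src_vecs  / tgt_vecs : (n, d) and (m, d) embedding matrices
    """
    q = src_vecs[src_words.index(word)]
    q = q / np.linalg.norm(q)
    tgt_norm = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sims = tgt_norm @ q                     # cosine similarity to every target word
    top = np.argsort(-sims)[:k]
    return [(tgt_words[i], float(sims[i])) for i in top]
```

Called as `nearest_target_words("school", en_words, en_vecs, yo_words, yo_vecs)`, this returns the target-language words nearest to the English word, which is the sense in which the space is "shared" across languages.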


hadyelsahar commented 4 years ago

Interesting paper. It proposes MultiFiT to avoid training/fine-tuning large cross-lingual language models on multiple languages while still getting zero-shot language transfer.

For a task in a new language, say (zh): the cross-lingual model (LASER) makes zero-shot predictions on the unlabelled target-language data, and these predictions are used as pseudo-labels to fine-tune a monolingual language model and classifier in (zh).

The paper argues that the resulting model is superior to the cross-lingual model because of the language model fine-tuned on the target language.
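A rough sketch of that zero-shot bootstrapping step as described above: the cross-lingual model labels unlabelled target-language data zero-shot, and those pseudo-labels supervise the monolingual model. The objects and method names below are placeholders for illustration, not the MultiFiT repo's actual API.

```python
def zero_shot_bootstrap(unlabelled_zh_docs, cross_lingual_clf, target_lm):
    """cross_lingual_clf and target_lm are placeholder objects, not a real API.

    cross_lingual_clf: a classifier on top of a cross-lingual encoder (e.g. LASER),
                       trained on a source language such as English.
    target_lm:         a monolingual (zh) language model to be fine-tuned.
    """
    # Step 1: pseudo-label the target-language data with the cross-lingual model.
    pseudo_labels = [cross_lingual_clf.predict(doc) for doc in unlabelled_zh_docs]

    # Step 2: fine-tune the monolingual LM on the raw target-language text,
    # then train its classifier head on the pseudo-labelled data.
    target_lm.finetune_lm(unlabelled_zh_docs)
    target_clf = target_lm.train_classifier(unlabelled_zh_docs, pseudo_labels)
    return target_clf
```

The claim in the paper is that the student (the monolingual model) ends up beating its cross-lingual teacher, because its language model is adapted to the target language.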

IgnatiusEzeani commented 4 years ago

MultiFiT: Efficient Multi-lingual Language Model Fine-tuning

Abstract: "We propose Multi-lingual language model Fine-Tuning (MultiFiT) to enable practitioners to train and fine-tune language models efficiently in their own language."

Reasons (why shared multilingual models underperform on low-resource languages):

  1. Languages that are less frequently seen during training are underrepresented in the embedding space
  2. Infrequent scripts are over-segmented in the shared word piece vocabulary
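Point 2 is easy to check with a shared multilingual wordpiece vocabulary: words from languages or scripts that were rare during pretraining tend to be split into many more pieces than high-resource words. A quick check with Hugging Face `transformers` (the example words are illustrative; exact piece counts depend on the tokenizer):

```python
from transformers import AutoTokenizer

# mBERT's shared wordpiece vocabulary was built mostly from high-resource text.
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

for word in ["government", "gouvernement", "ìjọba"]:  # English, French, Yorùbá
    pieces = tok.tokenize(word)
    print(f"{word!r:>16} -> {len(pieces)} pieces: {pieces}")
# Expect the low-resource-language word to be split into noticeably more
# (and less meaningful) pieces than the high-resource ones.
```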

Multi-lingual vs Cross-lingual:

Method and Model Architecture:

This work:

Tasks and Datasets

Results