EMNLP 2019 presentation: https://www.youtube.com/watch?v=KiwwCUosw7E
Hello

From hady elsahar to Everyone: (05:15 PM)
Hi all :) hope you are doing well
kelechi can you open sharing screen

From Me to Everyone: (05:21 PM)
Sure

From hady elsahar to Everyone: (05:21 PM)
thanks thanks :)

From Me to Everyone: (05:35 PM)
https://github.com/n-waves/multifit

From hady elsahar to Everyone: (05:35 PM)
https://github.com/n-waves/multifit
it's in fastai, I think
Welcome Thierno :) you are welcome without reading :)

From Me to Everyone: (05:38 PM)
https://arxiv.org/pdf/2005.09093.pdf

From hady elsahar to Everyone: (05:42 PM)
Ruder talks about negative transfer in this blog post: https://ruder.io/multi-task/

From Me to Everyone: (05:42 PM)
Anyone with questions can post them here, btw. Would be nice if you asked the questions verbally though. :)
My question - in what scenarios are LSTMs better than Transformers?

From Me to Everyone: (05:43 PM)
This is interesting in that regard, btw - https://twitter.com/srush_nlp/status/1245825437240102913

From Jamiil Toure ALI to Everyone: (05:43 PM)
I was having a problem with some of the terminology... what is zero-shot? What is a cross-lingual model?

From Me to Everyone: (05:43 PM)
Zero-shot - making predictions on a "domain" you did not train on. Domain - e.g. labels, languages, etc.

From Jamiil Toure ALI to Everyone: (05:44 PM)
Thanks @kelechi. How about cross-lingual?

From Me to Everyone: (05:45 PM)
Cross-lingual - shared amongst languages. Typically used to refer to a model that works well for different languages. Although it is often conflated with multi-lingual, I believe, so it could be confusing.

From Me to Everyone: (05:46 PM)
E.g. cross-lingual word vectors are vector spaces that are shared among languages. So you could query a vector similar to a target language from a source language, all in the same vector space. Maybe someone can explain better.

From Ignatius Ezeani to Everyone: (05:50 PM)
On 'multi-lingual' and 'cross-lingual', see footnote 5 highlighted in the paper on the shared screen for some explanations.

From hady elsahar to Everyone: (05:52 PM)
very good question

From Jamiil Toure ALI to Everyone: (05:54 PM)
Thanks @kelechi and @ignatius for the answers.

From hady elsahar to Everyone: (05:56 PM)
Peters 2018b paper: https://arxiv.org/pdf/1808.08949v1.pdf

From Brian Muhia to Everyone: (05:58 PM)
I was incorrect, they do discuss Transformers later.

From hady elsahar to Everyone: (05:59 PM)
Agree with kelechi + giving them the same hyper-param tuning budget.

From Allen Akinkunle to Everyone: (06:00 PM)
Thank you for organising this, Kelechi. And thank you everyone for chipping in. I learnt a lot. I have to jump off now.

From Brian Muhia to Everyone: (06:00 PM)
Weight pruning as a test of hyperparam usage?

From hady elsahar to Everyone: (06:01 PM)
Weight pruning has a kind of similar objective to distillation; some pruned models actually have better results than the original model, partly because of the regularization effect.

From hady elsahar to Everyone: (06:04 PM)
Thanks kelechi for the "Are All Languages Created Equal in Multilingual BERT?"
Want to read it more now. What's the name of it again, Brian?

From Brian Muhia to Everyone: (06:05 PM)
https://arxiv.org/abs/1911.11423

From hady elsahar to Everyone: (06:07 PM)
IMO DL models are like mobile phones - some are popular, but people pick whatever suits their needs. I do agree. I have to go guys, another meeting in 5 mins. Was lovely chatting to you all :) like always, see you next week.
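To make the cross-lingual word-vector idea from the chat concrete, here is a toy, self-contained sketch: words from two languages sit in one shared vector space, so a source-language query can retrieve target-language neighbours by cosine similarity. The vectors below are made-up illustrative numbers, not real embeddings.

```python
# Toy illustration of cross-lingual word vectors: one shared space,
# nearest-neighbour queries across languages. Vectors are fabricated
# purely for illustration.
import numpy as np

shared_space = {
    ("en", "cat"):   np.array([0.90, 0.10, 0.00]),
    ("en", "dog"):   np.array([0.80, 0.20, 0.10]),
    ("fr", "chat"):  np.array([0.88, 0.12, 0.02]),
    ("fr", "chien"): np.array([0.79, 0.22, 0.11]),
}

def nearest(query_word, query_lang, target_lang):
    """Cosine-similarity nearest neighbour in the target language."""
    q = shared_space[(query_lang, query_word)]
    candidates = {w: v for (lang, w), v in shared_space.items()
                  if lang == target_lang}
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(candidates, key=lambda w: cos(q, candidates[w]))

print(nearest("cat", "en", "fr"))  # -> "chat"
```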
Interesting paper: it proposes MultiFiT to avoid training / fine-tuning large cross-lingual language models on multiple languages in order to perform zero-shot language transfer.

For a task in a new language, say (zh): pretrain a monolingual LM on target-language text (e.g. Wikipedia), fine-tune it on the task corpus, then use a cross-lingual model's zero-shot predictions on that corpus as pseudo-labels to train the target-language classifier (the bootstrapping is sketched below).

The paper argues that the resulting model is superior to the cross-lingual model because its LM is trained on the target language itself.
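A minimal sketch of that bootstrapping loop. All helper functions here (`pretrain_lm`, `finetune_lm`, `zero_shot_teacher`, `finetune_classifier`) are hypothetical stand-ins, not the actual API of https://github.com/n-waves/multifit:

```python
# Sketch of MultiFiT's zero-shot bootstrapping. Every helper below is a
# hypothetical placeholder; see the official repo for the real pipeline.

def multifit_zero_shot(target_wiki, task_docs, source_labeled_data):
    # 1. Pretrain a monolingual LM on target-language text (e.g. zh Wikipedia).
    lm = pretrain_lm(target_wiki)

    # 2. Adapt the LM to the (unlabeled) task corpus, ULMFiT-style.
    lm = finetune_lm(lm, task_docs)

    # 3. A cross-lingual model (LASER in the paper), trained on labeled
    #    source-language data, predicts labels for the target-language docs.
    teacher = zero_shot_teacher(source_labeled_data)
    pseudo_labels = [teacher.predict(doc) for doc in task_docs]

    # 4. Fine-tune the monolingual LM as a classifier on the pseudo-labels;
    #    the paper reports this student outperforms its zero-shot teacher.
    return finetune_classifier(lm, task_docs, pseudo_labels)
```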
MultiFiT: Efficient Multi-lingual Language Model Fine-tuning

Abstract: "We propose Multi-lingual language model Fine-Tuning (MultiFiT) to enable practitioners to train and fine-tune language models 'efficiently' in 'their own language'"
Reason:
Multi-lingual vs Cross-lingual: a multi-lingual model simply covers several languages, while a cross-lingual model shares one representation space across languages so it can transfer between them (see the chat discussion above).
Method and Model Architecture: ULMFiT adapted to the multilingual setting - subword (SentencePiece) tokenization and a QRNN in place of the AWD-LSTM, which makes both pretraining and fine-tuning efficient. The fine-tuning recipe is sketched below.
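A hedged sketch of the ULMFiT-style fine-tuning recipe that MultiFiT builds on, using the fastai v1 text API. Note the assumptions: this uses the stock `AWD_LSTM` rather than MultiFiT's QRNN configuration, and the paths/filenames (`'data/'`, `'texts.csv'`, `'ft_enc'`) are placeholders; the official code is at https://github.com/n-waves/multifit.

```python
# ULMFiT-style fine-tuning sketch (fastai v1). AWD_LSTM stands in for
# MultiFiT's QRNN; paths and epoch counts are illustrative placeholders.
from fastai.text import (TextLMDataBunch, TextClasDataBunch,
                         language_model_learner, text_classifier_learner,
                         AWD_LSTM)

# 1. Fine-tune a pretrained LM on the target-task corpus.
data_lm = TextLMDataBunch.from_csv('data/', 'texts.csv')
lm_learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5)
lm_learn.fit_one_cycle(1, 1e-2)
lm_learn.unfreeze()
lm_learn.fit_one_cycle(1, 1e-3)
lm_learn.save_encoder('ft_enc')

# 2. Fine-tune a classifier on the adapted encoder, gradually unfreezing
#    layer groups with discriminative learning rates, as in ULMFiT.
data_clas = TextClasDataBunch.from_csv('data/', 'texts.csv',
                                       vocab=data_lm.train_ds.vocab)
clas_learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
clas_learn.load_encoder('ft_enc')
clas_learn.fit_one_cycle(1, 1e-2)
clas_learn.freeze_to(-2)                        # unfreeze last two groups
clas_learn.fit_one_cycle(1, slice(1e-3, 1e-2))
clas_learn.unfreeze()
clas_learn.fit_one_cycle(1, slice(1e-4, 1e-3))
```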
This work:
Tasks and Datasets: document classification on MLDoc and sentiment classification on the CLS Amazon-reviews corpus, across multiple languages.
Results: fine-tuned on as few as 100 target-language examples, MultiFiT outperforms multilingual BERT; in the zero-shot setting, the bootstrapped MultiFiT outperforms its LASER teacher.
Link: https://www.aclweb.org/anthology/D19-1572.pdf
Short Description: MultiFiT fine-tunes efficient monolingual (QRNN-based) language models in the practitioner's own language and, for zero-shot transfer, bootstraps them from a cross-lingual teacher's predictions, outperforming much larger cross-lingual models.