n-waves / multifit

Code to reproduce the results from the paper "MultiFiT: Efficient Multi-lingual Language Model Fine-tuning" (https://arxiv.org/abs/1909.04761)
MIT License

test model on other languages #56

Open wonderfultina opened 4 years ago

wonderfultina commented 4 years ago

Hi, I want to ask a question: if I have a model that was trained on English and I want to use it to test on other languages, how do I run the code?

PiotrCzapla commented 4 years ago

Do you mean in the form of zero-shot transfer learning?

If so, we use LASER for that. First we train LASER to obtain zero-shot predictions for the other languages. Then we use those zero-shot predictions to train a regular MultiFiT model (pretrained in the language that we are testing on). The unsupervised pretraining removes noise from the LASER zero-shot predictions and improves the results.
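
Roughly, the teacher step looks like this. A minimal sketch, assuming the `laserembeddings` pip package and a plain scikit-learn logistic regression as a stand-in for the classifier head; the toy sentences and variable names are only illustrative, not the repo's actual code:

```python
from laserembeddings import Laser                  # pip install laserembeddings
from sklearn.linear_model import LogisticRegression

# Pretrained multilingual LASER encoder (download the model files once with
# `python -m laserembeddings download-models`).
laser = Laser()

# 1) Fit a classifier on labelled data in the source language (e.g. English).
en_texts = ["great record, love it", "boring and forgettable"]   # toy data
en_labels = [1, 0]
X_en = laser.embed_sentences(en_texts, lang="en")
teacher = LogisticRegression(max_iter=1000).fit(X_en, en_labels)

# 2) LASER embeddings live in a shared multilingual space, so the same
#    classifier yields zero-shot pseudo-labels for the target language.
de_texts = ["tolle Platte, sehr zu empfehlen", "langweilig und belanglos"]
X_de = laser.embed_sentences(de_texts, lang="de")
pseudo_labels = teacher.predict(X_de)

# 3) The (de_text, pseudo_label) pairs then serve as the training set for the
#    MultiFiT classifier that was pretrained on the target language.
```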

wonderfultina commented 4 years ago

I understand now, thank you.

vhargitai commented 4 years ago

Hi @PiotrCzapla, have you or your colleagues already pretrained this model on English Wikipedia?

If not, would using prepare_wiki-en.sh to grab wikitext-103, then running postprocess_wikitext.py on it be identical to the dataset preparation you did for other languages in the MultiFiT paper?

I'd like to reproduce the monolingual supervised training procedure in the MultiFiT paper for English language classification. Thanks in advance!
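
For reference, the procedure I'm hoping to reproduce would look roughly like this. A sketch using the stock fastai v1 / ULMFiT pieces (AWD_LSTM) as a stand-in; the actual MultiFiT model uses a 4-layer QRNN with subword tokenization, and the CSV paths and column names below are placeholders:

```python
import pandas as pd
from fastai.text import (TextLMDataBunch, TextClasDataBunch, AWD_LSTM,
                         language_model_learner, text_classifier_learner)

# Placeholder CSVs with 'text' and 'label' columns for the target task.
train_df = pd.read_csv("train.csv")
valid_df = pd.read_csv("valid.csv")

# 1) Fine-tune a Wikipedia-pretrained language model on the task corpus
#    (fastai downloads the WikiText-103 AWD_LSTM weights by default).
data_lm = TextLMDataBunch.from_df(".", train_df, valid_df, text_cols="text")
lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
lm.fit_one_cycle(5, 1e-2)
lm.save_encoder("ft_enc")

# 2) Train the classifier on top of the fine-tuned encoder.
data_clas = TextClasDataBunch.from_df(".", train_df, valid_df,
                                      text_cols="text", label_cols="label",
                                      vocab=data_lm.vocab)
clf = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
clf.load_encoder("ft_enc")
clf.fit_one_cycle(3, 2e-2)
```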

mhajiaghayi commented 4 years ago

> Do you mean in the form of zero-shot transfer learning? If so, we use LASER for that. First we train LASER to obtain zero-shot predictions for the other languages. Then we use those zero-shot predictions to train a regular MultiFiT model (pretrained in the language that we are testing on). The unsupervised pretraining removes noise from the LASER zero-shot predictions and improves the results.

Q) In this case, you don't have a single model with a fixed tokenization that does zero-shot embedding for other languages. Am I right?

iNeil77 commented 3 years ago

> Do you mean in the form of zero-shot transfer learning?
>
> If so, we use LASER for that. First we train LASER to obtain zero-shot predictions for the other languages. Then we use those zero-shot predictions to train a regular MultiFiT model (pretrained in the language that we are testing on). The unsupervised pretraining removes noise from the LASER zero-shot predictions and improves the results.

In the CLS-DE notebook I only see the classifier fine-tuning happening on DE Music (data, label) pairs. But if I understand you correctly, shouldn't the LASER classifier be fine-tuned on EN Music data first before it can act as a teacher for fine-tuning the DE classifier? I don't see that step in the notebook. Am I misunderstanding the training regime?