ramild closed this issue 5 years ago.
UPD. I tried it with bert-multilingual-cased, but the results are still bad. A number of very simple (text, translated text) pairs give very different probability distributions (the translated versions almost always fall into one dominant category).
Specifically, I fine-tune the pre-trained bert-multilingual-cased on a Russian text classification problem and then use the model to predict on English text (I tried other languages as well; nothing works).
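To quantify "very different probability distributions" rather than eyeball them, a small harness like this can score each (text, translation) pair. This is a minimal sketch: `predict` is a stand-in for whatever function maps a text to the fine-tuned model's softmax probabilities, and is not part of any specific library API.

```python
def total_variation(p, q):
    """Total variation distance between two label distributions (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def consistency_report(predict, pairs):
    """For each (text, translation) pair, measure how far apart the model's
    label distributions are and whether the predicted label agrees.

    `predict` is assumed to take a string and return a list of label
    probabilities summing to 1 (e.g. softmax over the classifier logits).
    """
    report = []
    for text, translation in pairs:
        p, q = predict(text), predict(translation)
        report.append({
            "tv_distance": total_variation(p, q),
            "same_label": max(range(len(p)), key=p.__getitem__)
                          == max(range(len(q)), key=q.__getitem__),
        })
    return report
```

If zero-shot transfer were working, you'd expect small `tv_distance` values and `same_label` almost always true across the pairs; large distances with disagreeing labels are exactly the failure described above.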
Hi, my feeling is that this is still an open research problem.
Here is a recent thread discussing the related problem of fine-tuning BERT on English SQuAD and trying to do QA in another language. Maybe you can get a pre-print from the RecitalAI guys if they haven't published it yet.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi! I'm interested in solving a classification problem in which I train the model on one language and make predictions in another (zero-shot cross-lingual classification).
The README for the multilingual BERT model (https://github.com/google-research/bert/blob/master/multilingual.md) says:
But after fine-tuning BERT-multilingual-uncased on a dataset in one language, it doesn't work at all for texts in other languages. The predictions turn out to be inadequate: I tried multiple pairs
(text, the same text translated into another language)
and the probability distributions over labels (after applying softmax) were wildly different. Do you know what the cause of the problem might be? Should I somehow change the tokenization when applying the model to other languages (the WordPiece vocabulary is shared, so I'm not sure about this one)? Or should I use multilingual-cased instead of multilingual-uncased (could that be the source of the problem)?
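To make "wildly different after softmax" concrete: even a moderate shift in the classifier logits (as you might get when the sentence representation drifts across languages) flips the predicted label and reshapes the whole distribution. A toy numpy illustration with made-up logits, not actual model outputs:

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

# Hypothetical logits for a text and its translation.
original   = softmax(np.array([2.0, 0.5, -1.0]))
translated = softmax(np.array([0.1, 2.2, -0.5]))

print(original.argmax(), translated.argmax())  # different predicted labels
print(np.round(original, 2), np.round(translated, 2))
```

The exponentiation in softmax amplifies logit gaps, so a representation shift of a couple of logits is enough to move almost all probability mass to a different label, which matches the symptom described above.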