Closed: keleog closed this issue 4 years ago.
My Personal Notes:
Basic Algorithm:
Outcomes
Missing knowledge:
Missing context:
Relevance/Value
Some questions:
Found some answers: Q1 and Q2:
Yes - Trivial Transfer Learning for LR NMT (https://arxiv.org/pdf/1809.00357.pdf). The authors drop the restriction that the languages be related and extend the experiments to parent-child pairs where the target language changes. They find that even for unrelated languages there is some transfer improvement, and conclude that the size of the parent pair's training data matters more than language similarity.
In fact, they swapped the directions of the parent and child pairs and still observed gains, e.g., an XX-EN parent with EN-YY children. The method also works when the source language is shared, not just the target language. However, embeddings were shared between the source and target sides for both parent and child, which is another possible reason for the transfer that they did not investigate. They also find gains when training source-to-target as the parent and transferring to target-to-source as the child, e.g., train an English-to-Zulu NMT model and initialise the training of a Zulu-to-English NMT model with it.
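The parent-child initialisation discussed above can be sketched as follows. This is a toy illustration only: models are plain dicts of weight vectors, and all names (`init_params`, `transfer`, the parameter keys) are hypothetical stand-ins for real framework checkpoints (e.g. a Transformer state dict).

```python
import random

def init_params(names, size, seed):
    """Randomly initialise a toy 'model' as a dict of weight vectors."""
    rng = random.Random(seed)
    return {n: [rng.gauss(0.0, 0.1) for _ in range(size)] for n in names}

def transfer(parent, child, keys):
    """Initialise the child with the parent's weights for the given keys,
    then (in a real setup) continue training on the child pair's data."""
    warm = dict(child)
    for k in keys:
        warm[k] = list(parent[k])
    return warm

# Parent: high-resource pair (e.g. Czech-English); child: low-resource pair.
parent = init_params(["encoder", "decoder", "embeddings"], 4, seed=1)
child = init_params(["encoder", "decoder", "embeddings"], 4, seed=2)

# With a shared (joint subword) vocabulary, the embeddings transfer too,
# which is the possible confound noted above.
warm_child = transfer(parent, child, ["encoder", "decoder", "embeddings"])
print(warm_child["encoder"] == parent["encoder"])  # True
```

Swapping which pair plays parent and which plays child is just a matter of exchanging the two dicts, which is why the direction-swap experiments in the paper are so cheap to run.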
@keleog To your #3: The Urdu dataset has only 200k sentences - which is more reflective of other African languages
@jaderabbit Fair enough. Assuming an average of 15 tokens per sentence, that is around 3 million tokens. But, honestly, I have not seen a parallel set of even 100k sentences for any African language except the Setswana one reported in one of your papers.
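The back-of-the-envelope token count above is easy to check; the 15 tokens/sentence figure is the assumption made in the comment, not a measured statistic.

```python
sentences = 200_000           # size of the Urdu parallel set
avg_tokens_per_sentence = 15  # assumed average from the comment
total_tokens = sentences * avg_tokens_per_sentence
print(total_tokens)  # 3000000, i.e. around 3 million tokens
```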
@jaderabbit Your "missing knowledge" points are very important, and I hope someone has ideas about them. I think the ensemble would consist of different "Xfer" models with varying parents, children, and hyperparameters.
Saved chat and links from our talks:
18:17:20 From Bernardt Duvenhage : Hi. I'm in a bit of a noisy environment today.
18:17:38 From Bernardt Duvenhage : Will mostly listen in while I try to homeschool my kids :-)
18:22:00 From hady elsahar : +1 for some comments after Kelechi
18:24:57 From hady elsahar : Mikel Artetxe https://scholar.google.com/citations?hl=en&user=N5InzP8AAAAJ&view_op=list_works&sortby=pubdate http://www.mikelartetxe.com/publication/ https://www.cse.ust.hk/~qyang/Docs/2009/tkde_transfer_learning.pdf
18:25:53 From hady elsahar : Shared BPE vocabulary https://www.aclweb.org/anthology/P16-1162/ https://github.com/google/sentencepiece
18:28:57 From kelechukwu : Trivial Transfer Learning for LR NMT (https://arxiv.org/pdf/1809.00357.pdf)
18:29:48 From Jade Abbott : https://iclr.cc/virtual/poster_S1l-C0NtwS.html
18:30:28 From Jade Abbott : https://openreview.net/pdf?id=S1l-C0NtwS
18:31:10 From Bernardt Duvenhage : Read it briefly. Are the child models always the same size as the teacher model?
18:32:33 From kelechukwu : Same size - like embedding and layer dimensions, etc.?
18:34:06 From orevaogheneahia : I have skimmed through the paper. I was wondering how to properly select the appropriate parent languages.
18:34:31 From hady elsahar : Bahdanau 2015
18:34:33 From orevaogheneahia : More like how do we measure the similarity?
18:35:20 From Jamiil Toure ALI : Hi all. I read the paper... and I didn't understand the re-correction part. How is that implemented in the paper?
18:35:54 From Bernardt Duvenhage : Thanks. It would be cool to see the papers on the benefit of also incorporating distillation.
18:36:21 From Jamiil Toure ALI : Sorry, re-scoring rather than re-correction.
18:39:13 From hady elsahar : https://github.com/google-research/bert/blob/master/multilingual.md
18:40:08 From kelechukwu : Multilingual Denoising Pre-training for Neural Machine Translation - https://arxiv.org/abs/2001.08210
18:40:32 From kelechukwu : Multilingual BART seems to perform well for LM transfer learning to NMT
18:46:44 From hady elsahar : https://www.aclweb.org/anthology/P19-1301.pdf
18:48:05 From orevaogheneahia : Thanks for sharing.
18:53:27 From hady elsahar : Reranking diverse candidates has been shown to improve results in both open dialog and machine translation (Li et al., 2016a; Li and Jurafsky, 2016; Gimpel et al., 2013)
18:53:32 From hady elsahar : https://www.aclweb.org/anthology/P19-1365.pdf
18:54:12 From Jamiil Toure ALI : Thanks for sharing
18:54:29 From Bernardt Duvenhage : When will next week's paper be announced?
18:55:48 From Bernardt Duvenhage : Very cool idea, yes :-) Thanks
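The shared-BPE-vocabulary idea linked in the chat (Sennrich et al. 2016, and the sentencepiece library) can be illustrated with a toy byte-pair-encoding learner trained on the concatenation of both language pairs' text, which is what gives parent and child a common subword vocabulary. This is a simplified sketch of the classic BPE merge loop, not the sentencepiece API; the naive string `replace` and dict handling are fine for toy data but not robust for real corpora.

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        syms = word.split()
        for a, b in zip(syms, syms[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of the given symbol pair into one symbol."""
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

def learn_bpe(words, num_merges):
    # Start from characters; training on the joint corpus of both language
    # pairs yields the shared subword vocabulary used for transfer.
    vocab = Counter(" ".join(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges

# Toy joint "corpus"; in practice this would mix both languages' text.
merges = learn_bpe(["lower", "lowest", "low", "newer", "new"], 4)
print(merges)
```

Because the merge table is learned once over the joint text, the child model's tokenizer and embedding rows line up with the parent's, which is what makes directly transferring the embedding matrix possible.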
Thanks @hadyelsahar !
Link - https://www.isi.edu/natural-language/mt/emnlp16-transfer.pdf
Summary: