openai / gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"
https://openai.com/blog/better-language-models/
Other
22.46k stars 5.52k forks source link

Translation task #20

Closed djstrong closed 5 years ago

djstrong commented 5 years ago

What was the format for translation task? Do you provide sequence of pairs delimited by new lines, e.g. "sentence1 = translation_of_sentence1 \n sentence2 = translation_of_sentence2 \n ... \n testing_sentence = "? Does the training dataset consist of similar format translations?

WuTheFWasThat commented 5 years ago

format is correct!

RE training dataset - see paper for details!

djstrong commented 5 years ago

I have read the paper, my question is: the ability to provide translation for a sentence comes from training data (there were some texts with similar translation pair format) or something else? In other words, if I take two totally monolingual corpora (English and French) and train your model, will be the translation working?

marcoaleixo commented 5 years ago

@djstrong any news about that?