microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

Confusion about the amount of monolingual data used in the experiments #160

Open cbaziotis opened 4 years ago

cbaziotis commented 4 years ago

Hi,

I would like to ask how much monolingual data was used in each experiment.

  1. In the paper, as well as in this issue, you mention that you use...

    ... all of the monolingual data from WMT News Crawl datasets, which covers 190M, 62M and 270M sentences from the year 2007 to 2017 for English, French, German respectively.

  2. In these issues: https://github.com/microsoft/MASS/issues/32#issuecomment-549147762 and https://github.com/microsoft/MASS/issues/62#issuecomment-589016149, you mention that you use a subsample (50 million sentences) of the full data.

  3. In get-data-nmt.sh, I see that you have commented out the download links for the News Crawl data from many years for each language.
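For reference, I would expect a 50M-sentence subsample to be produced with something like the sketch below. The file names and the exact procedure are my assumptions, not taken from the repo; I am using a tiny dummy corpus so the commands are self-contained:

```shell
# Hypothetical sketch of subsampling monolingual data, as described
# in the linked issues (50M sentences drawn from News Crawl).
# File names and sizes here are made up for illustration.
seq 1 1000 | sed 's/^/sentence /' > all.en   # stand-in for concatenated per-year News Crawl files
N=100                                        # would be 50000000 for the real corpus
shuf all.en | head -n "$N" > train.sub.en    # random subsample of N sentences
wc -l < train.sub.en                         # prints 100
```

Is this roughly what was done, or was the subsample selected differently (e.g. by year or by deduplication first)?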

I may have missed something or misread the issues, but I am confused about how much data you actually used. I would appreciate it if you could help clear this up.

Thanks!

nxphi47 commented 3 years ago

Same issue. I am unable to tell which data was used to reproduce the experiments. Could you please specify exactly how you created the data for pre-training and fine-tuning, for en-fr, de-en, and en-ro? Thank you. @StillKeepTry