pcyin / pytorch_nmt

A neural machine translation model in PyTorch
117 stars, 25 forks

dataset preprocessing #2

Closed (pum-purum-pum-pum closed this issue 6 years ago)

pum-purum-pum-pum commented 6 years ago

Could you please tell me which preprocessing you used? I found that the original IWSLT release consists of several XML files. Thank you!

pcyin commented 6 years ago

Hi! For the training set, I replaced all singletons (words that occur only once) with <unk>. The dev/test sets are the same as the official release :)
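
For readers who want to reproduce the singleton step, here is a minimal sketch. It is not the code used for this repository: it assumes whitespace-tokenized sentences, counts words over the whole training set, and the name `replace_singletons` is hypothetical.

```python
from collections import Counter

UNK = "<unk>"  # placeholder token named in the comment above


def replace_singletons(sentences, unk=UNK):
    """Replace every token that occurs exactly once in the corpus with `unk`.

    `sentences` is a list of token lists. Counts are taken over the whole
    corpus, not per sentence (an assumption about what "singleton" means).
    """
    counts = Counter(tok for sent in sentences for tok in sent)
    return [[tok if counts[tok] > 1 else unk for tok in sent]
            for sent in sentences]


# Toy training set: "cat" and "dog" each occur once, so both become <unk>.
train = ["the cat sat".split(), "the dog sat".split()]
replaced = replace_singletons(train)
for sent in replaced:
    print(" ".join(sent))
```

Apply this to the training files only; the dev/test sets stay as released.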

pum-purum-pum-pum commented 6 years ago

Thank you again. Did you simply concatenate all of these XML files for dev/test? I'm looking at this release: https://wit3.fbk.eu/mt.php?release=2014-01

pcyin commented 6 years ago

I used the data preprocessing scripts at https://github.com/harvardnlp/BSO/tree/master/data_prep. Hope this helps!
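
If you only need the raw sentences out of the dev/test XML files, a rough sketch follows. It is not taken from the BSO scripts. It assumes each sentence sits in a `<seg id="...">...</seg>` element, and it uses a regular expression instead of an XML parser so that stray unescaped characters do not abort the run. The function names are hypothetical.

```python
import html
import re

# Assumption: one sentence per <seg ...>...</seg> element.
SEG_RE = re.compile(r"<seg[^>]*>(.*?)</seg>", re.DOTALL)


def extract_segments(xml_text):
    """Return the text of every <seg> element, one string per sentence."""
    return [" ".join(html.unescape(body).split())
            for body in SEG_RE.findall(xml_text)]


def concat_segments(xml_texts):
    """Concatenate the sentences of several XML documents, in order."""
    lines = []
    for text in xml_texts:
        lines.extend(extract_segments(text))
    return lines


# Two tiny in-memory documents standing in for the real files.
doc_a = '<doc><seg id="1"> Hello world . </seg><seg id="2">Q &amp; A</seg></doc>'
doc_b = '<doc><seg id="1">Second file .</seg></doc>'
merged = concat_segments([doc_a, doc_b])
print("\n".join(merged))
```

Run it over the source and target files in the same order so the two sides stay line-aligned.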

pum-purum-pum-pum commented 6 years ago

I tried it and it just works :) Thank you so much!