fotwo closed 6 years ago
Hi,
the vocab for both English and Vietnamese can be found using archive.org:
and
Just skip the JavaScript code at the beginning, then the whole vocabulary is shown :)
To manually create such a vocabulary, you could use the tokenized data to build a frequency list and then add the tokens <unk>, <s> and </s>.
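As a rough illustration of the manual approach described above, here is a minimal sketch in Python. It assumes the tokenized data has one sentence per line with tokens separated by whitespace. The function name `build_vocab`, the `min_count` and `max_size` parameters, and the choice to put the special tokens at the top of the list are assumptions made for this example, not something taken from the original vocabulary files.

```python
from collections import Counter

# Special tokens mentioned in this thread; putting them first is an assumption.
SPECIAL_TOKENS = ["<unk>", "<s>", "</s>"]


def build_vocab(lines, min_count=1, max_size=None):
    """Build a frequency-sorted vocabulary from whitespace-tokenized lines.

    `min_count` and `max_size` are hypothetical knobs for this sketch.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())

    # Sort by descending frequency; break ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    # Drop rare tokens and avoid duplicating the special tokens.
    tokens = [tok for tok, n in ranked
              if n >= min_count and tok not in SPECIAL_TOKENS]

    # Optionally cap the total size, counting the special tokens.
    if max_size is not None:
        tokens = tokens[:max(0, max_size - len(SPECIAL_TOKENS))]

    return SPECIAL_TOKENS + tokens


sample = ["a b a", "b a c"]
print("\n".join(build_vocab(sample)))
```

To use it on real data, pass an open file handle of the tokenized training text as `lines` and write the returned list to a file, one token per line.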
I hope this helps you :)
Hi, thanks a lot for the help, it works.
The Stanford site is working again, so I'm closing here :)
Hi, thanks for sharing this dataset. Could you give me the vocab, or is there any other way to get it? It seems the website 'https://nlp.stanford.edu/projects/nmt/' is not accessible. Thanks a lot!