rsennrich / subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
MIT License

Problem with a large corpus #77

Closed: nguyenvulebinh closed this issue 5 years ago

nguyenvulebinh commented 5 years ago

I have a large corpus of around 40GB of text. I installed subword-nmt via pip and tried to build the dictionary with the subword-nmt command line, but it takes forever to finish. I just wonder whether there is any solution for this situation?
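Learning BPE involves two phases: one pass over the raw text to count word frequencies, and the merge loop, which operates on the unique words and their counts rather than on the raw text. subword-nmt's `learn-bpe` accepts a precomputed word-count file through its `--dict-input` flag (one `word count` pair per line), so the expensive counting pass can be done once, reused across runs, or split over shards counted by separate processes. The following is a minimal sketch of that counting step, not a tool shipped with subword-nmt. The script and its function names are hypothetical, and splitting on single spaces is assumed to match how `learn-bpe` tokenizes plain-text input, so verify this against your installed version.

```python
import sys
from collections import Counter
from typing import Iterable, Iterator, List


def count_words(lines: Iterable[str]) -> Counter:
    """Count space-separated tokens, streaming one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        # Split on single spaces and skip empty strings (assumed to mirror
        # learn-bpe's plain-text tokenization; check your version).
        for word in line.strip("\r\n ").split(" "):
            if word:
                counts[word] += 1
    return counts


def merge_counts(shard_counts: Iterable[Counter]) -> Counter:
    """Combine per-shard counts into a single vocabulary."""
    total: Counter = Counter()
    for counts in shard_counts:
        total.update(counts)
    return total


def format_dict(counts: Counter) -> Iterator[str]:
    """Yield one "word count" line per word type, most frequent first."""
    for word, count in counts.most_common():
        yield f"{word} {count}\n"


def count_file(path: str) -> Counter:
    # The file is read lazily, line by line, so memory grows with the
    # vocabulary size rather than with the corpus size.
    with open(path, encoding="utf-8") as f:
        return count_words(f)


def main(paths: List[str]) -> None:
    # A generator keeps only one shard's counts in memory before merging.
    total = merge_counts(count_file(p) for p in paths)
    sys.stdout.writelines(format_dict(total))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Hypothetical usage, with placeholder file names and an arbitrary merge count: `python count_vocab.py shard_* > vocab.txt`, then `subword-nmt learn-bpe --dict-input -s 32000 < vocab.txt > codes.bpe`.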

rsennrich commented 5 years ago

here are some suggestions:

nguyenvulebinh commented 5 years ago

Really appreciate that! Thank you for your help! I'll try it.