rsennrich / subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
MIT License

Add parallel support (--num-workers) #94

Closed yimmon closed 4 years ago

yimmon commented 4 years ago

Adds a --num-workers option to apply-bpe, learn-bpe and learn-joint-bpe-and-vocab to support parallel processing, which is useful for large datasets (e.g. a 6x or greater speedup for apply-bpe and a 2x speedup for learn-joint-bpe-and-vocab on WMT15 en-fr).

Two limitations:

  1. Parallel mode does not support STDIN, since a stream cannot be read with random access (seeking). Use --input \<FILE> instead.
  2. Only implemented and tested for Python 3.
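
To illustrate the first limitation, here is a minimal sketch of how a file can be split across workers by byte offset. It is not the code from this PR: `chunk_offsets`, `process_chunk`, `process_file` and `segment_line` are hypothetical names, and `segment_line` is only a stand-in for real BPE segmentation. The point is the `seek()` call, which works on a regular file but not on a pipe such as STDIN.

```python
import os
from multiprocessing import Pool


def segment_line(line):
    # Hypothetical stand-in for segmenting one line with BPE.
    return line.upper()


def chunk_offsets(path, num_workers):
    """Return (start, end) byte ranges that cover the file and end on line boundaries."""
    size = os.path.getsize(path)
    chunk_size = max(1, size // num_workers)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, num_workers):
            f.seek(i * chunk_size)  # random access: not possible on STDIN
            f.readline()            # advance to the start of the next full line
            pos = min(f.tell(), size)
            if pos > offsets[-1]:
                offsets.append(pos)
    if offsets[-1] < size:
        offsets.append(size)
    return list(zip(offsets[:-1], offsets[1:]))


def process_chunk(job):
    """Read the lines in one byte range and segment each of them."""
    path, start, end = job
    out = []
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            out.append(segment_line(f.readline().decode("utf-8")))
    return out


def process_file(path, num_workers):
    """Segment a whole file with a pool of worker processes."""
    jobs = [(path, start, end) for start, end in chunk_offsets(path, num_workers)]
    with Pool(num_workers) as pool:
        results = pool.map(process_chunk, jobs)  # map keeps the input order
    return [line for chunk in results for line in chunk]
```

Each worker opens the file itself and reads only its own byte range, so the only data passed between processes is the list of offsets and the segmented lines.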
rsennrich commented 4 years ago

thanks, this looks good!