Add parallel support (--num-workers)

rsennrich / subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation

MIT License

2.18k stars 464 forks

Add parallel support (--num-workers) #94

Closed: yimmon closed this 4 years ago

yimmon commented 4 years ago

Add a --num-workers option to apply-bpe, learn-bpe and learn-joint-bpe-and-vocab to enable parallel processing, which is useful for large datasets (e.g. a 6x or greater speedup for apply-bpe and a 2x speedup for learn-joint-bpe-and-vocab on WMT15 en-fr).

Two limitations:

Parallel mode does not support STDIN, since a stream cannot be read with random access. Use --input <FILE> instead.
Parallel mode is implemented and tested only for Python 3.
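
To illustrate why random access matters, here is a minimal sketch of one common way to parallelize line-by-line processing of a file. It is not the code from this PR. The names `chunk_offsets`, `process_chunk`, `parallel_map_lines` and `segment_line` are hypothetical, and `segment_line` is only a placeholder for the real per-line BPE work. Each worker seeks to its own byte offset, which a regular file allows and a pipe such as STDIN does not.

```python
import os
from multiprocessing import Pool


def chunk_offsets(path, num_workers):
    """Split a file into byte ranges that each end on a line boundary."""
    size = os.path.getsize(path)
    approx = max(1, size // num_workers)  # target chunk size in bytes
    offsets = [0]
    with open(path, "rb") as f:
        while offsets[-1] < size:
            # Jump ahead by seeking (impossible on STDIN), then finish the line.
            f.seek(min(offsets[-1] + approx, size))
            f.readline()
            offsets.append(f.tell())
    return list(zip(offsets[:-1], offsets[1:]))


def process_chunk(args):
    """Apply fn to every line inside one byte range of the file."""
    path, start, end, fn = args
    out = []
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline().decode("utf-8")
            out.append(fn(line.rstrip("\n")))
    return out


def segment_line(line):
    """Hypothetical stand-in for the per-line work (e.g. applying BPE merges)."""
    return line.upper()


def parallel_map_lines(path, fn, num_workers):
    """Process a file in parallel and return the results in input order."""
    tasks = [(path, s, e, fn) for s, e in chunk_offsets(path, num_workers)]
    with Pool(num_workers) as pool:
        results = pool.map(process_chunk, tasks)  # map preserves task order
    return [item for chunk in results for item in chunk]
```

Under these assumptions, `parallel_map_lines("corpus.txt", segment_line, 4)` would return the processed lines in their original order. The function passed as `fn` must be defined at module level so that it can be pickled and sent to the worker processes.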

rsennrich commented 4 years ago

thanks, this looks good!