rsennrich / subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
MIT License

Add --seed command-line argument to apply-bpe #91

Closed: noe closed this 4 years ago

noe commented 4 years ago

This PR is motivated by issue https://github.com/rsennrich/subword-nmt/issues/90:

Given that BPE dropout introduces randomness in the result of apply-bpe, it would be great if there was a --seed command-line argument that made it reproducible.

A new optional command-line argument, --seed, is added to apply-bpe. If the argument is provided, the random seed is set in the main function before bpe.process_line is called.
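The change can be sketched roughly as follows. This is not the PR's actual diff. toy_process_line is a hypothetical stand-in for bpe.process_line that only imitates the property that matters here: with dropout > 0, its output depends on draws from Python's global random module. The lines parameter of main is also hypothetical and exists only so the sketch can be exercised without stdin.

```python
import argparse
import random
import sys


def toy_process_line(line, dropout=0.0):
    """Hypothetical stand-in for bpe.process_line (not real BPE): it splits off
    a word's last character with probability `dropout`, drawing from the
    shared global `random` module."""
    pieces = []
    for word in line.split():
        if len(word) > 1 and random.random() < dropout:
            pieces.extend([word[:-1] + "@@", word[-1]])
        else:
            pieces.append(word)
    return " ".join(pieces)


def main(argv=None, lines=None):
    parser = argparse.ArgumentParser(description="toy apply-bpe with --seed")
    parser.add_argument("--dropout", type=float, default=0.0)
    parser.add_argument("--seed", type=int, default=None,
                        help="seed for the random module (optional)")
    args = parser.parse_args(argv)

    # Seed once, before the first line is processed, so the whole run is
    # reproducible; without --seed the behaviour is unchanged.
    if args.seed is not None:
        random.seed(args.seed)

    results = []
    for line in (sys.stdin if lines is None else lines):
        segmented = toy_process_line(line, args.dropout)
        print(segmented)
        results.append(segmented)
    return results
```

Called with no arguments, main() parses sys.argv and reads standard input as a command-line tool would. Two runs with the same --seed and --dropout on the same input produce identical output.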

If subword-nmt is used programmatically, the caller can seed Python's global random module directly before invoking the BPE object, so no special changes are needed to support seeding in that case. Therefore, this PR only adds the command-line argument.
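For programmatic use, seeding the global generator before calling the BPE object is enough. In the sketch below, ToyBPE is a hypothetical stand-in for subword-nmt's BPE object. It does not implement BPE and only mimics the fact that dropout draws from the global random module.

```python
import random


class ToyBPE:
    """Hypothetical stand-in for a BPE object (not real BPE): with dropout > 0
    it splits off a word's last character with probability `dropout`, drawing
    from the shared global `random` module."""

    def process_line(self, line, dropout=0.0):
        pieces = []
        for word in line.split():
            if len(word) > 1 and random.random() < dropout:
                pieces.extend([word[:-1] + "@@", word[-1]])
            else:
                pieces.append(word)
        return " ".join(pieces)


bpe = ToyBPE()
lines = ["lower lower low", "newer lower"]

# Seed the global generator once, then segment as usual.
random.seed(42)
first = [bpe.process_line(line, dropout=0.3) for line in lines]

# Re-seeding with the same value replays the same sequence of draws.
random.seed(42)
second = [bpe.process_line(line, dropout=0.3) for line in lines]
print(first == second)  # True
```

Assuming the installed subword-nmt version's BPE.process_line accepts a dropout argument, the same pattern should apply to it. The catch is that any other code drawing from the global generator between seeding and segmentation shifts the sequence of draws. That is the concern behind the local Random alternative raised in the PR.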

An alternative, perhaps better suited to programmatic use of subword-nmt, would be to change the functions that take a dropout parameter so that they also take a seed argument and use a local Random object instead of the global one with its shared state. However, this would complicate things unnecessarily.
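The local Random alternative could look roughly like this. segment_line, its seed parameter, the helper _merge_word and the merge table are all hypothetical. The merge loop is a simplified toy that applies merges in priority order and skips each applicable merge with probability dropout. It is not subword-nmt's implementation.

```python
import random


def _merge_word(word, merges, dropout, rng):
    """Toy BPE with dropout: apply merges in order, skipping each applicable
    merge with probability `dropout`, using the supplied generator `rng`."""
    symbols = list(word)
    for left, right in merges:
        merged, i = [], 0
        while i < len(symbols):
            if (i + 1 < len(symbols) and symbols[i] == left
                    and symbols[i + 1] == right and rng.random() >= dropout):
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def segment_line(line, merges, dropout=0.0, seed=None):
    """Hypothetical seed-aware API: one private random.Random per call, shared
    by all words in the line, so the module-level generator is never used."""
    rng = random.Random(seed)
    out = []
    for word in line.split():
        pieces = _merge_word(word, merges, dropout, rng)
        out.extend(p + "@@" for p in pieces[:-1])
        out.append(pieces[-1])
    return " ".join(out)


# Hypothetical merge table, standing in for a learned codes file.
MERGES = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]

a = segment_line("lower lower low", MERGES, dropout=0.5, seed=7)
random.random()  # unrelated use of the global generator in between
b = segment_line("lower lower low", MERGES, dropout=0.5, seed=7)
print(a == b)  # True
print(segment_line("lower lower low", MERGES))  # lower lower low
```

Because the generator is private, the interleaved call to random.random() does not change the result. One design consequence is that seeding per call makes identical lines with the same seed always segment identically. A real pipeline would therefore probably create one Random object per run and pass it down instead.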