Given that BPE dropout introduces randomness in the result of apply-bpe, it would be great if there was a --seed command-line argument that made it reproducible.
A new optional command-line argument `--seed` is added to apply-bpe. If the argument is provided, the random seed is set in the main function before `bpe.process_line` is called.
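As a rough illustration of the pattern described above, here is a minimal, self-contained sketch. It is not the code from this PR: `create_parser`, `toy_process_line`, and the simplified `main` signature are hypothetical stand-ins, and the "dropout" here just splits a word into characters at random.

```python
import argparse
import random


def create_parser():
    # Toy parser; only the --seed handling mirrors what this PR describes.
    parser = argparse.ArgumentParser(description="toy stand-in for apply-bpe")
    parser.add_argument(
        "--dropout", type=float, default=0.0,
        help="probability of splitting a word (stand-in for BPE dropout)")
    parser.add_argument(
        "--seed", type=int, default=None,
        help="random seed, useful when dropout is enabled")
    return parser


def toy_process_line(line, dropout=0.0):
    # Hypothetical stand-in for bpe.process_line: each word is split into
    # characters with probability `dropout`, drawing from the global generator.
    words = []
    for word in line.split():
        if random.random() < dropout:
            words.append("@@ ".join(word))
        else:
            words.append(word)
    return " ".join(words)


def main(argv=None, lines=()):
    args = create_parser().parse_args(argv)
    # Seed the global generator once, before any line is processed.
    if args.seed is not None:
        random.seed(args.seed)
    return [toy_process_line(line, args.dropout) for line in lines]
```

Calling `main(["--seed", "42", "--dropout", "0.5"], lines)` twice on the same input yields the same segmentation both times; without `--seed`, the global generator is left untouched.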
If subword-nmt is used programmatically, the caller can seed the random number generator directly before invoking the `BPE` object, so no special changes are needed to support seeding in that case. Therefore, this PR adds only the command-line argument.
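For programmatic use, the idea can be sketched as follows. `ToyBPE` is a hypothetical stand-in, not the subword-nmt class; the only assumption carried over is that dropout draws from the shared generator in Python's `random` module.

```python
import random


class ToyBPE:
    """Hypothetical stand-in for the BPE object; not the subword-nmt class."""

    def process_line(self, line, dropout=0.0):
        # Draws from the module-level generator in `random`.
        return " ".join(
            "@@ ".join(word) if random.random() < dropout else word
            for word in line.split())


bpe = ToyBPE()

random.seed(1234)  # the caller seeds the shared generator directly
first = bpe.process_line("low lower lowest", dropout=0.5)

random.seed(1234)  # same seed, same sequence of draws
second = bpe.process_line("low lower lowest", dropout=0.5)
```

Here `first` and `second` are identical because the generator is reset to the same state before each call.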
An alternative that might suit programmatic use of subword-nmt better would be to change the functions that take a `dropout` parameter to also take a `seed` argument and use a local `Random` object instead of the shared global state, but this would complicate things unnecessarily.
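For comparison, the alternative not taken in this PR might look roughly like the sketch below. The function name and signature are hypothetical; the point is only that a local `random.Random` instance keeps its state separate from the global generator.

```python
import random


def toy_process_line(line, dropout=0.0, seed=None):
    # A private generator: seeding it does not touch the global `random` state.
    rng = random.Random(seed)
    return " ".join(
        "@@ ".join(word) if rng.random() < dropout else word
        for word in line.split())
```

Every function on the call path that applies dropout would need the extra parameter (or a generator passed in), which is the added complexity the paragraph above refers to.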
This PR is motivated by issue https://github.com/rsennrich/subword-nmt/issues/90.