samin9796 closed this issue 2 years ago.
Please elaborate on what you tried, e.g., command-line flags and/or Python code (if you are using the Python module).
There is an example in README.md, under "Subword regularization and BPE-dropout": Subword regularization [Kudo] and BPE-dropout [Provilkov et al.] are simple regularization methods that virtually augment the training data with on-the-fly subword sampling, which helps to improve both the accuracy and the robustness of NMT models.
To enable subword regularization, you need to integrate the SentencePiece library (C++/Python) into the NMT system so that it samples one segmentation for each parameter update, which differs from standard off-line data preparation. Here is an example using the Python library. You can see that 'New York' is segmented differently on each call to SampleEncode (C++) or to encode with enable_sampling=True (Python). The sampling parameters are documented in sentencepiece_processor.h.
```python
>>> import sentencepiece as spm
>>> s = spm.SentencePieceProcessor(model_file='spm.model')
>>> for n in range(5):
...     s.encode('New York', out_type=str, enable_sampling=True, alpha=0.1, nbest_size=-1)
...
['▁', 'N', 'e', 'w', '▁York']
['▁', 'New', '▁York']
['▁', 'New', '▁Y', 'o', 'r', 'k']
['▁', 'New', '▁York']
['▁', 'New', '▁York']
```
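To make the role of alpha a bit more concrete without needing a trained spm.model, here is a minimal pure-Python sketch of unigram subword sampling. It is not SentencePiece's implementation. Assumptions: the vocabulary and its log-probabilities (TOY_VOCAB) are invented for illustration, and instead of the lattice-based sampler SentencePiece uses, the sketch simply enumerates every segmentation of a short string and samples one with probability proportional to P(segmentation)^alpha, which is, as I understand it, the distribution that nbest_size=-1 targets. Smaller alpha flattens the distribution (alpha=0 is uniform); larger alpha concentrates it on the single best segmentation.

```python
import math
import random

# Hypothetical toy unigram vocabulary: piece -> log-probability.
# The numbers are invented; a real model stores its piece scores in spm.model.
TOY_VOCAB = {
    "▁New": math.log(0.10), "▁York": math.log(0.10),
    "New": math.log(0.05), "York": math.log(0.05), "▁Y": math.log(0.02),
    **{ch: math.log(0.01) for ch in "▁NewYork"},  # single-character fallbacks
}

def normalize(text):
    """Mimic SentencePiece's whitespace handling: prefix '▁' and map spaces to '▁'."""
    return "▁" + text.replace(" ", "▁")

def segmentations(text, vocab):
    """Yield every split of `text` into pieces that exist in `vocab`."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            for rest in segmentations(text[end:], vocab):
                yield [piece] + rest

def score(seg, vocab):
    """Unigram log-probability of a segmentation: the sum of its piece log-probabilities."""
    return sum(vocab[p] for p in seg)

def best_segmentation(text, vocab):
    """Deterministic encoding: the single most probable segmentation."""
    return max(segmentations(normalize(text), vocab), key=lambda s: score(s, vocab))

def sample_segmentation(text, vocab, alpha, rng):
    """Sample a segmentation with probability proportional to P(seg) ** alpha."""
    candidates = list(segmentations(normalize(text), vocab))
    scores = [score(s, vocab) for s in candidates]
    top = max(scores)
    # exp(alpha * (score - top)) equals P(seg) ** alpha up to a constant; the shift avoids underflow.
    weights = [math.exp(alpha * (s - top)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    print("best:", best_segmentation("New York", TOY_VOCAB))
    for _ in range(5):
        print(sample_segmentation("New York", TOY_VOCAB, alpha=0.1, rng=rng))
```

Enumeration is exponential in the string length, which is why a real implementation works on a lattice instead; for "New York" with this toy vocabulary there are only 12 candidates.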
Let me close this issue, since there seems to be no further discussion.
How can I use BPE-dropout? I don't see any change when I try different alpha values with a BPE model.
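For the BPE case, it may help to see what BPE-dropout does to the merge loop. Below is a minimal pure-Python sketch of the algorithm described by Provilkov et al., not SentencePiece's code: at every step each applicable merge is skipped with probability `dropout`, the highest-priority surviving merge is applied, and encoding stops when no merge survives. The merge table TOY_MERGES is hypothetical. With `dropout=0.0` this is ordinary deterministic BPE, so the output never changes. My understanding (unverified here) is that for a BPE model SentencePiece uses alpha as this dropout probability, and that alpha has no effect unless enable_sampling=True is passed, e.g. s.encode('New York', out_type=str, enable_sampling=True, alpha=0.1).

```python
import random

# Hypothetical BPE merge table: (left, right) -> rank.
# Lower rank = learned earlier = applied first. A real model learns this from data.
TOY_MERGES = {
    ("N", "e"): 0, ("Ne", "w"): 1, ("▁", "New"): 2,
    ("o", "r"): 3, ("or", "k"): 4, ("Y", "ork"): 5, ("▁", "York"): 6,
}

def bpe_encode(text, merges, dropout=0.0, rng=None):
    """Greedy BPE over characters; each applicable merge is skipped with probability `dropout`."""
    if rng is None:
        rng = random.Random()
    tokens = list(text)
    while len(tokens) > 1:
        # Keep (rank, position) for every adjacent pair that is mergeable and survives dropout.
        candidates = [
            (merges[pair], i)
            for i, pair in enumerate(zip(tokens, tokens[1:]))
            if pair in merges and rng.random() >= dropout
        ]
        if not candidates:  # nothing mergeable, or every candidate was dropped this step
            break
        _, i = min(candidates)  # highest-priority surviving merge, leftmost on ties
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens

if __name__ == "__main__":
    text = "▁New▁York"  # already in SentencePiece-style '▁' form
    print("dropout=0.0:", bpe_encode(text, TOY_MERGES))
    rng = random.Random(0)
    for _ in range(5):
        print("dropout=0.1:", bpe_encode(text, TOY_MERGES, dropout=0.1, rng=rng))
```

Setting `dropout=1.0` drops every merge and returns single characters, which is a quick sanity check that sampling is actually active.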