Yes, this part is really confusing, as the effect of the parameter "alpha" is opposite in BPE and unigram:
Unigram: alpha is the smoothing (inverse temperature) parameter for subword sampling; a larger alpha makes the sampled segmentation closer to the deterministic one.
BPE: alpha is the probability of dropping each merge operation; a larger alpha drops more merges, moving the segmentation further from the deterministic one.
We will update the comment and the expected range of alpha. Anyway, as long as you set 0 < alpha < 1, BPE-dropout will work as expected.
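For illustration, a minimal sketch of that usage with the Python API (the model path and sample text are assumptions, not from this thread):

```python
import sentencepiece as spm

# Load an already-trained BPE model (path assumed for illustration).
sp = spm.SentencePieceProcessor(model_file="bpe.model")

# With 0 < alpha < 1, merges are randomly dropped during encoding,
# so repeated calls can return different segmentations of the same text.
for _ in range(3):
    print(sp.encode("the quick brown fox", out_type=str,
                    enable_sampling=True, alpha=0.1))
```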
Thanks for the quick response, I think this makes sense. One tiny detail: should it perhaps be called dropout prob instead of merge prob? When alpha = 0 we have normal BPE, meaning nothing is dropped and every merge does happen, so merge prob would be 1.0.
Updated the document and behavior in v0.1.94. Now alpha=0 or 1.0 is accepted in BPE mode.
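To make the boundary cases concrete, a small sketch assuming v0.1.94 or later and an existing BPE model (path and text are assumptions):

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="bpe.model")  # assumed path
text = "the quick brown fox"

# alpha=0.0: no merge is dropped, so this should match the deterministic BPE output.
print(sp.encode(text, out_type=str, enable_sampling=True, alpha=0.0))

# alpha=1.0: every merge is dropped, which should give character-level pieces.
print(sp.encode(text, out_type=str, enable_sampling=True, alpha=1.0))
```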
I am attempting to use the BPE-dropout feature from either the command line or the Python API. I show the Python example because spm_encode doesn't support BPE-dropout at all, based on spm_encode --help. I'm using version 0.1.93.
I start by making my BPE model. There was no mention of dropout in spm_train --help, so I assume we don't have to specify anything here.
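For reference, a sketch of this training step via the Python trainer rather than spm_train; the corpus path, vocabulary size, and model prefix are assumptions:

```python
import sentencepiece as spm

# Train a plain BPE model; nothing dropout-specific is specified at training time.
spm.SentencePieceTrainer.train(
    input="corpus.txt",   # assumed corpus file
    model_prefix="bpe",   # writes bpe.model and bpe.vocab
    vocab_size=8000,      # assumed vocabulary size
    model_type="bpe",
)
```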
We will now use this model to tokenise stochastically from Python. Looking at help(s.encode) I read: "alpha: Soothing parameter for unigram sampling, and merge probability for BPE-dropout." The parameter seems to have no effect unless we set enable_sampling=True, for example:
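The original snippet did not survive here; a sketch of the kind of call being described (model path and text are assumptions) could look like this:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="bpe.model")  # assumed path
text = "the quick brown fox"

# Without enable_sampling=True, alpha appears to have no effect and the
# output is always the deterministic BPE segmentation.
print(sp.encode(text, out_type=str, alpha=0.5))

# With enable_sampling=True, alpha takes effect and the output can vary.
print(sp.encode(text, out_type=str, enable_sampling=True, alpha=0.5))
```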
When we do set it, I expected that alpha=1.0 would yield character segmentation (always drop out) while alpha=0.0 would yield deterministic BPE. (Actually the documentation mentions merge probability, not dropout probability, so I would expect the opposite, but let's assume there was a misspelling in the doc.)

In any case, while alpha = 1.0 works, alpha = 0.0 doesn't. The function seems to be unaware that my model is BPE and not Kudo's LM, based on the error message:

The error message is obviously false (we were able to use enable_sampling=True without specifying nbest_size and with alpha=1.0 in the call right above).

So, all in all, how should one do BPE-dropout?