Hannibal046 closed this issue 2 years ago.
Hi, I am confused about the usage of `subword-nmt learn-joint-bpe-and-vocab`. What is the edge case when using joint BPE? Since all words will be de-BPE'd at test time, why could this produce unknown words? Thanks for answering.

And what is the real vocabulary for DL model training? Should I use the vocabulary file generated by subword-nmt, taking each line as a vocabulary term? Or should I use the BPE'd file and split it on spaces to build the vocabulary manually?

And if I build my vocabulary from the BPE'd file by splitting on spaces, I don't understand why an `<UNK>` token is necessary. Sorry for taking your time.
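To make the vocabulary question concrete, here is a minimal Python sketch of the two options. It assumes the subword-nmt vocabulary file has one `token count` pair per line and that the BPE'd file is whitespace-tokenized. The helper names (`vocab_from_bpe_lines`, `vocab_from_vocab_file`, `build_index`, `encode`) and the `<UNK>` string are hypothetical, chosen for illustration, and are not part of subword-nmt. On the same training data both options give the same counts. A test-time segment never seen in training (for example, one containing a character absent from the training text) still has no index, which is one reason a model vocabulary typically reserves `<UNK>`.

```python
from collections import Counter
from typing import Dict, Iterable, List

UNK = "<UNK>"  # hypothetical special-token name; frameworks use different spellings


def vocab_from_bpe_lines(lines: Iterable[str]) -> Counter:
    """Option 2: count whitespace-separated tokens in already BPE-segmented text."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def vocab_from_vocab_file(lines: Iterable[str]) -> Counter:
    """Option 1: parse 'token count' lines (assumed subword-nmt vocabulary format)."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            counts[parts[0]] = int(parts[1])
    return counts


def build_index(counts: Counter, min_count: int = 1) -> Dict[str, int]:
    """Map tokens to ids, reserving id 0 for UNK; tokens below min_count are dropped."""
    index = {UNK: 0}
    for token, count in counts.most_common():
        if count >= min_count:
            index[token] = len(index)
    return index


def encode(line: str, index: Dict[str, int]) -> List[int]:
    """Look up each BPE token; anything missing from the index becomes UNK."""
    return [index.get(tok, index[UNK]) for tok in line.split()]


train_bpe = ["lo@@ w@@ er", "lo@@ w@@ est"]
vocab_file = ["lo@@ 2", "w@@ 2", "er 1", "est 1"]

counts = vocab_from_bpe_lines(train_bpe)
assert counts == vocab_from_vocab_file(vocab_file)  # both options agree here

index = build_index(counts)
print(encode("lo@@ w@@ er", index))  # every token was seen in training
print(encode("z@@ oo", index))       # unseen segments fall back to UNK
```

Note that this sketch only maps missing tokens to `<UNK>`. subword-nmt's own handling of rare units during `apply-bpe` differs, so treat it as an illustration of why the id table needs a fallback, not as a description of the tool.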