Chen1399 closed this issue 3 years ago
I'm not sure I understand what the problem is. Could you provide an example with a clear input and the expected and obtained outputs? Please also check whether the issue still exists in https://github.com/marian-nmt/marian-dev; the SentencePiece version there was updated recently.
There is a bug when USE_SENTENCEPIECE is enabled, in the encode step (sentencepiece_vocab.cpp). Encoding from token to ID is wrong because the ID comes from the vocabulary inside the spm file, not from vocab.yml, so the resulting ID is incorrect. The input should first be encoded to strings, and then each string should be mapped to an ID by the defaultVocab that is built from vocab.yml. My English is not very good; I hope you can understand.
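To make the report more concrete, here is a minimal sketch of the mismatch as I understand it. It does not use Marian or SentencePiece code: both tables, the `toy_segment` helper, and the function names are hypothetical stand-ins, and the ID values are invented for illustration only.

```python
# Toy stand-ins for the two vocabularies. These values are assumptions,
# not taken from any real spm model or vocab.yml file.
spm_piece_to_id = {"<unk>": 0, "<s>": 1, "</s>": 2, "▁hello": 3, "▁world": 4}
yml_piece_to_id = {"</s>": 0, "<unk>": 1, "▁world": 2, "▁hello": 3}


def toy_segment(text):
    """Hypothetical segmenter: split on spaces and prefix each word with '▁'."""
    return ["▁" + word for word in text.split()]


def encode_with_spm_ids(text):
    """Behaviour described in the report: IDs come from the spm table."""
    unk = spm_piece_to_id["<unk>"]
    return [spm_piece_to_id.get(piece, unk) for piece in toy_segment(text)]


def encode_via_yml(text):
    """Proposed behaviour: segment to strings, then look each string up
    in the table built from vocab.yml."""
    unk = yml_piece_to_id["<unk>"]
    return [yml_piece_to_id.get(piece, unk) for piece in toy_segment(text)]


spm_ids = encode_with_spm_ids("hello world")
yml_ids = encode_via_yml("hello world")
print("IDs from spm table:      ", spm_ids)
print("IDs from vocab.yml table:", yml_ids)
```

If the two tables assign different IDs to the same piece, as they do in this toy case, a model trained against the vocab.yml IDs would receive the wrong indices from the first function.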