yaysummeriscoming closed this issue 3 years ago
@LysandreJik any update on this?
@yaysummeriscoming To get subwords instead of numbers, you can call `tokenizer.gpt2_tokenizer.decode(tokens)`. Please take a look at our code for reference.
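The suggested fix works because the DeBERTa tokens in this version appear to be GPT-2 byte-level BPE ids rendered as strings. As an illustration only, the sketch below reproduces that decoding step offline. `TOY_VOCAB` is an assumption: a six-entry excerpt built by pairing the DeBERTa ids and RoBERTa tokens quoted in the issue position by position. It stands in for the full GPT-2 vocabulary that the real tokenizer loads. `ids_to_subwords` and `decode` are hypothetical helpers, not part of the transformers API. `bytes_to_unicode` follows the byte-to-character table used by GPT-2 style BPE.

```python
# Hypothetical sketch: turn DeBERTa-style token strings (GPT-2 BPE ids
# rendered as text) back into byte-level BPE subwords and plain text.

def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2 style byte-level BPE."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Bytes outside the printable ranges above (e.g. space, 0x20) are
            # shifted past 255, which is why a leading space shows up as "Ġ".
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# Inverse table: printable character -> original byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}

# Assumption: a tiny excerpt of the GPT-2 vocabulary, inferred by pairing
# the DeBERTa ids with the RoBERTa tokens from the issue position by position.
TOY_VOCAB = {31373: "hello", 11: ",", 314: "ĠI", 716: "Ġam", 257: "Ġa", 3290: "Ġdog"}

def ids_to_subwords(deberta_tokens, id_to_bpe):
    """Map each numeric token string to its BPE subword."""
    return [id_to_bpe[int(tok)] for tok in deberta_tokens]

def decode(deberta_tokens, id_to_bpe):
    """Join the subwords and undo the byte-to-character mapping."""
    text = "".join(ids_to_subwords(deberta_tokens, id_to_bpe))
    return bytes(BYTE_DECODER[ch] for ch in text).decode("utf-8", errors="replace")

if __name__ == "__main__":
    tokens = ["31373", "11", "314", "716", "257", "3290"]
    print(ids_to_subwords(tokens, TOY_VOCAB))  # ['hello', ',', 'ĠI', 'Ġam', 'Ġa', 'Ġdog']
    print(decode(tokens, TOY_VOCAB))           # hello, I am a dog
```

The intermediate subword list matches the RoBERTa output from the issue. With the real tokenizer, the full vocabulary lookup and byte decoding are handled by `tokenizer.gpt2_tokenizer.decode(tokens)` as suggested.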
That did the trick, thanks!
Environment info
transformers version: 4.0.0
Who can help
@BigBird01 @LysandreJik
Information
I'd like to use the new DeBERTa model, but the tokens don't seem to be mapped correctly.
RoBERTa output is: ['hello', ',', 'ĠI', 'Ġam', 'Ġa', 'Ġdog']
DeBERTa output is: ['31373', '11', '314', '716', '257', '3290']
I'd expect DeBERTa to give subword tokens similar to RoBERTa's, rather than numbers.