Closed · KaiserWhoLearns closed this issue 3 years ago
Hi there!
GPT-2 and RoBERTa both use byte-level Byte-Pair Encoding (BPE) for tokenization, but they are different tokenizers, trained on different datasets and with different vocab_size values. So this is not a bug: they share the tokenization method but are essentially different tokenizers.
Also, please use the forum for such questions. Thank you :)
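A quick way to see this is to load both tokenizers and compare what they report. A minimal sketch, assuming the public gpt2 and roberta-base checkpoints on the Hub (the printed sizes are what those checkpoints currently ship with):

```python
from transformers import AutoTokenizer

# Load the two pretrained tokenizers (assumes the public "gpt2" and
# "roberta-base" checkpoints are available).
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")
roberta_tok = AutoTokenizer.from_pretrained("roberta-base")

# Both are byte-level BPE tokenizers, but they are distinct classes
# with distinct vocabularies.
print(type(gpt2_tok).__name__, gpt2_tok.vocab_size)        # e.g. GPT2TokenizerFast 50257
print(type(roberta_tok).__name__, roberta_tok.vocab_size)  # e.g. RobertaTokenizerFast 50265
```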
I see, thank you very much! I was a bit confused because I have seen papers like these (https://aclanthology.org/2020.tacl-1.18.pdf, https://aclanthology.org/2020.emnlp-main.344.pdf) that use GPT-2 and RoBERTa interchangeably since they share the vocab.
Environment info
transformers version: 4.5.1
Who can help
Library:
Problem
Model I am using: GPT-2, RoBERTa
The problem arises when running:
Expected behavior
Since RoBERTa and GPT-2 share a vocabulary, are they supposed to have equal vocab_size? I am not sure whether this is a question or a bug, so I put it here. If this is intended, may I ask where the difference comes from?
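To see where the gap comes from, one could diff the two vocabularies directly. This is only a sketch (not the original reproduction steps), again assuming the public gpt2 and roberta-base checkpoints:

```python
from transformers import AutoTokenizer

gpt2_tok = AutoTokenizer.from_pretrained("gpt2")
roberta_tok = AutoTokenizer.from_pretrained("roberta-base")

# get_vocab() returns a token-string -> id mapping for each tokenizer.
gpt2_vocab = set(gpt2_tok.get_vocab())
roberta_vocab = set(roberta_tok.get_vocab())

# Entries that appear in only one of the two vocabularies; the RoBERTa-only
# set is expected to contain its special tokens such as <s>, </s>, <pad>,
# <unk> and <mask>.
print("only in roberta-base:", sorted(roberta_vocab - gpt2_vocab))
print("only in gpt2:", sorted(gpt2_vocab - roberta_vocab))
```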