huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

T5 Base length of Tokenizer not equal config vocab_size #10144

Closed ari9dam closed 3 years ago

ari9dam commented 3 years ago

Issue

`len(AutoTokenizer.from_pretrained("t5-base"))` is 32100, but `T5ForConditionalGeneration.from_pretrained("t5-base").config.vocab_size` is 32128. This seems to be a similar issue to https://github.com/huggingface/transformers/issues/2020.
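For context, the usual explanation for this mismatch is that the checkpoint's embedding matrix was padded up to a multiple of 128 for hardware efficiency, so `config.vocab_size` can exceed the tokenizer length; the extra slots are simply unused. A minimal sketch of that rounding (the helper name is hypothetical, not a transformers API):

```python
# Hypothetical illustration: round the tokenizer vocabulary size up to the
# next multiple of 128, which reproduces the 32100 -> 32128 gap reported above.
def pad_vocab_size(n_tokens: int, multiple: int = 128) -> int:
    """Round n_tokens up to the nearest multiple of `multiple`."""
    return ((n_tokens + multiple - 1) // multiple) * multiple

tokenizer_len = 32100  # len(AutoTokenizer.from_pretrained("t5-base"))
print(pad_vocab_size(tokenizer_len))  # -> 32128, matching config.vocab_size
```

If you add your own tokens to the tokenizer, you would still call `model.resize_token_embeddings(len(tokenizer))` as usual; the padding only explains why the shipped config and tokenizer disagree.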

patrickvonplaten commented 3 years ago

duplicate of https://github.com/huggingface/transformers/issues/4875 I think

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.