Closed velocityCavalry closed 7 months ago
It's expected, but we can and should fix it. I'll see what I can do, since those tokens aren't actually being accessed!
(It was fixed by disabling verbose in the PreTrainedTokenizerBase class.)
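A minimal sketch of that fix, using a toy word-level tokenizer with a hypothetical vocabulary as a stand-in for the real one: passing verbose=False when constructing the fast tokenizer tells PreTrainedTokenizerBase not to log the "Using sep_token, but it is not set yet" style messages.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# Toy word-level tokenizer with a hypothetical vocab, standing in
# for the real tokenizer trained on wikitext103.
tok = Tokenizer(WordLevel({"[UNK]": 0, "hello": 1, "world": 2}, unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()

# verbose=False suppresses the "Using X_token, but it is not set yet"
# warnings that are otherwise logged when unset special tokens are accessed.
fast = PreTrainedTokenizerFast(tokenizer_object=tok, verbose=False)
ids = fast.encode("hello world")
```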
Hi, I trained a tokenizer from scratch for raw wikitext103 using the code:
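(The original code snippet is missing from this copy of the issue. A sketch of what training a tokenizer from scratch with the tokenizers library typically looks like is below; the BPE model, the in-memory sample lines, and the vocab size are assumptions, not the poster's actual setup.)

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Hypothetical in-memory stand-in for the raw wikitext103 lines.
lines = ["the quick brown fox", "jumps over the lazy dog"]

# Train a small BPE tokenizer from scratch and save it to tokenizer.json.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(special_tokens=["[UNK]"], vocab_size=200)
tokenizer.train_from_iterator(lines, trainer)
tokenizer.save("tokenizer.json")
```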
and it was saved to tokenizer.json. However, when I was trying to follow the tutorial to load the tokenizer by doing
It gives me errors saying
Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using mask_token, but it is not set yet.
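(The loading snippet is also missing from this copy. A self-contained sketch that reproduces these messages might look like the following; the tiny tokenizer built at the top is only there so the snippet runs on its own, and the sample text is an assumption.)

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# Build a tiny tokenizer.json first so the snippet is self-contained
# (a stand-in for the file trained on wikitext103).
tok = Tokenizer(BPE(unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()
tok.train_from_iterator(["some raw text"], BpeTrainer(special_tokens=["[UNK]"]))
tok.save("tokenizer.json")

# The tutorial-style loading step. With verbose left at its default,
# accessing unset special tokens logs the "Using X_token, but it is not
# set yet" messages; they are warnings, not hard errors.
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
ids = fast_tokenizer.encode("some raw text")
```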
However, no cls, sep, or mask token was involved when I trained the tokenizer or ran the post-processing. I wonder whether this is expected behavior or whether something is wrong with my code; has anyone encountered a similar problem before?
That said, eyeballing the encoded and decoded results, everything looks fine.
Thank you so much; any help is appreciated!