daehuikim closed this issue 1 month ago.
The `T5Tokenizer` is unrelated to `tokenizers`. cc @itazap if you are able to reproduce with `"t5-base"`.
Hi @daehuikim! Regarding your code snippet (not sure if it was written only for the purpose of this issue): is your first `model` variable referring to the path, and is it then being overwritten by the `model` object itself? `from_pretrained` expects:
Args:
pretrained_model_name_or_path (`str` or `os.PathLike`):
Can be either:
- A string, the *model id* of a predefined tokenizer hosted inside a model repo on huggingface.co.
- A path to a *directory* containing vocabulary files required by the tokenizer, for instance saved
using the [`~tokenization_utils_base.PreTrainedTokenizerBase.save_pretrained`] method, e.g.,
`./my_model_directory/`.
- (**Deprecated**, not applicable to all derived classes) A path or url to a single saved vocabulary
file (if and only if the tokenizer only requires a single vocabulary file like Bert or XLNet), e.g.,
`./my_model_directory/vocab.txt`.
Maybe I misunderstood your question! Please let me know! 😊
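The argument forms listed in the docstring above can be illustrated with a small sketch. `classify_pretrained_arg` is a hypothetical helper, not part of transformers, and it simplifies the library's real resolution logic: it only shows that a `str`/`os.PathLike` is treated as a local directory, a (deprecated) single vocabulary file, or otherwise a Hub model id, while a loaded model object fits none of these.

```python
import os


def classify_pretrained_arg(arg):
    """Hypothetical helper (NOT a transformers API): mirrors, in simplified
    form, the argument forms documented for `pretrained_model_name_or_path`."""
    # Only `str` or `os.PathLike` are documented; a model object that replaced
    # the path in a reused variable is rejected here.
    if not isinstance(arg, (str, os.PathLike)):
        raise TypeError(f"expected str or os.PathLike, got {type(arg).__name__}")
    path = os.fspath(arg)
    if os.path.isdir(path):
        return "directory"    # e.g. ./my_model_directory/
    if os.path.isfile(path):
        return "single_file"  # deprecated form, e.g. ./my_model_directory/vocab.txt
    return "model_id"         # e.g. "t5-base", resolved on huggingface.co
```

In practice this suggests keeping the path in its own variable (for example a hypothetical `model_path`) and passing that string to both the model's and the tokenizer's `from_pretrained`, rather than reusing one name for the path and the loaded model.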
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, I found an interesting bug (maybe I could be wrong) in `from_pretrained`. Below is the code that reproduces the bug.
The model directory contains the fine-tuned T5 tensors and other necessary files with training results. The directory tree looks like this:
Whenever I run the code above, I get errors like the following:
However, after moving the tokenizer-related files and fixing some code, I get no errors. Below are the fixed code and the changed repo tree in `tokenizer_path`:
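The workaround described here (separating the tokenizer files from the model files) might be scripted roughly as below. This is a hypothetical sketch, not the reporter's actual fixed code. The file names in `TOKENIZER_FILES` are an assumption based on what a T5 tokenizer's `save_pretrained` typically writes; check them against your own directory.

```python
import os
import shutil

# Assumption: typical files saved for a T5 tokenizer (slow and fast).
# Adjust this list to match what your directory actually contains.
TOKENIZER_FILES = (
    "spiece.model",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "added_tokens.json",
)


def split_tokenizer_files(model_dir, tokenizer_dir):
    """Hypothetical helper: move tokenizer files from `model_dir` into
    `tokenizer_dir`, leaving config.json and the weights in place.
    Returns the sorted names of the files that were moved."""
    os.makedirs(tokenizer_dir, exist_ok=True)
    moved = []
    for name in TOKENIZER_FILES:
        src = os.path.join(model_dir, name)
        if os.path.isfile(src):  # skip files this tokenizer did not save
            shutil.move(src, os.path.join(tokenizer_dir, name))
            moved.append(name)
    return sorted(moved)
```

After the split, the model would be loaded from `model_dir` and the tokenizer from `tokenizer_dir` (both passed as path strings), so the tokenizer directory no longer contains a `config.json`.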
Therefore, I guess the `tokenizer.from_pretrained()` method is reading `config.json` rather than `tokenizer_config.json`. If I am right, could you fix this in an upcoming release? (It seems that if `config.json` and `tokenizer_config.json` exist at the same time, `config.json` always wins.) Thanks for reading my issue!
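The hypothesis above is unconfirmed. To check whether the two files could disagree, a minimal diagnostic sketch is to compare what each one declares. `inspect_config_files` is a hypothetical helper that only reads the JSON; it does not reproduce transformers' own lookup order. It assumes the relevant key is `tokenizer_class`, which `tokenizer_config.json` normally carries and `config.json` may or may not set.

```python
import json
import os


def inspect_config_files(directory):
    """Hypothetical diagnostic: for config.json and tokenizer_config.json in
    `directory`, report whether each exists and which `tokenizer_class`
    (if any) it declares. Does not mimic transformers' resolution logic."""
    report = {}
    for name in ("config.json", "tokenizer_config.json"):
        path = os.path.join(directory, name)
        entry = {"exists": os.path.isfile(path), "tokenizer_class": None}
        if entry["exists"]:
            with open(path, encoding="utf-8") as fh:
                data = json.load(fh)
            # None when the key is not set in this file
            entry["tokenizer_class"] = data.get("tokenizer_class")
        report[name] = entry
    return report
```

If both files exist and declare different `tokenizer_class` values, that mismatch would be a concrete detail worth including in the bug report.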