In case anyone is wondering the same: you will need to change how the tokenizer is loaded in llmfoundry/utils/builders.py (in addition to the other changes mentioned above):
```python
from tokenizers import Tokenizer
from transformers import PreTrainedTokenizerFast

# Load the trained tokenizer file and wrap it as a Hugging Face fast tokenizer.
tokenizer = Tokenizer.from_file(tokenizer_name)
tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)
```
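For context, here is a minimal sketch of how that swap might sit inside a build-tokenizer helper. The function name, signature, and fallback logic here are assumptions for illustration; the actual code in builders.py differs across llm-foundry versions:

```python
import os

from tokenizers import Tokenizer
from transformers import AutoTokenizer, PreTrainedTokenizerFast


def build_tokenizer(tokenizer_name: str, **kwargs):
    """Hypothetical helper: load a local tokenizers-library JSON file if one
    is given, otherwise fall back to the Hugging Face Hub."""
    if os.path.isfile(tokenizer_name):
        # A file path (e.g. a trained tokenizer.json): wrap it directly.
        return PreTrainedTokenizerFast(
            tokenizer_object=Tokenizer.from_file(tokenizer_name), **kwargs
        )
    # Otherwise treat tokenizer_name as a Hub repo id (the default behavior).
    return AutoTokenizer.from_pretrained(tokenizer_name, **kwargs)
```

The file-vs-repo-id branch keeps the YAML interface unchanged: the same `tokenizer_name` field works for both a Hub tokenizer and a local custom one.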
❓ Question
I want to train a custom tokenizer (just like the GPT-NeoX tokenizer) using the same script that GPT-NeoX provides. My questions are as follows:
1. How do I use the custom tokenizer for training? Do I just pass it as `tokenizer_name` in the YAML?
2. Do I need to change `vocab_size` under the `model` config?

P.S. Thanks for the amazing repository.
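For anyone landing here later, the two settings in question could look roughly like this in a training YAML. This is a sketch modeled on the MPT example configs in llm-foundry; the local path and the vocab size are placeholder assumptions:

```yaml
tokenizer_name: ./my_tokenizer/tokenizer.json  # assumed path to your trained tokenizer file

model:
  name: mpt_causal_lm
  vocab_size: 50368  # update to match your custom tokenizer's vocabulary size

tokenizer:
  name: ${tokenizer_name}
  kwargs:
    model_max_length: ${max_seq_len}
```

Note that configs often set `vocab_size` to a value slightly above the tokenizer's true vocabulary size, rounded to a multiple of 64 or 128 for GPU efficiency, so it need not match exactly as long as it is at least as large.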