Issue

When trying to call `encoding_for_model` providing a fine-tuned model as input, the following error occurs:

KeyError: 'Could not automatically map davinci:ft-personal:finetunedmodel-2023-05-23-20-00-00 to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.'
Analysis

See https://platform.openai.com/docs/models/model-endpoint-compatibility and https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
The models allowed for fine-tuning all use the `r50k_base` encoding.

Fine-tuned model names always follow this format:

`model:ft-personal:name:date`

where

- `model` is the base model from which the fine-tuned one has been created
- `ft-personal` is a fixed string indicating that the model is fine-tuned
- `name` is a custom name that the user can give to the new model
- `date` is the date of fine-tuning in the format yyyy-MM-dd-hh-mm-ss
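Given that naming scheme, the base model can be recovered by splitting on the first colon (a sketch; `base_model` is a hypothetical helper, not part of tiktoken):

```python
def base_model(model_name: str) -> str:
    """Return everything before the first ':', i.e. the base model name."""
    return model_name.split(":", 1)[0]

print(base_model("davinci:ft-personal:finetunedmodel-2023-05-23-20-00-00"))  # → davinci
```

Names without a `:` (plain base models) are returned unchanged.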
Solutions
Map the model prefixes in `MODEL_PREFIX_TO_ENCODING`, so that when `encoding_for_model` calls `model_name.startswith`, it also matches all models starting with "davinci", "ada", etc., and can therefore identify fine-tuned models.