It appears that the current code path tries to find a valid tokenizer for a given model name, and that this tokenizer is only used to check whether the user's input is shorter than the context window.
As a result, the framework does not work for models that do not have a public tokenizer.
The most obvious example is "gpt4". More importantly, though, this prevents users from trying their own custom, private models hosted behind an OpenAI-compatible server that has not published any information about its models, such as their associated tokenizers.
A better approach would be to simply prompt the OpenAI server with the full input, however long it may be, and then listen for an error. OpenAI-compatible servers typically respond with errors along the lines of "your input exceeds the context window by X tokens, please reformat it and try again".
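A minimal sketch of this error-driven approach, assuming the wording patterns below (real servers vary, so the regex and the helper name are illustrative, not an actual API of any library):

```python
import re

# Assumed patterns for context-overflow errors from OpenAI-compatible servers.
# These phrasings are examples, not an exhaustive or guaranteed list.
CONTEXT_ERROR_RE = re.compile(
    r"maximum context length"
    r"|exceeds the context window"
    r"|context_length_exceeded",
    re.IGNORECASE,
)


def is_context_window_error(error_message: str) -> bool:
    """Return True if the server's error text looks like a context-overflow error.

    Instead of tokenizing locally, the framework would send the full prompt,
    and only on failure inspect the error body with a check like this one,
    then truncate or reformat the input and retry.
    """
    return bool(CONTEXT_ERROR_RE.search(error_message))


# Example error text in the style such servers commonly return:
msg = (
    "This model's maximum context length is 8192 tokens, "
    "however you requested 9031 tokens. Please reduce your prompt."
)
```

This keeps the happy path tokenizer-free: unknown and private models work out of the box, and the framework only needs to recognize the overflow error, not predict it.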