This is due to the max_tokens flag, which is provided via the config file and passed to the ChatCompletion request. The max_tokens flag determines the maximum number of tokens that the model can generate as part of the completion. However, the number of tokens in the prompt + max_tokens must be <= the context size of the model you are using.
The fix is to either adjust the value of the max_tokens flag or remove it entirely from the ChatCompletion request (the model will then use the remaining tokens in the context as the generation limit).
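For reference, here is a minimal sketch of both options, assuming the legacy `openai` Python client (pre-1.0, which exposes `openai.ChatCompletion`) and a placeholder model/prompt; the actual values come from the config file:

```python
import openai  # legacy client (openai < 1.0)

# Placeholder prompt; in practice this is built from the config/quick start.
messages = [{"role": "user", "content": "Hello!"}]

# Option 1: set max_tokens small enough that
# (prompt tokens + max_tokens) <= the model's context size.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    max_tokens=512,  # must leave room for the prompt within the context window
)

# Option 2: omit max_tokens entirely; the model then uses whatever context
# remains after the prompt as the completion limit.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
)
```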
Hope this helps! If not, please reopen the issue :)
-- Nils
Dear authors,
When I run the Quick Start example, I get the following error:
Note that I saved the example as quick_start.py. How should this be fixed?
Thank you for your time and consideration.
Best regards, Weijie Liu