Open stuartleeks opened 3 months ago
I think it would be good to add configuration to control the number of tokens in generated completion responses, e.g. a mean percentage of max_tokens or similar.
Should this be a single config value for all endpoints, or a per-deployment setting in the deployment JSON file?
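One possible middle ground is a global default that individual deployments can override. The sketch below only illustrates that lookup. The key name `completion_tokens_mean_fraction`, the `GLOBAL_MEAN_FRACTION` constant, and the shape of the deployment JSON are all assumptions for illustration, not the existing config format.

```python
import json

# Hypothetical global default (fraction of max_tokens); name and value are assumptions.
GLOBAL_MEAN_FRACTION = 0.5


def resolve_mean_fraction(deployments, deployment_name, global_default=GLOBAL_MEAN_FRACTION):
    """Return the per-deployment value if one is set, otherwise the global default."""
    deployment = deployments.get(deployment_name, {})
    return deployment.get("completion_tokens_mean_fraction", global_default)


# Hypothetical deployment JSON: only the first deployment overrides the global value.
deployments = json.loads("""
{
  "gpt-35-turbo": {"model": "gpt-3.5-turbo", "completion_tokens_mean_fraction": 0.25},
  "gpt-4": {"model": "gpt-4"}
}
""")

override = resolve_mean_fraction(deployments, "gpt-35-turbo")  # uses the deployment's own value
fallback = resolve_mean_fraction(deployments, "gpt-4")         # falls back to the global default
```

With this shape, a single global value stays the simple case, and the per-deployment key is only needed where a deployment should behave differently.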
Currently, the chat completion generator uses a fixed 250 tokens for the generated response.
Is this a reasonable size? Should it be configurable, and does it need to vary between requests?
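To make the "mean % of max_tokens" idea concrete, here is a rough sketch of how a response size could be chosen. It is not the current generator code: the function name, the `mean_fraction` and `spread` parameters, and the choice of a normal distribution are assumptions. It keeps the existing 250-token size as the fallback when the request has no `max_tokens` or no fraction is configured.

```python
import random

DEFAULT_RESPONSE_TOKENS = 250  # the fixed size the generator uses today


def target_response_tokens(max_tokens=None, mean_fraction=None, spread=0.2, rng=random):
    """Pick a token count for a generated response.

    max_tokens:    value from the request, assumed >= 1 when provided
    mean_fraction: hypothetical config value, e.g. 0.5 for "50% of max_tokens on average"
    spread:        hypothetical standard deviation, as a fraction of the mean
    """
    if max_tokens is None or mean_fraction is None:
        # Nothing to scale against, so keep the existing fixed size.
        return DEFAULT_RESPONSE_TOKENS
    mean = max_tokens * mean_fraction
    sampled = rng.gauss(mean, mean * spread)
    # Clamp so the response has at least 1 token and never exceeds max_tokens.
    return max(1, min(max_tokens, round(sampled)))
```

Passing `rng=random.Random(seed)` would make the sizes reproducible in tests, and `spread=0` gives a fixed size of `max_tokens * mean_fraction` if variation is not wanted.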