kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Specify MAX_NEW_TOKENS for ktransformers server #92

Open arthurv opened 2 months ago

arthurv commented 2 months ago

max_new_tokens defaults to 1000, and it can be set for ktransformers.local_chat through --max_new_tokens, but there is no equivalent option for the server.

Please add a --max_new_tokens option to the ktransformers server so we can request longer outputs, and expose more generation options as well (input context length, etc.).
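For reference, this is roughly how the flag is used with local_chat today (the model/GGUF paths are placeholders; only --max_new_tokens itself is confirmed by this thread):

```bash
# local_chat already accepts the flag; the server has no counterpart yet
python -m ktransformers.local_chat --model_path <model_dir> --gguf_path <gguf_dir> --max_new_tokens 2000
```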

Azure-Tang commented 2 months ago

Apologies for the inconvenience. If you’re building from source, you can modify the max_new_tokens parameter in ktransformers/server/backend/args.py. We will include this update in the next Docker release.
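For anyone building from source, here is a minimal sketch of the kind of edit meant above, assuming the defaults live on a pydantic-style settings class; only the file path and field name come from the comment, the class and layout are illustrative:

```python
# ktransformers/server/backend/args.py -- illustrative sketch, not the verbatim file
from pydantic import BaseModel, Field

class ConfigArgs(BaseModel):
    """Server-side generation defaults (hypothetical layout)."""

    # Raise the cap from the 1000-token default discussed in this issue.
    max_new_tokens: int = Field(
        default=4000,
        description="Maximum number of tokens generated per request",
    )

args = ConfigArgs()
```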

bitbottrap commented 1 month ago

I just encountered this limitation. It would be even better if the REST API honored a client-specified maximum context length and maximum number of generation tokens.
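A sketch of what that would look like from the client side, assuming the server's OpenAI-compatible chat endpoint and the default port from the ktransformers docs (the model name is a placeholder):

```python
# Hypothetical client request; today the server-side max_new_tokens cap
# overrides whatever "max_tokens" the client asks for.
import requests

resp = requests.post(
    "http://localhost:10002/v1/chat/completions",
    json={
        "model": "placeholder-model",
        "messages": [{"role": "user", "content": "Summarize this repository."}],
        "max_tokens": 4000,  # the per-request cap this issue asks the server to honor
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])
```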