arthurv opened this issue 2 months ago
Apologies for the inconvenience. If you’re building from source, you can modify the max_new_tokens parameter in ktransformers/server/backend/args.py. We will include this update in the next Docker release.
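For anyone else hitting this, here is a minimal sketch of the kind of edit meant above. Only the file path and the `max_new_tokens` field come from this thread; the class name, decorator, and default values are illustrative assumptions, and the real layout of `args.py` may differ:

```python
# Sketch of ktransformers/server/backend/args.py (structure assumed).
from dataclasses import dataclass

@dataclass
class ConfigArgs:  # hypothetical class name
    # Cap on tokens generated per request; the server reportedly defaults to 1000.
    max_new_tokens: int = 4096  # raised from 1000 to allow longer outputs
```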
I just ran into this limitation as well. It would be even better if the REST API honored per-request settings for the maximum context length and the maximum number of generated tokens.
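Concretely, something like the request below should work: the server would respect the request's `max_tokens` instead of the hard-coded cap. This is a sketch of the intent, not the current behavior; the host, port, endpoint path, and model name are assumptions based on the usual OpenAI-compatible layout:

```python
# Hypothetical per-request override the server should honor.
import requests

resp = requests.post(
    "http://localhost:10002/v1/chat/completions",  # assumed host/port/path
    json={
        "model": "deepseek-coder",  # placeholder model name
        "messages": [{"role": "user", "content": "Write a long essay."}],
        "max_tokens": 4096,  # per-request generation cap
    },
)
print(resp.json())
```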
max_new_tokens defaults to 1000; it can be overridden in ktransformers.local_chat via --max_new_tokens, but the server exposes no equivalent flag.
Please add a --max_new_tokens option to the ktransformers server so we can request longer outputs, and expose more generation options as well (input context length, etc.); see the sketch below.
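A rough sketch of how the requested flags could be wired into the server's argument parsing. All names here except `--max_new_tokens` (and its 1000 default, both from this thread) are assumptions; the real entry point and config plumbing may differ:

```python
# Hypothetical CLI wiring for the ktransformers server.
import argparse

parser = argparse.ArgumentParser("ktransformers-server")  # assumed entry point
parser.add_argument("--max_new_tokens", type=int, default=1000,
                    help="Maximum number of tokens to generate per request")
parser.add_argument("--max_seq_len", type=int, default=8192,  # illustrative flag
                    help="Maximum total context length (prompt + generation)")
args = parser.parse_args()

# The parsed values would then override the defaults in
# ktransformers/server/backend/args.py before the backend starts.
```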