How to pass `max_token_length` to `load_model`?

hyperonym / basaran

Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.

MIT License


How to pass `max_token_length` to `load_model`? #116

Closed MohamedAliRashad closed 1 year ago

MohamedAliRashad commented 1 year ago

I want the model to keep outputting text for the `max_token_length` I specify.

peakji commented 1 year ago

Hi @MohamedAliRashad!

If you want to limit the maximum allowed number of output tokens, use the `COMPLETION_MAX_TOKENS` environment variable (default is 4096).

If you want the model to always output text up to a certain length, use the `min_tokens` request parameter.
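
For readers landing here later, a minimal sketch of what such a request could look like from Python. Only `min_tokens` and `COMPLETION_MAX_TOKENS` come from the reply above. The `/v1/completions` path, the `max_tokens` parameter, and the `choices[0]["text"]` response shape are assumptions based on Basaran describing itself as OpenAI-compatible, and the helper names and base URL are hypothetical. Adjust the host and port to your deployment.

```python
import json
import urllib.request


def build_completion_request(prompt, length, base_url="http://127.0.0.1"):
    """Build a POST request that asks for `length` new tokens.

    `min_tokens` is the parameter named in the reply above. `max_tokens`
    and the endpoint path are assumed from the OpenAI-style API.
    """
    payload = {
        "prompt": prompt,
        "min_tokens": length,  # lower bound on generated tokens
        "max_tokens": length,  # upper bound (assumed parameter name)
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, length, base_url="http://127.0.0.1"):
    """Send the request and return the generated text (assumed response shape)."""
    request = build_completion_request(prompt, length, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Calling `complete("Once upon a time", 256)` against a running server would then request 256 tokens. The requested length still has to fit within the server's `COMPLETION_MAX_TOKENS` limit.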