Closed raihan0824 closed 1 year ago

raihan0824: Why does the model always generate fewer than 100 tokens, even when I instruct it to generate more?

Reply: How are you using it? There is a parameter you can set in the Makefile, which is set to 100 by default: https://github.com/huggingface/transformers-bloom-inference/blob/7bea3526d8270b4aeeefecc57d7d7d638e2bbe0e/inference_server/server.py#L36

raihan0824: Well noted! Closing this.
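For reference, the cap discussed here is the server's default for maximum new tokens (100), which applies unless the request overrides it. A minimal sketch of overriding it per-request, assuming the server exposes a JSON POST endpoint (the `/generate/` route and the `text`/`max_new_tokens` field names are assumptions; check `server.py` in your version for the exact route and schema):

```python
import json
import urllib.request


def build_request(url: str, text: str, max_new_tokens: int = 200) -> urllib.request.Request:
    """Build a POST request that overrides the server-side default of 100 new tokens."""
    payload = {
        "text": [text],                      # field name assumed; verify against server.py
        "max_new_tokens": max_new_tokens,    # explicit override of the default cap of 100
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Build (but do not send) a request asking for up to 200 new tokens.
req = build_request("http://localhost:5000/generate/", "Hello, world")
print(json.loads(req.data)["max_new_tokens"])  # → 200
```

If the override is omitted, generation stops at the server's default, which is why output stayed under 100 tokens regardless of the prompt's instructions.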