Closed edwardzjl closed 1 year ago
The last line of text_generation_launcher: Args seems incorrect
What do you mean?
The logprob should always have a value. If it does not, something is going wrong, hence the validation error. I will investigate a bit on my side.
We are passing some configuration through command line args, for example --max-input-length 1000 --max-total-tokens 2048, which is reflected in the $BASE_URL/info endpoint, but not in the text-generation-launcher --env command.
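For reference, the effective configuration can be inspected through the info endpoint, e.g. with the same host and port as the requests further below:
> curl "http://localhost:8080/info"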
After some thought I think this is the correct behavior, as the text-generation-launcher --env command reflects the environment, not the running process. Sorry for my misleading description.
It seems that this happens when using a very low temperature.
For example, with a temperature of 1e-3 the server responds correctly.
> curl -X POST -H "content-type: application/json" -d '{"inputs": "The sky is blue because", "parameters": {"temperature": 0.001, "max_new_tokens":20}}' "http://localhost:8080/generate"
{"generated_text":" the Earth's atmosphere scatters sunlight in all directions. The scattering is caused by the molecules and particles in"}
However, if I change the temperature to 1e-4, the server responds with an empty generation.
> curl -X POST -H "content-type: application/json" -d '{"inputs": "The sky is blue because", "parameters": {"temperature": 0.0001, "max_new_tokens":20}}' "http://localhost:8080/generate"
{"generated_text":""}
If you set a temperature of 1e-4, why not simply use greedy decoding?
I'm new to text generation tasks, but I want to lower the "creativity" of the model and stick to stable outputs, and according to my limited knowledge I think I should set a low temperature (when using OpenAI I can set the temperature to 0).
But the issue is not about that: it is that text-generation-server generates a None logprob, while the text-generation Python client does not allow that.
I'm new to text generation tasks, but I want to lower the "creativity" of the model and stick to stable outputs
You should not use any temperature then and stick to greedy decoding.
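For reference, a sketch of the same request as above with the temperature simply dropped, which should fall back to the default greedy decoding:
> curl -X POST -H "content-type: application/json" -d '{"inputs": "The sky is blue because", "parameters": {"max_new_tokens":20}}' "http://localhost:8080/generate"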
But the issue is not about that
The issue seems to be about that. If you push the temperature too low, it will cause a float overflow (the logits are divided by the temperature) and result in NaN logprobs.
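To illustrate with a rough NumPy sketch (not TGI's actual sampling code, and assuming the model runs in half precision): temperature warping divides the logits by the temperature, so a near-zero temperature can push the scaled logits past the fp16 maximum (~65504), and the resulting inf values turn into NaN further down the softmax/logprob computation.

import numpy as np

# Toy logits in half precision, roughly how activations run on an A100.
logits = np.array([10.0, 5.0, 1.0], dtype=np.float16)

print(logits / np.float16(1e-3))  # all entries still finite
print(logits / np.float16(1e-4))  # largest entry overflows to inf

scaled = logits / np.float16(1e-4)
print(scaled - scaled.max())      # inf - inf = nan, which propagates into the logprobs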
Thank you for the advice, I will try greedy decoding.
For this issue, I mean that if NaN logprobs mean something went wrong, maybe we can return an HTTP 400 status or similar from the server side, with an error message if possible, instead of 200. What do you think?
System Info
text-generation-inference version: v0.8.2
text-generation version (Python client): 0.6.0
GPU: NVIDIA A100 40G
text-generation-launcher env:
(The last line of text_generation_launcher: Args seems incorrect; we run text-generation-inference in a Kubernetes pod and pass args through containers.args.) The real args can be found in the info endpoint:
Information
Tasks
Reproduction
When the text-generation-inference service generates a <unk> token (which does not have a 'logprob', and I don't know why), the text-generation Python client will raise a validation error. The corresponding code lives in text_generation/types.py:
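A rough sketch of the relevant model (field names assumed here for illustration, not copied verbatim from the repository), showing why a missing logprob fails validation:

from pydantic import BaseModel, ValidationError

# Approximate shape of the Token model in text_generation/types.py.
class Token(BaseModel):
    id: int
    text: str
    logprob: float  # required and non-null
    special: bool

# A <unk> token coming back with "logprob": null then fails validation:
try:
    Token(id=0, text="<unk>", logprob=None, special=False)
except ValidationError as e:
    print(e)  # complains that logprob may not be None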
Expected behavior
This error occurs during deserialization of the Response object. As the response code is 200, I suppose the client should not raise an error?
Maybe we should make 'logprob' optional?
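A sketch of that change, applied to the hypothetical model above (not a verified patch against the actual repository):

from typing import Optional
from pydantic import BaseModel

class Token(BaseModel):
    id: int
    text: str
    # Optional with a default of None, so a <unk> token without a logprob
    # deserializes instead of raising a ValidationError.
    logprob: Optional[float] = None
    special: bool

# This now parses fine:
print(Token(id=0, text="<unk>", logprob=None, special=False))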