It seems that you are using temperature = 0.8, i.e., a sampling strategy.
Can you check that, for a given seed that works, the model always outputs an answer, and vice versa?
If that's the case, TGI is working properly and it is only the sampling strategy that is causing the empty answers.
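As a quick way to run that check, here is a minimal sketch using the Python `text_generation` client; the endpoint URL, prompt, and seed are placeholders, and the do_sample/temperature settings simply mirror the ones discussed above:

```python
from text_generation import Client

# Hypothetical local TGI endpoint; adjust the URL to your deployment.
client = Client("http://127.0.0.1:8080")

PROMPT = "Explain what Text Generation Inference is."  # placeholder prompt
SEED = 42  # any fixed seed that produced a non-empty answer once

# With a fixed seed, sampling should be deterministic: if one run returns
# a non-empty answer, repeated runs with the same seed should as well.
outputs = [
    client.generate(
        PROMPT,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.8,
        seed=SEED,
    ).generated_text
    for _ in range(5)
]

print(all(o == outputs[0] for o in outputs), repr(outputs[0]))
```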
I'm getting the same behavior, although I've run it with two different models. I've run with a variety of temperatures specified, as well as leaving temperature unspecified. I'm using version 1.1.0 of TGI and using the TGI client to make requests. If I provide only a single sentence as input, I tend to reliably get results, which is bizarre.
Models tested: mistralai/Mistral-7B-Instruct-v0.1, togethercomputer/Llama-2-7B-32K-Instruct
Probably not directly related, but I recently experienced TGI returning an empty generated_text while the "tokens" field inside the response object was not empty. For those who are seeing an empty "generated_text" returned, consider checking whether you can find anything in the "tokens" field.
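A small sketch of that check against the raw /generate endpoint, assuming a locally running TGI server and using the requests library; the URL and prompt are placeholders, and "details": true is what makes the per-token list appear in the response:

```python
import requests

# Hypothetical local TGI endpoint; adjust to your deployment.
URL = "http://127.0.0.1:8080/generate"

payload = {
    "inputs": "Explain what Text Generation Inference is.",  # placeholder prompt
    "parameters": {
        "max_new_tokens": 128,
        "details": True,  # include per-token details in the response
    },
}

resp = requests.post(URL, json=payload, timeout=60).json()

generated_text = resp.get("generated_text", "")
tokens = resp.get("details", {}).get("tokens", [])

# If generated_text is empty but tokens are not, reconstruct the text from them.
if not generated_text and tokens:
    print("generated_text is empty, but tokens are not:")
    print("".join(t["text"] for t in tokens))
else:
    print(generated_text)
```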
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
System Info
Information
Tasks
Reproduction
Send this context to the generate_stream endpoint, using the following parameters:

Sometimes the response is the one expected; sometimes it is empty. Here is what shows up on the server side when the response is empty:
The inference time is abnormally low compared to when the inference completes successfully:
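For reference, a minimal sketch of the streaming call described above, assuming the Python `text_generation` client and a locally running TGI endpoint; the URL, prompt, and generation parameters are placeholders rather than the exact values from the original report:

```python
from text_generation import Client

# Hypothetical local TGI endpoint; adjust the URL to your deployment.
client = Client("http://127.0.0.1:8080")

PROMPT = "Summarize the following context: ..."  # placeholder for the real context

final_text = ""
for event in client.generate_stream(
    PROMPT,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,
):
    # Each streamed event carries one token; the last one also carries generated_text.
    if not event.token.special:
        print(event.token.text, end="", flush=True)
    if event.generated_text is not None:
        final_text = event.generated_text

print()
print("empty response!" if not final_text else f"got {len(final_text)} chars")
```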
Expected behavior
The server should return an answer every time, not randomly send back empty answers.