Open shreyansh26 opened 4 months ago
Okay, so it looks like using ghcr.io/predibase/lorax:latest fixes it. This is probably an issue with the current main branch.
I'm facing a similar problem.
When using the image ghcr.io/predibase/lorax:main, I see garbage outputs.
Older versions of LoRAX and HF TGI do not have this issue.
curl -X 'POST' \
'http://127.0.0.1:50710/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Who are you?"
}],
"model": "",
"seed": 42,
"max_tokens": 256, "temperature": 0.1
}'
{
"id":"null",
"object":"text_completion",
"created":0,
"model":"null",
"choices":[
{"index":0,"message":
{
"role":"assistant",
"content":"I \n ____ \n\n코_김 \n코 a a \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n1\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n1 and the and the and the and the a and the the a a and the a a a a a a a a a a a a a a a a a a a a the the a and 2 and the a a a the the the a the the the the the a a 2 and I a\n\n\n1\n\n\n\n\n\n\n\n\n\n\n\n\n\ns and /******/ and the and the and the and the and the the the the the the the the a and /******/ and the the the the the a and /******/ and /******/ 1 and the, a a a a and the a.\n\n\n /******/ and /******/ and /******/ and /******/ and /******/ and /******/ and the a a.jpg the the the a the the and a and /******/ and the a a the the the a a the a a a a a to the a a a a. /******/ a"
},
"finish_reason":"length"
}
],
"usage":{
"prompt_tokens":25,
"total_tokens":281,
"completion_tokens":256
}
}
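For anyone who wants to compare the `main` and `latest` images side by side, the curl request above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the endpoint path, port, and payload are copied from the curl command, the response shape is taken from the JSON shown above, and the function names `build_chat_request` and `ask` are hypothetical helpers introduced here for illustration.

```python
import json
import urllib.request


def build_chat_request(base_url: str, user_prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl command above (hypothetical helper)."""
    payload = {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
        "model": "",  # empty string, as in the curl example
        "seed": 42,
        "max_tokens": 256,
        "temperature": 0.1,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, user_prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text.

    Assumes the response has the shape shown above:
    choices[0].message.content
    """
    request = build_chat_request(base_url, user_prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server; the address is taken from the curl command):
# print(ask("http://127.0.0.1:50710", "Who are you?"))
```

Running `ask` with the same prompt against a server started from each image tag should make it easy to see whether the garbage output is specific to one image.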
System Info
Using Docker server
Running on a node with 8x H100 80GB GPUs. Device 3 is completely free and has no processes running on it.
Reproduction
Launch Lorax server
Use lorax-client with Python to query the server.
This generates garbage output
And on the server side -
This is not input-related: garbage output is generated for pretty much every prompt I tried.
Expected behavior
Using a simple HF inference script gives the expected output.
Output