Hi there, I ran into a bug when using TGI Gaudi 2.0.5 with both meta-llama/Meta-Llama-3-8B-Instruct and Intel/neural-chat-7b-v3-3. When I set the default frequency/repetition/presence penalty parameters according to the OpenAI format (https://platform.openai.com/docs/api-reference/completions/create), I got malformed answers; the full responses are pasted below.
I then checked the same requests on TGI CPU and did not hit the bug, so I suspect something is wrong with TGI Gaudi. Could you please look into this issue?
Information
[X] Docker
[ ] The CLI directly
Tasks
[x] An officially supported command
[ ] My own modifications
Reproduction
Here is a minimal reproduction:
model=Intel/neural-chat-7b-v3-3
hf_token=xxxx
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
-e PT_HPU_LAZY_MODE=0 -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
-e HF_TOKEN=$hf_token --cap-add=sys_nice --ipc=host -e http_proxy=${http_proxy} -e https_proxy=${https_proxy} \
ghcr.io/huggingface/tgi-gaudi:2.0.5 --model-id $model --max-input-tokens 1024 --max-total-tokens 2048
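For reference, the request I send looks roughly like the sketch below. The prompt and max_tokens are illustrative (the real prompt was 5 tokens per the usage field), and the penalty values are the OpenAI documented defaults (frequency_penalty = 0, presence_penalty = 0) plus repetition_penalty at its neutral value of 1.0; the exact values I used may differ slightly.
# Hypothetical request body; prompt, max_tokens, and penalty values are shown for illustration only.
curl -s http://localhost:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "Intel/neural-chat-7b-v3-3",
        "messages": [{"role": "user", "content": "What is deep learning?"}],
        "max_tokens": 128,
        "frequency_penalty": 0,
        "presence_penalty": 0,
        "repetition_penalty": 1.0
    }'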
The answer (note the missing spaces between words toward the end):
{"id":"","object":"text_completion","created":1729153043,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.0.4-native","choices":[{"index":0,"message":{"role":"assistant","content":"Deep learning refers to a subset of machinelearning techniques that use artificial neural networks (ANNs) with multiple layers for feature extraction and transformation. These algorithms are designed based on the structure, functioningsimilarityto human brain's neuronsand their connections in order tomimethe processof how humans learn from data by recognizing patterns within it without explicit programming rules or instructions being given beforehand; this makes them highly effective at handling complex tasks like image recognitionor natural language processing(NLP). The deeper these network structures get - meaning more hiddenlayers-themorecomplexpatternsthatcanbelearnedfromdataarepossible"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"completion_tokens":128,"total_tokens":133}}
I then removed repetition_penalty and kept only the OpenAI-compatible frequency_penalty and presence_penalty, but the error persists.
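A sketch of that second request, under the same assumptions as above but without repetition_penalty:
curl -s http://localhost:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "Intel/neural-chat-7b-v3-3",
        "messages": [{"role": "user", "content": "What is deep learning?"}],
        "max_tokens": 128,
        "frequency_penalty": 0,
        "presence_penalty": 0
    }'
The response still runs words together: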
{"id":"","object":"text_completion","created":1729153206,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.0.4-native","choices":[{"index":0,"message":{"role":"assistant","content":"Deep learning refers to a subset of machinelearning techniques that use artificial neural networks (ANNs) with multiple layers for feature extraction and transformation. These algorithms are designed based on the structure, functioningsimilarityto human brain's neuronsand their connections in order tomimethe processof how humans learn from data or information through experience by recognizing patterns within large datasets without explicit programming rules defined beforehand; this allows them tounderstandcomplexrelationshipsbetween variables more effectively than traditionalmachine-learninglearningalgorithmswhich relyonlinearmodelsorrulebasedapproachesforpatternrecognition tasks such as image classification"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"completion_tokens":128,"total_tokens":133}}
System Info
Docker image: ghcr.io/huggingface/tgi-gaudi:2.0.5, run on Habana Gaudi via --runtime=habana (the responses' system_fingerprint reports "2.0.4-native"). Models tested: meta-llama/Meta-Llama-3-8B-Instruct and Intel/neural-chat-7b-v3-3.
Expected behavior
The answer should be correct and well-formatted, with normal spacing between words, as it is with TGI on CPU.