Incorrect answer with OpenAI-compatible penalty parameters

System Info

Hi there, I hit a bug when using TGI Gaudi 2.0.5 with both meta-llama/Meta-Llama-3-8B-Instruct and Intel/neural-chat-7b-v3-3. When I set the default frequency/repetition/presence penalty parameters following the OpenAI format (https://platform.openai.com/docs/api-reference/completions/create), I get wrong answers.

I then ran the same check on TGI CPU and did not encounter the bug, so I suspect the problem is specific to TGI Gaudi. Could you please look into this issue?

Information

[x] Docker
[ ] The CLI directly

Tasks

[x] An officially supported command
[ ] My own modifications

Reproduction

Here is a minimal reproduction:

model=Intel/neural-chat-7b-v3-3
hf_token=xxxx
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
 -e PT_HPU_LAZY_MODE=0 -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
 -e HF_TOKEN=$hf_token --cap-add=sys_nice --ipc=host -e http_proxy=${http_proxy} -e https_proxy=${https_proxy} \
 ghcr.io/huggingface/tgi-gaudi:2.0.5 --model-id $model --max-input-tokens 1024 --max-total-tokens 2048

http_proxy= curl http://${host_ip}:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "tgi",
    "messages": [
      {
        "role": "user",
        "content": "What is deep Learning!"
      }
    ], "max_tokens":128,"temperature":0.01, "top_p":0.95, "frequency_penalty":0.0, "repetition_penalty":1.03, "presence_penalty":0.0 }'

The response (spaces between words are missing toward the end):

{"id":"","object":"text_completion","created":1729153043,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.0.4-native","choices":[{"index":0,"message":{"role":"assistant","content":"Deep learning refers to a subset of machinelearning techniques that use artificial neural networks (ANNs) with multiple layers for feature extraction and transformation. These algorithms are designed based on the structure, functioningsimilarityto human brain's neuronsand their connections in order tomimethe processof how humans learn from data by recognizing patterns within it without explicit programming rules or instructions being given beforehand; this makes them highly effective at handling complex tasks like image recognitionor natural language processing(NLP). The deeper these network structures get - meaning more hiddenlayers-themorecomplexpatternsthatcanbelearnedfromdataarepossible"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"completion_tokens":128,"total_tokens":133}}

Then I removed repetition_penalty and kept only the OpenAI-compatible frequency_penalty and presence_penalty:

http_proxy= curl http://${host_ip}:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "tgi",
    "messages": [
      {
        "role": "user",
        "content": "What is deep Learning!"
      }
    ], "max_tokens":128,"temperature":0.01, "top_p":0.95, "frequency_penalty":0.0, "presence_penalty":0.0 }'

The output is still wrong (spaces are still missing):

{"id":"","object":"text_completion","created":1729153206,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.0.4-native","choices":[{"index":0,"message":{"role":"assistant","content":"Deep learning refers to a subset of machinelearning techniques that use artificial neural networks (ANNs) with multiple layers for feature extraction and transformation. These algorithms are designed based on the structure, functioningsimilarityto human brain's neuronsand their connections in order tomimethe processof how humans learn from data or information through experience by recognizing patterns within large datasets without explicit programming rules defined beforehand; this allows them tounderstandcomplexrelationshipsbetween variables more effectively than traditionalmachine-learninglearningalgorithmswhich relyonlinearmodelsorrulebasedapproachesforpatternrecognition tasks such as image classification"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"completion_tokens":128,"total_tokens":133}}
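To narrow down which parameter triggers the fused words, the two requests above could be extended into a one-parameter-at-a-time comparison. Below is a minimal Python sketch using only the standard library. The variant names and the helpers `build_payload`, `post_chat`, and `run_all` are hypothetical and not part of TGI; the payload values are copied from the curl commands above. This report does not say whether a request with no penalty parameters at all is affected, so that variant is included only as an assumed baseline.

```python
import json
import urllib.request

# Base request copied from the curl commands above, minus the penalty parameters.
BASE_PAYLOAD = {
    "model": "tgi",
    "messages": [{"role": "user", "content": "What is deep Learning!"}],
    "max_tokens": 128,
    "temperature": 0.01,
    "top_p": 0.95,
}

# Hypothetical variant names; the parameter values are the ones used in this report.
VARIANTS = {
    "no_penalties": {},
    "frequency_only": {"frequency_penalty": 0.0},
    "presence_only": {"presence_penalty": 0.0},
    "repetition_only": {"repetition_penalty": 1.03},
}


def build_payload(extra):
    """Return a copy of the base request with the extra parameters merged in."""
    payload = dict(BASE_PAYLOAD)
    payload.update(extra)
    return payload


def post_chat(base_url, payload, timeout=120):
    """POST one payload to /v1/chat/completions and return the generated text."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


def run_all(base_url):
    """Send every variant once and return {variant name: generated text}."""
    return {
        name: post_chat(base_url, build_payload(extra))
        for name, extra in VARIANTS.items()
    }
```

Calling `run_all` with the base URL of the running container (host and the port published by `docker run`) returns one generated text per variant, so the outputs can be compared side by side.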

Expected behavior

The answer should be well-formatted and correct, with spaces between words throughout the generated text.
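One rough way to make "well-formatted" checkable is to flag responses that contain an unusually long run of letters with no separator. The sketch below is only a heuristic: the function names and the threshold of 20 letters are assumptions chosen for illustration, and shorter fusions such as "machinelearning" would not be caught by it.

```python
import re


def longest_alpha_run(text):
    """Return the length of the longest uninterrupted run of ASCII letters."""
    runs = re.findall(r"[A-Za-z]+", text)
    return max((len(run) for run in runs), default=0)


def looks_fused(text, max_run=20):
    """Heuristic: True if some letter run is longer than max_run (assumed threshold)."""
    return longest_alpha_run(text) > max_run
```

Applied to the tail of the first response above ("hiddenlayers-themorecomplexpatternsthatcanbelearnedfromdataarepossible"), `looks_fused` returns True, while an ordinary English sentence with normal spacing does not trip the threshold.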
