huggingface / tgi-gaudi

Large Language Model Text Generation Inference on Habana Gaudi
http://hf.co/docs/text-generation-inference
Apache License 2.0

Incorrect answer with openai compatible penalty parameters #238

Open Spycsh opened 3 weeks ago

Spycsh commented 3 weeks ago

System Info

Hi there, I hit a bug when using TGI Gaudi 2.0.5 with both meta-llama/Meta-Llama-3-8B-Instruct and Intel/neural-chat-7b-v3-3. When I set the default frequency/repetition/presence penalty parameters following the OpenAI format (https://platform.openai.com/docs/api-reference/completions/create), I get wrong answers. Here are the screenshots:

image1

image2

I then checked on TGI CPU and did not encounter the bug, so I suspect something is wrong in TGI Gaudi. Could you please look into this issue?
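For context, here is a rough sketch of what these three parameters are generally understood to do to the logits. The additive frequency/presence formula follows the OpenAI documentation, and the multiplicative repetition penalty follows the common CTRL-style rule. This is not TGI's actual implementation: the function name `apply_penalties`, the order of operations, and the per-token `counts` input are assumptions for illustration. The point is that `frequency_penalty=0.0` and `presence_penalty=0.0` should in principle leave the logits untouched.

```python
def apply_penalties(logits, counts, frequency_penalty=0.0,
                    presence_penalty=0.0, repetition_penalty=1.0):
    """Return a penalized copy of `logits` (hypothetical helper, not TGI code).

    logits: raw score for each vocabulary token
    counts: how many times each token has already been generated
    """
    penalized = []
    for logit, count in zip(logits, counts):
        if count > 0:
            # Multiplicative repetition penalty: shrink positive scores,
            # push negative scores further down.
            if logit > 0:
                logit = logit / repetition_penalty
            else:
                logit = logit * repetition_penalty
            # Additive penalties: one scales with the count, one is flat.
            logit -= count * frequency_penalty
            logit -= presence_penalty
        penalized.append(logit)
    return penalized


logits = [2.0, -1.0, 0.5]
counts = [1, 0, 2]

# With the neutral values (0.0, 0.0, 1.0) nothing changes.
print(apply_penalties(logits, counts) == logits)  # True
```

Under this model, only `repetition_penalty=1.03` would alter the scores at all, and only slightly, so none of the settings used in this report would be expected to remove spaces from the output.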

Reproduction

Here is a minimal reproduction:

model=Intel/neural-chat-7b-v3-3
hf_token=xxxx
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
 -e PT_HPU_LAZY_MODE=0 -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
 -e HF_TOKEN=$hf_token --cap-add=sys_nice --ipc=host -e http_proxy=${http_proxy} -e https_proxy=${https_proxy} \
 ghcr.io/huggingface/tgi-gaudi:2.0.5 --model-id $model --max-input-tokens 1024 --max-total-tokens 2048
http_proxy= curl http://${host_ip}:8080/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "tgi",
    "messages": [
      {
        "role": "user",
        "content": "What is deep Learning!"
      }
    ], "max_tokens":128,"temperature":0.01, "top_p":0.95, "frequency_penalty":0.0, "repetition_penalty":1.03, "presence_penalty":0.0 }'

The answer (spaces between words are missing toward the end):

{"id":"","object":"text_completion","created":1729153043,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.0.4-native","choices":[{"index":0,"message":{"role":"assistant","content":"Deep learning refers to a subset of machinelearning techniques that use artificial neural networks (ANNs) with multiple layers for feature extraction and transformation. These algorithms are designed based on the structure, functioningsimilarityto human brain's neuronsand their connections in order tomimethe processof how humans learn from data by recognizing patterns within it without explicit programming rules or instructions being given beforehand; this makes them highly effective at handling complex tasks like image recognitionor natural language processing(NLP). The deeper these network structures get - meaning more hiddenlayers-themorecomplexpatternsthatcanbelearnedfromdataarepossible"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"completion_tokens":128,"total_tokens":133}}

Then I removed repetition_penalty and kept only the OpenAI-compatible frequency_penalty and presence_penalty:

http_proxy= curl http://${host_ip}:8080/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "tgi",
    "messages": [
      {
        "role": "user",
        "content": "What is deep Learning!"
      }
    ], "max_tokens":128,"temperature":0.01, "top_p":0.95, "frequency_penalty":0.0, "presence_penalty":0.0 }'

The output is still wrong:

{"id":"","object":"text_completion","created":1729153206,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.0.4-native","choices":[{"index":0,"message":{"role":"assistant","content":"Deep learning refers to a subset of machinelearning techniques that use artificial neural networks (ANNs) with multiple layers for feature extraction and transformation. These algorithms are designed based on the structure, functioningsimilarityto human brain's neuronsand their connections in order tomimethe processof how humans learn from data or information through experience by recognizing patterns within large datasets without explicit programming rules defined beforehand; this allows them tounderstandcomplexrelationshipsbetween variables more effectively than traditionalmachine-learninglearningalgorithmswhich relyonlinearmodelsorrulebasedapproachesforpatternrecognition tasks such as image classification"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"completion_tokens":128,"total_tokens":133}}
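The two requests above can also be sent from Python, which makes it easier to vary only the penalty fields between runs. This is a minimal sketch using only the standard library. The default URL, the helper names `build_payload` and `chat`, and the extra "no penalties" variant are assumptions added for illustration; they are not part of the original report.

```python
import json
import urllib.request

# Fields shared by every request in the reproduction.
BASE_PAYLOAD = {
    "model": "tgi",
    "messages": [{"role": "user", "content": "What is deep Learning!"}],
    "max_tokens": 128,
    "temperature": 0.01,
    "top_p": 0.95,
}

# Penalty combinations to compare. The first one is an added baseline.
VARIANTS = {
    "no penalties": {},
    "openai penalties": {"frequency_penalty": 0.0, "presence_penalty": 0.0},
    "with repetition_penalty": {
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
        "repetition_penalty": 1.03,
    },
}


def build_payload(**penalties):
    """Copy the base request and add only the given penalty fields."""
    payload = dict(BASE_PAYLOAD)
    payload.update(penalties)
    return payload


def chat(url, payload, timeout=120):
    """POST the payload and return the assistant message text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


def main(url="http://localhost:8080/v1/chat/completions"):
    # The URL is an assumed placeholder; point it at your own server.
    for name, penalties in VARIANTS.items():
        print(name, "->", chat(url, build_payload(**penalties)))

# Call main() manually once a TGI server is running.
```

If the "no penalties" variant produces clean text while the others do not, that would point at the penalty handling specifically rather than at generation in general.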

Expected behavior

The answer should be well-formatted and correct.
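One rough way to check "well-formatted" automatically is to flag responses that contain an implausibly long unbroken run of letters, since the broken answers above fuse many words together. This is only a heuristic sketch: the function names and the threshold of 25 letters are arbitrary assumptions, and short fusions such as "machinelearning" would slip through.

```python
import re


def longest_letter_run(text):
    """Length of the longest unbroken run of ASCII letters in `text`."""
    runs = re.findall(r"[A-Za-z]+", text)
    return max((len(run) for run in runs), default=0)


def looks_fused(text, limit=25):
    """True if some run of letters is longer than `limit` (assumed threshold)."""
    return longest_letter_run(text) > limit


# A fragment taken from the first broken answer, and a clean sentence.
broken = "hiddenlayers-themorecomplexpatternsthatcanbelearnedfromdataarepossible"
clean = "Deep learning refers to a subset of machine learning techniques."

print(looks_fused(broken), looks_fused(clean))
```

Running this check on the `content` field of each response would make it possible to compare TGI Gaudi and TGI CPU over many prompts instead of relying on screenshots.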

yuanwu2017 commented 1 day ago

In fact, our TGI version is 2.0.4, so the inference code has a large gap with the main tree. The upgrade patch is under review; please give it a try: https://github.com/huggingface/tgi-gaudi/pull/225