intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Apache License 2.0

Qwen-14B-Chat inference repeat #1144

Closed Storm0921 closed 3 months ago

Storm0921 commented 8 months ago

When I use the python_api_example or streaming_llm Python scripts to run inference with Qwen-14B-Chat, the first two questions are answered normally, but from the third question onward the output keeps repeating itself. I find this strange and can reproduce the error consistently. It also looks like the prompts themselves are being repeated in the output.

My RAG prompt length = 654 (screenshots attached in the original issue).

a32543254 commented 8 months ago

Some repetition is normal for LLMs. Here are some possible solutions (sketched below):

  1. Try do_sample=True in the generate API.
  2. Change the woq_config compute dtype from int8 to bf16.
  3. Increase the repetition_penalty value.
  4. Increase the top_k value.
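
A minimal sketch of how these knobs could be passed, assuming the transformers-style API from this repo's README (AutoModelForCausalLM with load_in_4bit) and placeholder values; verify the exact argument names against your installed version:

```python
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Qwen/Qwen-14B-Chat"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True, trust_remote_code=True)

inputs = tokenizer("your RAG prompt here", return_tensors="pt").input_ids
outputs = model.generate(
    inputs,
    streamer=streamer,
    max_new_tokens=300,
    do_sample=True,          # 1. sample instead of greedy decoding
    repetition_penalty=1.1,  # 3. penalize repeated tokens (value is a placeholder)
    top_k=40,                # 4. widen the candidate pool (value is a placeholder)
)
```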
Storm0921 commented 8 months ago

> Some repetition is normal for LLMs. Here are some possible solutions:
>
> 1. Try do_sample=True in the generate API.
> 2. Change the woq_config compute dtype from int8 to bf16.
> 3. Increase the repetition_penalty value.
> 4. Increase the top_k value.

I think the duplicated questions may come from me not adding Qwen's prompt template format. By the way, how should Baichuan's prompt template be written? I tried BAICHUAN_PROMPT_FORMAT = "{prompt} ", but it failed.

Storm0921 commented 8 months ago

> Some repetition is normal for LLMs. Here are some possible solutions:
>
> 1. Try do_sample=True in the generate API.
> 2. Change the woq_config compute dtype from int8 to bf16.
> 3. Increase the repetition_penalty value.
> 4. Increase the top_k value.

Can you help me solve this problem?

Zhenzhong1 commented 8 months ago
> I think the duplicated questions may come from me not adding Qwen's prompt template format. By the way, how should Baichuan's prompt template be written? I tried BAICHUAN_PROMPT_FORMAT = "{prompt} ", but it failed.

@fengenbao Hi, Baichuan does not need to add extra prompt templates.


> Try do_sample=True in the generate API? Change the woq_config compute dtype from int8 to bf16. Increase the repetition_penalty value. Increase the top_k value. Can you help me solve this problem?

These are all input arguments that you can modify.

do_sample=True is an argument of the generate API. For example: outputs = model.generate(inputs, streamer=streamer, max_new_tokens=30, do_sample=True)

Please check this README: https://github.com/intel/neural-speed/tree/main

For woq_config, please check: https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md#llm-runtime-example-code

For repetition_penalty & top_k, please check: https://github.com/intel/neural-speed/blob/main/docs/advanced_usage.md
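
For the woq_config change, here is a hedged sketch of switching the compute dtype from int8 to bf16; the class and argument names follow the weightonlyquant doc linked above, so treat them as assumptions for your installed version:

```python
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig

# Assumption: WeightOnlyQuantConfig exposes compute_dtype/weight_dtype as in the linked doc.
woq_config = WeightOnlyQuantConfig(
    compute_dtype="bf16",  # previously "int8"; higher-precision compute can reduce degenerate repetition
    weight_dtype="int4",
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B-Chat",  # placeholder model id
    quantization_config=woq_config,
    trust_remote_code=True,
)
```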

Storm8878 commented 8 months ago
> > I think the duplicated questions may come from me not adding Qwen's prompt template format. By the way, how should Baichuan's prompt template be written? I tried BAICHUAN_PROMPT_FORMAT = "{prompt} ", but it failed.
>
> @fengenbao Hi, Baichuan does not need to add extra prompt templates.
>
> > Try do_sample=True in the generate API? Change the woq_config compute dtype from int8 to bf16. Increase the repetition_penalty value. Increase the top_k value. Can you help me solve this problem?
>
> These are all input arguments that you can modify.
>
> do_sample=True is an argument of the generate API. For example: outputs = model.generate(inputs, streamer=streamer, max_new_tokens=30, do_sample=True)
>
> Please check this README: https://github.com/intel/neural-speed/tree/main
>
> For woq_config, please check: https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md#llm-runtime-example-code
>
> For repetition_penalty & top_k, please check: https://github.com/intel/neural-speed/blob/main/docs/advanced_usage.md

Thanks for your attention! I have described this question in detail in another issue, #1148. Please help check whether the parameters are set correctly.

a32543254 commented 3 months ago

Already tracked in issue https://github.com/intel/intel-extension-for-transformers/issues/1148.