intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
Apache License 2.0

Unexpected output when running inference on Qwen-7B-Chat-10-12 with 1024-128 in_out_pairs using the transformer_int4_gpu API #9351

Open · WeiguangHan opened 1 year ago

WeiguangHan commented 1 year ago

@hkvision When I tested the Qwen-7B-Chat-10-12 model with 1024-128 in_out_pairs using the transformer_int4_gpu API, the model generated only about 41 output tokens instead of the expected 128. Please have a look.

Env

bigdl-core-xe-2.4.0b20231102
bigdl-llm-2.4.0b20231102
intel-extension-for-pytorch-2.0.110+xpu
torch-2.0.1a0+cxx11.abi
torchvision-0.15.2a0+cxx11.abi
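
For reference, the transformer_int4_gpu test in the all-in-one benchmark loads the model with INT4 weight-only quantization and runs generation on an Intel GPU. Below is a minimal sketch of that path, assuming the transformers-style API of the bigdl-llm version listed above; the model path, prompt, and generation settings are placeholders, not the benchmark's exact code.

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401 -- registers the 'xpu' device
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = "Qwen/Qwen-7B-Chat"  # placeholder; the report used a 10-12 snapshot

# Load with INT4 weight-only quantization, then move the model to the Intel GPU.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# A '1024-128' in_out_pair means a ~1024-token prompt and up to 128 new tokens.
prompt = "..."  # placeholder: any text that tokenizes to roughly 1024 tokens
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=128)

new_tokens = output.shape[1] - input_ids.shape[1]
print(f"generated {new_tokens} tokens")  # the report saw ~41 instead of 128
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```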

hkvision commented 1 year ago

@qiuxin2012 Is it also due to the input prompt?

qiuxin2012 commented 1 year ago

> @qiuxin2012 Is it also due to the input prompt?

Maybe. You could try some new input prompts.
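
For example, one quick check is whether the early stop around 41 tokens depends on the prompt: run generation with a few different prompts and compare how many new tokens come back. A sketch, reusing the model and tokenizer from the snippet above; the prompts are arbitrary placeholders.

```python
# Compare generated-token counts across prompts to see whether the early
# stop is prompt-dependent (e.g. a truncated prompt triggering an early EOS).
prompts = [
    "Summarize the main ideas of reinforcement learning.",
    "Write a short story about a lighthouse keeper.",
]
for prompt in prompts:
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to("xpu")
    with torch.inference_mode():
        output = model.generate(input_ids, max_new_tokens=128)
    new_tokens = output.shape[1] - input_ids.shape[1]
    print(f"{new_tokens} new tokens for prompt: {prompt[:40]}")
```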