intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

INT4 WER accuracy issue for Whisper tiny and base #9515

Open Fred-cell opened 9 months ago

Fred-cell commented 9 months ago

The current issue is the accuracy (WER) of Whisper with INT4 quantization. I have synced this with Kai.

hkvision commented 9 months ago

As we discussed offline, INT4 may not be suitable for small models like Whisper tiny and base. INT5 (sym or asym) might be a good alternative considering the wtf and WER results. You can check whether INT5 meets your requirements.
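To illustrate why one extra bit matters for small models, here is a minimal, self-contained sketch of symmetric per-tensor quantization round-trip error at 4 vs. 5 bits. This is a toy illustration only, not ipex-llm's actual quantization kernels; the Gaussian "weights" and the per-tensor max-abs scaling are assumptions for the demo.

```python
import random

def quantize_roundtrip_error(values, n_bits):
    """Symmetric per-tensor quantization: map to signed n-bit ints and back,
    then return the mean absolute reconstruction error."""
    qmax = 2 ** (n_bits - 1) - 1              # 7 for int4, 15 for int5
    scale = max(abs(v) for v in values) / qmax
    dequant = [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]
    return sum(abs(v - d) for v, d in zip(values, dequant)) / len(values)

random.seed(0)
# Toy stand-in for a small model's weight tensor (hypothetical distribution).
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

err4 = quantize_roundtrip_error(weights, 4)
err5 = quantize_roundtrip_error(weights, 5)
print(f"mean abs error  int4: {err4:.6f}  int5: {err5:.6f}")
```

Because int5 has roughly twice as many levels over the same range, its quantization step (and hence the round-trip error) is about half that of int4; smaller models have less redundancy to absorb that error, which is consistent with the WER gap observed here.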

hkvision commented 9 months ago

Fred will test int5/fp8 and give feedback.
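For context on the int5-vs-fp8 comparison: the two formats spend their bits very differently. The sketch below enumerates the finite values of an 8-bit float, assuming the common OCP E4M3 variant (bias 7, no infinities, NaN at e=15, m=7) — whether ipex-llm's fp8 mode uses E4M3 or E5M2 is not stated in this thread, so treat the choice as an assumption.

```python
def e4m3_values():
    """Enumerate the finite values of fp8 E4M3 (assumed OCP layout:
    1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits)."""
    vals = set()
    for e in range(16):
        for m in range(8):
            if e == 15 and m == 7:
                continue                      # NaN encoding, skip
            if e == 0:
                v = (m / 8) * 2 ** -6         # subnormals (and zero)
            else:
                v = (1 + m / 8) * 2 ** (e - 7)
            vals.add(v)
            vals.add(-v)
    return sorted(vals)

grid = e4m3_values()
pos = [v for v in grid if v > 0]
print(f"{len(pos)} positive values, max={max(pos)}, min={min(pos)}")
# int5 symmetric offers 31 uniformly spaced levels (constant absolute error);
# E4M3 offers 126 positive log-spaced levels (roughly constant relative error),
# which can suit weight distributions with many small-magnitude values.
```

This is only a numeric-grid comparison; the actual WER impact depends on the model, so Fred's empirical int5/fp8 results remain the deciding factor.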