Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Currently, the ONNX inference latency of the TCN model is about 1.5 ms; a month ago it was about 1.04 ms. I don't know what caused the regression.
Environment: server cpx-3, onnxruntime==1.10.0, bigdl-nano==0.14.0b20220118
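To help pin down where the extra ~0.5 ms comes from, here is a minimal latency-benchmark sketch using plain onnxruntime (the same measurement can be run against both the old and new environments). This is an assumption-laden sketch, not the original benchmark: the model path `tcn.onnx` and the input shape `(1, 3, 48)` are hypothetical placeholders and must be replaced with the actual exported TCN model and its input signature.

```python
import time
import numpy as np
import onnxruntime as ort

# Hypothetical model path; replace with the actual exported TCN ONNX file.
sess = ort.InferenceSession("tcn.onnx")

input_name = sess.get_inputs()[0].name
# Hypothetical input shape; replace with the model's real input signature.
x = np.random.randn(1, 3, 48).astype(np.float32)

# Warm-up runs so session initialization and lazy allocations
# don't distort the measured latency.
for _ in range(10):
    sess.run(None, {input_name: x})

# Measure per-inference latency in milliseconds.
latencies = []
for _ in range(100):
    t0 = time.perf_counter()
    sess.run(None, {input_name: x})
    latencies.append((time.perf_counter() - t0) * 1000)

print(f"mean: {np.mean(latencies):.2f} ms, "
      f"p50: {np.percentile(latencies, 50):.2f} ms, "
      f"p99: {np.percentile(latencies, 99):.2f} ms")
```

Running this on the same server with the two package sets (the month-old versions vs. the current ones) would show whether the slowdown comes from the onnxruntime/bigdl-nano upgrade or from something else on the machine.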