Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0
Nano: Hard to apply inference optimizations with accuracy control on YOLOX with little code modification #5995
Description
YOLOX evaluates the model with COCOAPI. The evaluation-related code looks like this:
It's hard to convert that code segment into a function like:
For now, I need to subclass InferenceOptimizer to apply inference acceleration with accuracy control to YOLOX.
Is there a simple way to obtain inference acceleration with accuracy (AP/AR) control?
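To make the question concrete, here is a minimal sketch of the adapter pattern the issue is asking about: wrapping a COCO-style evaluator (which, like YOLOX's COCOAPI evaluation loop, reports an (AP, AR) summary) into a plain `metric(model) -> float` callable, which is the shape an accuracy-controlled optimizer such as Nano's `InferenceOptimizer` typically expects. `DummyEvaluator` and `make_accuracy_metric` are illustrative names, not part of any real API, and the toy "AP" computation only stands in for the real pycocotools evaluation.

```python
from typing import Any, Callable, List, Tuple


class DummyEvaluator:
    """Stand-in for YOLOX's COCO evaluator: runs the model over a
    validation set and returns an (AP, AR) summary tuple."""

    def __init__(self, dataset: List[Any]):
        self.dataset = dataset

    def evaluate(self, model: Callable[[Any], Any]) -> Tuple[float, float]:
        preds = [model(x) for x in self.dataset]
        # Toy "AP": fraction of predictions that match the ground truth.
        # A real evaluator would build COCO-format detections and run COCOeval.
        ap = sum(p == x for p, x in zip(preds, self.dataset)) / len(self.dataset)
        ar = ap  # toy AR, same value for this sketch
        return ap, ar


def make_accuracy_metric(evaluator: DummyEvaluator) -> Callable[[Any], float]:
    """Wrap the evaluator so accuracy control can call metric(model)
    and compare the returned scalar against an allowed accuracy drop."""

    def metric(model: Callable[[Any], Any]) -> float:
        ap, _ = evaluator.evaluate(model)
        return ap  # single scalar, e.g. AP@[0.5:0.95] in the real setting

    return metric


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    metric = make_accuracy_metric(DummyEvaluator(data))
    identity_model = lambda x: x  # perfect "model" for this toy data
    print(metric(identity_model))  # 1.0
```

The idea is that only the evaluation loop needs to be refactored behind a single callable returning one scalar; the optimizer can then treat any model (original or accelerated) uniformly, with no need to subclass `InferenceOptimizer`.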