Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
[TorchAcc] Update padding strategy when using persistent cache #2464
Closed
eedalong closed 1 week ago
PR type
PR information
Optimize the padding strategy when persistent cache is enabled, so we get a performance boost with little extra compilation.
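The idea behind this kind of padding optimization can be sketched as follows. This is a minimal illustration, not the PR's actual code: it assumes a bucketed-padding scheme in which sequence lengths are rounded up to a small set of fixed bucket sizes, so the persistently cached compiled graphs are reused across batches instead of a new graph being compiled for every distinct length. The function names and bucket values are hypothetical.

```python
def bucket_pad_length(seq_len, buckets=(128, 256, 512, 1024, 2048)):
    """Round a sequence length up to the nearest bucket.

    Hypothetical sketch: with only a handful of possible padded
    lengths, a persistent compilation cache hits on almost every
    batch rather than recompiling per unique length.
    """
    for b in buckets:
        if seq_len <= b:
            return b
    return buckets[-1]  # clamp overly long sequences to the max bucket


def pad_batch(batch, pad_id=0, buckets=(128, 256, 512, 1024, 2048)):
    """Pad every sequence in the batch to one shared bucketed length."""
    target = bucket_pad_length(max(len(s) for s in batch), buckets)
    return [s + [pad_id] * (target - len(s)) for s in batch]
```

With naive per-batch padding, every new maximum length triggers a fresh compilation; bucketing trades a little wasted padding for a bounded number of compiled graphs, which is where the end-to-end speedup would come from.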
Experiment results
Around a 10% end-to-end performance improvement for the TorchAcc backend.