ulab-uiuc / AGI-survey


Classification of LLM training and inference efficiency #2

Closed Monstertail closed 5 months ago

Monstertail commented 5 months ago

[figure: classification]

Hi Jingyu, I wonder whether something in this classification is not quite accurate. There are many papers about KV cache optimization/compression on the inference side (at least five new papers each month) rather than the training side; there are also many papers about memory management in LLM serving systems (e.g., vLLM, SGLang).

I feel like training is compute-bound while inference is memory-bound. Although there are papers such as GaLore that make pretraining/fine-tuning more memory-efficient, I still think memory management is more important (or at least equally important) on the inference side.
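As a rough illustration of why inference tends to be memory-bound, here is a minimal back-of-the-envelope sketch of per-sequence KV cache growth. The model dimensions are assumptions chosen for illustration (a Llama-2-7B-like configuration with fp16 caches), not figures from the survey:

```python
# Back-of-the-envelope KV cache size per sequence during decoding.
# Assumed (illustrative) configuration: 32 layers, 32 KV heads,
# head_dim 128, fp16 (2 bytes per element) -- roughly Llama-2-7B-like.

def kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                   seq_len=4096, bytes_per_elem=2):
    """Total bytes for keys + values across all layers at a given length."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len

if __name__ == "__main__":
    per_token_kib = kv_cache_bytes(seq_len=1) / 1024
    per_seq_gib = kv_cache_bytes(seq_len=4096) / 1024 ** 3
    print(f"KV cache per token: {per_token_kib:.0f} KiB")          # ~512 KiB
    print(f"KV cache at 4096 tokens: {per_seq_gib:.2f} GiB/seq")   # ~2 GiB
```

Under these assumptions, a single 4k-token sequence already holds about 2 GiB of cache, and serving tens of concurrent requests multiplies that, which is exactly the pressure that KV cache compression and paged memory management (as in vLLM) aim to relieve.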

What do you think?

@Jingyu6

Jingyu6 commented 5 months ago

Hi @Monstertail,

Thanks for pointing this out! I think what you said about the training/inference classification makes sense, and I will adjust it either in this version or the next. However, the list here in the repo is not exhaustive (it is only a subset of what is included in the paper), just to keep it concise. I think it is more important to keep the most essential works in the repo here, though we will update the paper with more recent works later. Thanks for your suggestions.

Best, Jingyu