First of all, in the "Decoding Algorithms" branch of Inference, most decoding algorithms target efficiency in specific scenarios.
Based on this, we can have three sub-branches for "Decoding Algorithms" according to the scenario:

1. Long-context scenarios, e.g., Streaming LLM and Infinite LLM.
2. Structured-interaction scenarios, e.g., my latest work DeFT targets tree-based decoding efficiency, as does SGLang from Berkeley with its efficient memory management for tree-based decoding. This could be extended to graphs in multi-agent scenarios.
3. Non-autoregressive / parallel / speculative decoding, which generates more than one token per decoding step; there are dozens of papers in this line. (A toy sketch of the speculative variant follows below.)
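To make sub-branch 3 concrete, here is a minimal, self-contained sketch of the draft-then-verify loop behind speculative decoding. It is a toy illustration only: `draft_propose` and `target_accepts` are hypothetical stand-ins for a small draft model and the large target model, not any real API, and the acceptance rule here is a random placeholder rather than the actual rejection-sampling criterion.

```python
# Toy sketch of speculative decoding (sub-branch 3): a cheap draft model
# proposes k tokens per step and the expensive target model verifies them,
# so each step can emit up to k + 1 tokens instead of exactly one.
# All model calls below are hypothetical stubs, not a real library API.
import random

def draft_propose(prefix, k):
    # Stand-in for the small draft model: propose k candidate token ids.
    return [random.randrange(100) for _ in range(k)]

def target_accepts(prefix, token):
    # Stand-in for target-model verification of one draft token; real
    # speculative decoding uses a rejection-sampling acceptance test here.
    return random.random() < 0.7

def speculative_step(prefix, k=4):
    # Accept the longest verified prefix of the draft, then let the
    # target model contribute one token of its own, guaranteeing that
    # every step makes progress even if all draft tokens are rejected.
    draft = draft_propose(prefix, k)
    accepted = []
    for tok in draft:
        if target_accepts(prefix + accepted, tok):
            accepted.append(tok)
        else:
            break
    accepted.append(random.randrange(100))  # target model's own token
    return prefix + accepted

seq = [0]
for _ in range(5):
    seq = speculative_step(seq)
print(seq)
```

The structural point is that verifying all k draft tokens can be batched into a single target-model forward pass, which is where the speedup over one-token-per-step autoregressive decoding comes from.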
Hi @Monstertail,
Thanks for your suggestion on the sub-branches for "decoding algorithm". In fact, the paper covers most, if not all, of the works you suggested, although not all of them are listed here due to space constraints (and for clarity). But I do think your suggested classification makes sense, and I plan to incorporate it into the next version. Thanks for the nice feedback.
Best, Jingyu
I notice that the classification of LLM inference is somewhat coarse-grained, so I am opening this issue to keep collecting suggestions.