sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sglang.readthedocs.io/en/latest/
Apache License 2.0

[Feature] Use Embedding/Generation Model to get its Generation/Embedding #1200

Closed zhaochenyang20 closed 2 weeks ago

zhaochenyang20 commented 2 weeks ago

Checklist

Motivation

Currently, SGLang supports getting generated content (chat completion) from generative models and embeddings from embedding models. In principle, though, either kind of model can produce either kind of output: embeddings from a generation model, or generated text from an embedding model.

It should be stressed that even though we can do this, it is not useful in practice.
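
To make the "in principle" claim concrete, here is a minimal toy sketch. It is not SGLang code. It assumes both model types expose per-token hidden states from a shared transformer backbone, that an embedding can be read off by last-token pooling, and that generation needs an output projection (`lm_head`) on top of the same hidden states. The function names, the pooling choice, and the random stand-in tensors are all hypothetical and only illustrate the idea.

```python
import math
import random


def last_token_embedding(hidden_states):
    """Pool per-token hidden states into one L2-normalized vector.

    Assumption: last-token pooling; real embedding models may pool differently.
    """
    last = hidden_states[-1]
    norm = math.sqrt(sum(x * x for x in last))
    return [x / norm for x in last] if norm > 0 else list(last)


def next_token_distribution(hidden_states, lm_head):
    """Project the last hidden state through an output head, then softmax."""
    last = hidden_states[-1]
    logits = [sum(w * x for w, x in zip(row, last)) for row in lm_head]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy stand-ins for a backbone's output and an output head (hypothetical sizes).
rng = random.Random(0)
SEQ_LEN, HIDDEN, VOCAB = 4, 8, 5
hidden_states = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(SEQ_LEN)]
lm_head = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]

# The same hidden states serve both purposes.
embedding = last_token_embedding(hidden_states)           # "embedding" use
probs = next_token_distribution(hidden_states, lm_head)   # "generation" use
```

The sketch only shows that both outputs are mechanically available from the same hidden states. It says nothing about quality, which is the point of the caveat above.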

The key differences between generation and embedding models primarily stem from their post-training specialization, leading to a loss of some capabilities, akin to catastrophic forgetting. Embedding models focus on compressing information into a fixed-dimensional vector space, discouraging long-term predictions, while generation models aim to reduce uncertainty in the probability space, addressing both compression of current information and future uncertainties.

The user draws a parallel between these tasks and the distinction between non-autoregressive and autoregressive models. They suggest that embedding models should be decoded with methods like MCMC rather than token-by-token approaches.

The community tends to treat generation and embedding as separate tasks, each with its own specialized models and research focus. While the idea of a model that can handle both tasks is attractive, practical challenges make it difficult to implement. The user also notes that OpenAI’s recommendation to fine-tune models for specific applications feels overly product-oriented and not aligned with the concept of AGI.

Related resources

https://github.com/sgl-project/sglang/pull/1186

merrymercy commented 2 weeks ago

closed by https://github.com/sgl-project/sglang/pull/1186