mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Tracking] Make SeperateEmbedding as a Default in the SLM pipeline #1473

Closed by tqchen 4 months ago

tqchen commented 9 months ago

Overview

Up until now, the SLM models' prefill, decode, and other functions have included the embedding lookup as part of the whole pipeline. As in the legacy flow, we increasingly see the need to separate the embedding lookup so that the overall flow can support cases like multi-modal use cases.

This is a tracking issue to make sure we completely migrate to separate embedding in the SLM pipeline and deprecate the usages that do not separate the embedding.
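For intuition, here is a minimal sketch (plain Python/NumPy, not the actual SLM pipeline interface; all names below are illustrative) of how splitting the embedding lookup out of prefill allows injecting non-text embeddings:

```python
# Conceptual sketch only: function names here are illustrative, not the SLM pipeline API.
import numpy as np

hidden_size = 8
vocab_size = 32
embedding_table = np.random.rand(vocab_size, hidden_size).astype("float32")

def embed(token_ids):
    """Embedding lookup, exposed as its own step."""
    return embedding_table[token_ids]              # (seq_len, hidden_size)

def prefill(embeddings, kv_cache):
    """Prefill consumes embeddings directly instead of token ids."""
    kv_cache.append(embeddings)                    # stand-in for running the transformer layers
    return kv_cache

prompt_token_ids = np.array([1, 5, 7])

# Text-only path: tokens -> embed -> prefill.
kv_cache = prefill(embed(prompt_token_ids), [])

# Multi-modal path: image features projected into the same embedding space
# can be concatenated with the text embeddings before prefill.
image_embeddings = np.random.rand(4, hidden_size).astype("float32")
fused = np.concatenate([image_embeddings, embed(prompt_token_ids)], axis=0)
kv_cache = prefill(fused, [])
```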

Action Items

Links to Related Issues and PRs

dusty-nv commented 7 months ago

Hi @tqchen, is sep_embed available/default yet in mlc_chat compile, or is it still only in mlc_llm.build?

tqchen commented 7 months ago

cc @MasterJH5574 @jinhongyii, who are looking into this right now

dusty-nv commented 7 months ago

@MasterJH5574 @jinhongyii any rough timeline for this, or a WIP branch that I could check out?

MasterJH5574 commented 7 months ago

Hi @dusty-nv, I think this feature should already be available for mlc_chat compile on main since #1724. Right now it only works for Llama, though. We will expand support to more models soon.

The commands for compiling and running models shouldn't change if everything goes as expected. Please let us know if you run into any issues! Thanks a lot :-)

dusty-nv commented 7 months ago

Thanks @MasterJH5574, I am running a newer build than that, but I am unable to retrieve the prefill_with_embed function from model libraries compiled with mlc_chat compile (as opposed to mlc_llm.build). Per https://github.com/mlc-ai/mlc-llm/pull/1724, should I now just use the prefill function, whose signature has been changed to accept embeddings instead of tokens?

We will expand support to more models soon.

I need this for StableLM and Phi-2 also, thank you 🙏

MasterJH5574 commented 7 months ago

Per https://github.com/mlc-ai/mlc-llm/pull/1724, should I now just use the prefill function, whose signature has been changed to accept embeddings instead of tokens?

@dusty-nv Yes, I think so! prefill_with_embed will not be introduced for models defined under python/mlc_chat.
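For reference, here is a rough sketch of what driving the two steps through the TVM Relax VM might look like; the library path, function names, and argument lists below are assumptions for illustration, not the confirmed mlc_chat interface:

```python
# Rough sketch; library path, function names, and argument lists are assumptions,
# not the confirmed mlc_chat interface.
import tvm
from tvm import relax

dev = tvm.cuda(0)
lib = tvm.runtime.load_module("dist/libs/model-cuda.so")  # library produced by `mlc_chat compile`
vm = relax.VirtualMachine(lib, dev)

# `token_ids`, `params`, and `kv_cache` would come from the tokenizer, the loaded
# weights, and the cache setup respectively (omitted here for brevity).

# Step 1: embedding lookup on the prompt tokens.
embeddings = vm["embed"](token_ids, params)

# Step 2: prefill now consumes embeddings (not token ids) when filling the KV cache.
logits = vm["prefill"](embeddings, kv_cache, params)
```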

I need this for StableLM and Phi-2 also, thank you

Thank you! We are actively working on that. Here's the related tracking issue https://github.com/mlc-ai/mlc-llm/issues/1749