[Feature] Want to get the `last_hidden_states`, is there an interface for that? If not, what code should be modified to realize it? #853

Open tongyx361 opened 1 year ago

tongyx361 commented 1 year ago

I looked into the source code and found that the Sampler class discards the prefix of last_hidden_states (i.e., every hidden state except the final token's).

class Sampler(nn.Module):
    """Samples the next tokens from the model's outputs.

    This layer does the following:
    1. Discard the hidden states that are not used for sampling (i.e., all
        tokens except the final one in each prompt).
    2. Compute the logits for the next tokens.
    3. Apply presence and frequency penalties.
    4. Apply temperature scaling.
    5. Apply top-p and top-k truncation.
    6. Sample the next tokens.
    Here, each sequence group within the batch can have different sampling
    parameters (e.g., sampling method, temperature, top-p, top-k, etc.).
    """

Would it be feasible for me to start from Sampler and expose last_hidden_states as an optional output? Could the development team, or anyone else familiar with vLLM, provide some guidance and suggestions?
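
For comparison, Hugging Face transformers exposes exactly this; below is a minimal sketch of the output I am after (the model name is only a placeholder, not tied to vLLM):

import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint; substitute the model you actually serve with vLLM.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModel.from_pretrained("facebook/opt-125m")

inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Shape [batch_size, seq_len, hidden_size]. The full prefix is kept here,
# unlike in vLLM's Sampler, which keeps only the final position per prompt.
last_hidden_states = outputs.last_hidden_state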



WuNein commented 10 months ago

I also really need this API for extracting embeddings; having to start modifying things at forward would be far too painful.
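
For context, all I need from those hidden states is one pooled vector per sequence, e.g. the hidden state at the last non-padding token. A minimal sketch in plain PyTorch (names are illustrative, and it assumes right padding):

import torch

def last_token_pool(last_hidden_states: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    # last_hidden_states: [batch, seq_len, hidden]
    # attention_mask:     [batch, seq_len], 1 = real token, 0 = padding
    # Assumes right padding, so the last real token closes each sequence.
    last_idx = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(last_hidden_states.size(0))
    # One embedding per sequence: [batch, hidden]
    return last_hidden_states[batch_idx, last_idx]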

hmellor commented 6 months ago

@WoosukKwon @simon-mo @zhuohan123 is this a feature that you'd like to see implemented?

WuNein commented 6 months ago

> @WoosukKwon @simon-mo @zhuohan123 is this a feature that you'd like to see implemented?

I have a demo that uses the existing vLLM API: https://github.com/WuNein/vllm4mteb/blob/main/vllm-new.py

Opdoop commented 3 months ago

@WuNein It looks great! Will you create a PR to the main vLLM repo so that we can use vLLM to serve embedding models?

Opdoop commented 3 months ago

It would be even more useful if we could support decoder-based embedding models through a v1/embeddings API, like the OpenAI embedding API.
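
That way clients could use the stock OpenAI SDK against a vLLM server, along these lines (the base URL, API key, and model name are placeholders):

# Sketch of what an OpenAI-compatible embeddings endpoint would enable.
# Assumes a hypothetical vLLM server at localhost:8000 serving an
# embedding model; the URL and model name below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",
    input=["What is the capital of France?"],
)
print(len(response.data[0].embedding))  # embedding dimensionality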

WuNein commented 3 months ago

> It would be even more useful if we could support decoder-based embedding models through a v1/embeddings API, like the OpenAI embedding API.

How should I put it? Someone has already done something along these lines in [Model][Misc] Add e5-mistral-7b-instruct and Embedding API #3734, but I don't think that approach makes sense.
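
For reference, with the Embedding API from that PR merged, offline usage looks roughly like this (exact method names may differ between vLLM versions; this is a sketch based on the e5-mistral-7b-instruct example from that era, not a guaranteed interface):

# Sketch of the offline Embedding API added around PR #3734.
from vllm import LLM

llm = LLM(model="intfloat/e5-mistral-7b-instruct", enforce_eager=True)
outputs = llm.encode(["What is the capital of France?"])
for output in outputs:
    # output.outputs.embedding is a list of floats, one vector per prompt.
    print(len(output.outputs.embedding))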