vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: Return hidden states (in progress?) #6165

Open Elanmarkowitz opened 2 months ago

Elanmarkowitz commented 2 months ago

🚀 The feature, motivation and pitch

I know a similar feature request already exists: https://github.com/vllm-project/vllm/issues/5950 (along with older, semi-related requests: https://github.com/vllm-project/vllm/issues/3594 and https://github.com/vllm-project/vllm/issues/1857).

This is a similar pitch, but I am opening a new issue because I noticed newer developments in the codebase. The pitch is to support returning hidden states when generating sequences. This enables many use cases, such as output classification and guardrails. Whereas #5950 suggested a separate step for embedding, I suggest building it in either as an EngineArgs option or as an option that can be passed with each generation request.
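To make the two placements in the pitch concrete, here is a minimal sketch in plain Python. It does not use vLLM: `ToyEngine`, `GenerationRequest`, `GenerationOutput`, and their `return_hidden_states` fields are hypothetical names chosen for illustration, and the "model" is a deterministic toy that emits a small vector per step. It shows an engine-wide default (the EngineArgs-style option) that an individual request can override (the per-request option).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GenerationRequest:
    # Hypothetical request type (not vLLM's SamplingParams).
    prompt_tokens: List[int]
    max_new_tokens: int = 3
    # None means "inherit the engine-wide default".
    return_hidden_states: Optional[bool] = None


@dataclass
class GenerationOutput:
    tokens: List[int]
    # One hidden-state vector per generated token, or None if not requested.
    hidden_states: Optional[List[List[float]]] = None


class ToyEngine:
    """Hypothetical stand-in engine; the 'model' is a toy, not a transformer."""

    def __init__(self, hidden_size: int = 4, return_hidden_states: bool = False):
        self.hidden_size = hidden_size
        # Engine-level default, analogous to an EngineArgs-style option.
        self.return_hidden_states = return_hidden_states

    def _forward(self, context: List[int]) -> List[float]:
        # Toy "last-layer hidden state" derived deterministically from the context.
        s = sum(context) + len(context)
        return [float((s + i) % 7) for i in range(self.hidden_size)]

    def _sample(self, hidden: List[float]) -> int:
        # Greedy toy sampling: index of the largest hidden component.
        return max(range(len(hidden)), key=lambda i: hidden[i])

    def generate(self, request: GenerationRequest) -> GenerationOutput:
        # Per-request setting wins; otherwise fall back to the engine default.
        want_hidden = (
            self.return_hidden_states
            if request.return_hidden_states is None
            else request.return_hidden_states
        )
        context = list(request.prompt_tokens)
        tokens: List[int] = []
        hiddens: List[List[float]] = []
        for _ in range(request.max_new_tokens):
            hidden = self._forward(context)
            token = self._sample(hidden)
            tokens.append(token)
            context.append(token)
            if want_hidden:
                hiddens.append(hidden)
        return GenerationOutput(tokens, hiddens if want_hidden else None)
```

With `ToyEngine(return_hidden_states=False)`, only requests that set `return_hidden_states=True` receive one vector per generated token, so the extra memory is paid only by callers who opt in. That is one argument for the per-request option over a purely engine-wide switch.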

I see that v0.5.1 already contains some new code in ModelDriverBase to support return_hidden_states. However, it is not yet exposed through the LLM engine (it is not an input to EngineArgs), so the feature seems to still be under development. Mainly, I would like to know the timeline for it and the approach being taken, so that I and the community can develop accordingly.

Alternatives

No response

Additional context

No response

LiuXiaoxuanPKU commented 2 months ago

Thanks for the question! We currently use return_hidden_states for speculative decoding. You just need to pass it as a config, as here. Feel free to mimic the behavior there.

Hambaobao commented 2 months ago

Hi, I have the same need. I would like to store the hidden_states during model inference so that I can do interpretability research.
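For interpretability work, the usual pattern is to record every layer's output during a forward pass. The sketch below illustrates that pattern with a hook callback on a toy layer stack in plain Python. `ToyLayerStack`, `register_hook`, and `capture_hidden_states` are hypothetical names and do not reflect how vLLM exposes (or does not expose) hidden states; in plain PyTorch the analogous mechanism is `torch.nn.Module.register_forward_hook`, whose returned handle has a `.remove()` method.

```python
from typing import Callable, Dict, List

Vector = List[float]
Hook = Callable[[int, Vector], None]


class ToyLayerStack:
    """Hypothetical stand-in for a model's stack of decoder layers."""

    def __init__(self, num_layers: int):
        self.num_layers = num_layers
        self._hooks: List[Hook] = []

    def register_hook(self, hook: Hook) -> Callable[[], None]:
        # hook(layer_index, output) is called after every layer.
        self._hooks.append(hook)
        # Return a remover so callers can detach the hook afterwards.
        return lambda: self._hooks.remove(hook)

    def _layer(self, index: int, x: Vector) -> Vector:
        # Toy layer: scale and shift (a real layer would be attention + MLP).
        return [v * 0.5 + index for v in x]

    def forward(self, x: Vector) -> Vector:
        for i in range(self.num_layers):
            x = self._layer(i, x)
            for hook in self._hooks:
                hook(i, x)
        return x


def capture_hidden_states(model: ToyLayerStack, x: Vector) -> Dict[int, Vector]:
    """Run one forward pass and record each layer's output by layer index."""
    store: Dict[int, Vector] = {}

    def hook(layer: int, output: Vector) -> None:
        store[layer] = list(output)  # copy so the record is independent of later changes

    remove = model.register_hook(hook)
    try:
        model.forward(x)
    finally:
        remove()  # avoid accumulating hooks across calls
    return store
```

Detaching the hook in `finally` keeps repeated captures from stacking callbacks on the model, which would otherwise slow every later forward pass.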

PeterAdam2015 commented 2 months ago

Same need; I hope we can get this as an option to return embeddings.
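If per-token hidden states were returned, a single embedding could be derived from them by pooling. The sketch below shows two common choices, mean pooling and last-token pooling (the latter is often used with decoder-only models), plus cosine similarity for comparing the results. It is plain Python under the assumption that hidden states arrive as a list of equal-length float vectors, one per token; the function names are illustrative and not part of any vLLM API.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(token_states: List[Vector]) -> Vector:
    """Average per-token hidden states into one fixed-size vector."""
    if not token_states:
        raise ValueError("need at least one token state")
    dim = len(token_states[0])
    n = len(token_states)
    return [sum(state[d] for state in token_states) / n for d in range(dim)]


def last_token_pool(token_states: List[Vector]) -> Vector:
    """Use the final token's hidden state as the embedding."""
    if not token_states:
        raise ValueError("need at least one token state")
    return list(token_states[-1])


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Mean pooling uses every position, while last-token pooling relies on the causal model having summarized the whole sequence in its final position; which works better depends on the model and task.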

ummagumm-a commented 2 months ago

same need!

freesunshine0316 commented 1 month ago

> Thanks for the question! We currently use return_hidden_states for speculative decoding. You just need to pass it as a config, as here. Feel free to mimic the behavior there.

Hi, could you elaborate, e.g. with demo code?

J0hnArren commented 1 month ago

same need

Gxy-2001 commented 1 month ago

same need

zkwhandan commented 4 days ago

same need