Elanmarkowitz opened this issue 2 months ago (status: Open)
Thanks for the question! We currently use return_hidden_states
for speculative decoding. You just need to pass it as a config, as here. Feel free to mimic the behavior there.
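For readers asking what "pass it as a config" could look like in practice: below is a minimal, self-contained sketch of the pattern (an engine-level config object carrying a `return_hidden_states` flag that the generation path checks). The names `SpecDecodeConfig`, `ToyEngine`, and the output shape are hypothetical stand-ins for illustration, not vLLM's actual classes.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-ins for illustration only; not vLLM's real API.
@dataclass
class SpecDecodeConfig:
    # Mirrors the return_hidden_states option mentioned above.
    return_hidden_states: bool = False

@dataclass
class GenerationOutput:
    tokens: List[int]
    hidden_states: Optional[List[List[float]]] = None  # set only when requested

class ToyEngine:
    def __init__(self, config: SpecDecodeConfig):
        self.config = config

    def generate(self, prompt_ids: List[int]) -> GenerationOutput:
        # Fake decode step: shift each id by one to stand in for sampling.
        tokens = [t + 1 for t in prompt_ids]
        if self.config.return_hidden_states:
            # Fake 4-dim "final layer" state per generated token.
            hidden = [[float(t)] * 4 for t in tokens]
            return GenerationOutput(tokens=tokens, hidden_states=hidden)
        return GenerationOutput(tokens=tokens)

engine = ToyEngine(SpecDecodeConfig(return_hidden_states=True))
out = engine.generate([1, 2, 3])
print(out.tokens)         # [2, 3, 4]
print(out.hidden_states)  # [[2.0, 2.0, 2.0, 2.0], ...]
```

The point of the sketch is only the plumbing: the flag lives on a config that the engine holds, so every request made through that engine returns states.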
Hi, I also have the same need. I hope to store the hidden_states
during model inference so that I can conduct some interpretability research.
Same need; I hope we can get this as an option to return embeddings.
same need!
> Thanks for the question! We currently use return_hidden_states for speculative decoding. You just need to pass it as a config, as here. Feel free to mimic the behavior there.
Hi, can you further specify, e.g. with demo code?
same need
same need
same need
🚀 The feature, motivation and pitch
I know this feature request sort of already exists: https://github.com/vllm-project/vllm/issues/5950 (and older, semi-related requests: https://github.com/vllm-project/vllm/issues/3594, https://github.com/vllm-project/vllm/issues/1857)
This is a similar pitch, but I am creating a new issue because I noticed newer developments in the codebase. The pitch is to support returning hidden states when generating sequences. This enables many potential behaviors such as output classification, guardrails, etc. Whereas #5950 suggested a separate embedding step, I would suggest building it in as an option to EngineArgs, or as an option that can be passed in with each generation request.
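To make the per-request variant of this pitch concrete, here is a minimal sketch of what an opt-in flag on the sampling parameters might look like. `SamplingParamsSketch`, `RequestOutputSketch`, and the `generate` helper are purely hypothetical illustrations of the proposed interface, not existing vLLM code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sketch of a per-request opt-in; not vLLM's actual API.
@dataclass
class SamplingParamsSketch:
    max_tokens: int = 16
    return_hidden_states: bool = False  # the proposed per-request option

@dataclass
class RequestOutputSketch:
    token_ids: List[int]
    hidden_states: Optional[List[List[float]]] = None

def generate(prompt_ids: List[int],
             params: SamplingParamsSketch) -> RequestOutputSketch:
    # Stand-in decode loop: echo prompt ids as "generated" tokens.
    token_ids = list(prompt_ids)[: params.max_tokens]
    if params.return_hidden_states:
        # A real engine would surface the final-layer states here,
        # enabling downstream classification or guardrail checks.
        states = [[float(t), float(t) ** 2] for t in token_ids]
        return RequestOutputSketch(token_ids, hidden_states=states)
    return RequestOutputSketch(token_ids)

out = generate([5, 7], SamplingParamsSketch(return_hidden_states=True))
print(out.hidden_states)  # [[5.0, 25.0], [7.0, 49.0]]
```

A per-request flag like this keeps the default path unchanged (no extra tensors copied off-device) while letting individual callers opt in, which is the trade-off the EngineArgs-only variant would not offer.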
I see that in v0.5.1 there is already some new code in ModelDriverBase to support return_hidden_states. However, I don't see it supported in the LLM engine yet (it is not an input to EngineArgs). Basically, it seems like this feature is under development. I am mainly wondering what the timeline for it is, and what approach is being taken, so that I and the community can develop accordingly.
Alternatives
No response
Additional context
No response