vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: option to return hidden states #4435

Open zhenlan0426 opened 6 months ago

zhenlan0426 commented 6 months ago

🚀 The feature, motivation and pitch

I am generating multiple samples from the same prompt, as in self-consistent Chain of Thought (CoT). I have trained a separate evaluation head (using the same backbone as the LLM generator) to assess the quality of each sample. Without the option to return hidden states, I would need to perform an additional forward pass just to obtain them, even though the majority of the computational work is already done when vLLM generates the samples. Having this option would significantly reduce inference time. If I were to implement this, could someone point me in the right direction regarding which parts of the source code I should look at? Any pointers would be appreciated. Thanks!
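For concreteness, here is a small sketch of the reranking step this feature would enable. The pooling choice and the linear head parameters (`w`, `b`) are illustrative assumptions, not anything vLLM provides; `hidden_states` stands in for the per-token tensors the request asks `generate()` to return:

```python
import numpy as np

def score_samples(hidden_states, w, b):
    """Score each sampled completion with a linear evaluation head.

    hidden_states: list of arrays, one per sample, each of shape
                   (seq_len, hidden_dim) -- the per-token hidden states
                   this feature request asks vLLM to return.
    w, b: parameters of a hypothetical trained evaluation head.
    Returns one scalar score per sample.
    """
    scores = []
    for h in hidden_states:
        pooled = h.mean(axis=0)           # mean-pool over tokens
        scores.append(float(pooled @ w + b))
    return scores

def pick_best(hidden_states, w, b):
    """Return the index of the highest-scoring sample plus all scores."""
    scores = score_samples(hidden_states, w, b)
    return int(np.argmax(scores)), scores
```

If `generate()` returned hidden states directly, this scoring could run on its output with no second forward pass through the backbone.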

Alternatives

No response

Additional context

No response

zhenlan0426 commented 6 months ago

I can work on this if someone can point me in the right direction and let me know what part of the code base I should look at.

lauhaide commented 4 months ago

Hi @zhenlan0426, have you made any progress on this? I would also need the hidden_states returned as part of the output of the generate() method.

It seems that hidden states are stored in the inference code: https://github.com/vllm-project/vllm/blob/c2462129521a64b62ace77b28641d2e3bec5831c/vllm/worker/model_runner.py#L774C17-L774C37
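For orientation only, the change being discussed amounts to capturing that tensor during the decode loop and threading it through to the request output. A toy sketch of that plumbing pattern (plain Python with hypothetical names; these are not vLLM's actual classes):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RequestOutput:
    token_ids: List[int]
    hidden_states: Optional[list] = None  # hypothetical new field

class ToyEngine:
    """Illustrates the requested plumbing, not vLLM internals."""

    def generate(self, prompt, return_hidden_states=False):
        token_ids, hiddens = [], []
        for _ in range(3):                    # stand-in decode loop
            tok, h = self._forward(prompt, token_ids)
            token_ids.append(tok)
            if return_hidden_states:
                hiddens.append(h)             # capture instead of discarding
        return RequestOutput(
            token_ids,
            hiddens if return_hidden_states else None,
        )

    def _forward(self, prompt, token_ids):
        # stand-in for the model forward pass in model_runner.py,
        # which already produces the hidden states internally
        tok = len(token_ids)
        hidden = [float(tok)] * 4             # pretend hidden-state vector
        return tok, hidden
```

The real change would likewise be opt-in (to avoid the memory cost of keeping hidden states for callers who don't need them), with the tensors attached to the output objects returned by `generate()`.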