[Feature]: Control vectors

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

https://docs.vllm.ai

Apache License 2.0


[Feature]: Control vectors #3451

Open generalsvr opened 8 months ago

generalsvr commented 8 months ago

🚀 The feature, motivation and pitch

Add support for control vectors

See https://github.com/vgel/repeng and https://github.com/ggerganov/llama.cpp/pull/5970
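For context, the core operation in both linked projects is roughly this: at selected layers, a scaled direction vector is added to the layer's hidden states. The following is a minimal NumPy sketch, assuming the vectors are stored as a `{layer_index: vector}` mapping. The function `apply_control_vectors` is a hypothetical name and is not an API of vLLM, repeng, or llama.cpp. `np.tanh` stands in for a real decoder layer.

```python
import numpy as np

def apply_control_vectors(hidden, layer_idx, control, coeff=1.0):
    """Add coeff * control[layer_idx] to hidden states of shape (seq_len, hidden_size).

    `control` maps a layer index to a direction vector of shape (hidden_size,).
    Layers with no entry are passed through unchanged.
    """
    direction = control.get(layer_idx)
    if direction is None:
        return hidden
    # Broadcasting adds the same offset to every token position.
    return hidden + coeff * direction


# Toy usage: steer layer 1 of a 3-layer stack.
control = {1: np.array([0.5, 0.0, -0.5, 0.0])}
hidden = np.zeros((2, 4))  # 2 tokens, hidden size 4
for layer_idx in range(3):
    hidden = np.tanh(hidden)  # placeholder for a real decoder layer
    hidden = apply_control_vectors(hidden, layer_idx, control, coeff=2.0)
```

The coefficient sets both the strength and the sign of the steering. A negative `coeff` pushes the hidden states away from the direction instead of toward it.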

Alternatives

No response

Additional context

No response

justinphan3110 commented 7 months ago

@simon-mo @generalsvr I should be able to help with this. Let me know how to start.

For more context on control vectors, see the paper "Representation Engineering: A Top-Down Approach to AI Transparency".

Kaiyang-Chen commented 7 months ago

We can achieve this by loading the control vectors when initializing the cache engine and applying the change in forward() of the specified QKVLinear layers. However, such changes would have to be added to every model and every kind of linear method, which introduces extra complexity to the codebase. Do you have any hints on how we can abstract this logic and keep the integration clean? @simon-mo

sapountzis commented 6 months ago

Something else to consider is specifying different control vectors (and coefficients) per request, which would then be stacked into a control matrix with one dimension equal to the batch size.

This can be useful when serving users who need different response styles at the same time.

Not sure about the impact on latency.
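The per-request idea can be sketched as follows: build one row per request (coefficient times direction, or zeros for unsteered requests) and broadcast that row over the request's tokens. This sketch makes three assumptions. It uses a padded `(batch, seq_len, hidden_size)` layout, it uses NumPy in place of PyTorch, and its helpers `stack_control_matrix` and `steer_batch` are hypothetical names, not vLLM code. If the engine flattens tokens from all sequences into one dimension, a per-token request index could select rows of the same matrix instead.

```python
import numpy as np

def stack_control_matrix(requests, hidden_size):
    """Build a (batch, hidden_size) control matrix, one row per request.

    Each request is a (direction, coeff) pair; a request whose direction is
    None gets a zero row, i.e. no steering.
    """
    control = np.zeros((len(requests), hidden_size))
    for row, (direction, coeff) in enumerate(requests):
        if direction is not None:
            control[row] = coeff * direction
    return control

def steer_batch(hidden, control):
    """hidden: (batch, seq_len, hidden_size); control: (batch, hidden_size)."""
    # Insert a sequence axis so each request's row is added to all of its tokens.
    return hidden + control[:, None, :]


# Two requests in one batch: the first steered with coefficient 2.0, the second unsteered.
requests = [(np.array([1.0, 0.0]), 2.0), (None, 1.0)]
control = stack_control_matrix(requests, hidden_size=2)
steered = steer_batch(np.zeros((2, 3, 2)), control)
```

In this sketch, each steered layer costs one extra broadcasted add over the hidden states, which is small relative to the layer's matrix multiplications. The real latency impact inside vLLM would still need to be measured.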

raywanb commented 6 months ago

Currently working on an implementation that wraps the decoder layer and changes its forward pass. Let me know if you want to collaborate on this.
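Wrapping decoder layers keeps the steering logic out of individual linear layers and model files. A rough, framework-agnostic sketch of that pattern follows, assuming each layer is a callable that maps a hidden-state array to a hidden-state array. `ControlledLayer` and `wrap_layers` are hypothetical names. Real decoder layers take extra arguments and may return a tuple (for example, hidden states plus a separate residual). In that case the vector would need to be added to the appropriate element, and this sketch does not handle that.

```python
import numpy as np

class ControlledLayer:
    """Wrap a decoder layer and add coeff * direction to its output."""

    def __init__(self, layer, direction, coeff=1.0):
        self.layer = layer
        self.direction = direction
        self.coeff = coeff

    def __call__(self, hidden, *args, **kwargs):
        # Run the original forward pass unchanged, then shift its output.
        out = self.layer(hidden, *args, **kwargs)
        return out + self.coeff * self.direction

def wrap_layers(layers, control, coeff=1.0):
    """Return a new list in which only layers listed in `control` are wrapped."""
    return [
        ControlledLayer(layer, control[i], coeff) if i in control else layer
        for i, layer in enumerate(layers)
    ]


# Toy usage with stand-in layers.
layers = [lambda h: h * 2, lambda h: h + 1]
wrapped = wrap_layers(layers, {0: np.array([1.0, 1.0])}, coeff=0.5)
h = np.array([1.0, 2.0])
for layer in wrapped:
    h = layer(h)
```

Unsteered layers are returned as-is rather than wrapped, so they pay no extra cost.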

DreamGenX commented 6 months ago

@raywanb something else worth looking into is the technique presented here, which might be superior in some regards:

https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction

It comes with a nice colab as well: https://colab.research.google.com/drive/1a-aQvKC9avdZpdyBn4jgRQFObTPy1JZw?usp=sharing&authuser=1

There's a discussion in the comments with the authors of the Representation Engineering paper.
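The linked post's intervention differs from adding a control vector. Instead of adding a direction, it removes ("ablates") one, taking out each activation's component along a unit direction. That direction is found as the normalized difference between mean activations on two prompt sets (harmful vs. harmless instructions in the post). The following is a simplified NumPy sketch of those two steps, assuming activations from a single layer are given as arrays. `difference_of_means` and `ablate_direction` are hypothetical helper names, not code from the post or its Colab.

```python
import numpy as np

def difference_of_means(acts_a, acts_b):
    """Unit vector along mean(acts_a) - mean(acts_b).

    acts_a, acts_b: (num_examples, hidden_size) activations from one layer.
    """
    direction = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return direction / np.linalg.norm(direction)

def ablate_direction(hidden, unit_dir):
    """Remove the component of each row of `hidden` along `unit_dir`."""
    proj = hidden @ unit_dir  # scalar projection for each row
    return hidden - proj[:, None] * unit_dir


# Toy data: the two sets differ only along the first axis.
harmful = np.array([[2.0, 0.0], [4.0, 0.0]])
harmless = np.zeros((2, 2))
r = difference_of_means(harmful, harmless)
ablated = ablate_direction(np.array([[5.0, 7.0], [-1.0, 2.0]]), r)
```

After ablation, every row is orthogonal to the direction, so no coefficient needs to be tuned. The trade-off is that the intervention can only suppress the direction, not amplify it.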

heraclex12 commented 6 months ago


It seems that the colab link doesn't work.

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!