Open generalsvr opened 8 months ago
@simon-mo @generalsvr I should be able to help with this. Let me know how to start.
For more context about control vectors: Representation Engineering: A Top-Down Approach to AI Transparency
We can achieve this by loading the control vectors when initializing the cache engine and applying the change in forward() of the specified QKVLinear layers. However, such changes would be added for all models and all kinds of linear methods, which introduces extra complexity to the codebase. Do you have any hints on how we can abstract this logic and keep the integration clean? @simon-mo
Something additional to consider is specifying different control vectors (and coefficients) per request, which are then stacked into a control matrix with one dimension equal to the batch size.
This can be useful when serving users that require different styles of responses at the same time.
Not sure about the impact on latency.
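To make the per-request idea concrete, here is a minimal NumPy sketch, not vLLM code. The function names `build_control_matrix` and `apply_control` are hypothetical, and it assumes a padded `(batch, seq_len, hidden)` layout; an engine that flattens all tokens into one dimension would need to index the matrix by each token's request instead.

```python
import numpy as np

def build_control_matrix(vectors, coefficients):
    """Scale each request's control vector by its coefficient and stack
    the results into a (batch_size, hidden_size) matrix."""
    vectors = np.asarray(vectors, dtype=np.float32)            # (batch, hidden)
    coefficients = np.asarray(coefficients, dtype=np.float32)  # (batch,)
    if vectors.ndim != 2 or coefficients.shape != (vectors.shape[0],):
        raise ValueError("expected one vector and one coefficient per request")
    return coefficients[:, None] * vectors

def apply_control(hidden_states, control_matrix):
    """Add each request's control row to every token position of that request.
    Assumes hidden_states has shape (batch, seq_len, hidden)."""
    return hidden_states + control_matrix[:, None, :]

# Two requests with different styles: one nudged along dim 0, one against dim 1.
hidden = np.zeros((2, 3, 4), dtype=np.float32)
control = build_control_matrix([[1, 0, 0, 0], [0, 1, 0, 0]], [0.5, -2.0])
steered = apply_control(hidden, control)
```

A request that should not be steered can simply use a coefficient of 0. The extra work per layer is one broadcast add, though the real latency cost would have to be measured in the engine itself.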
Currently working on an implementation that wraps the decoder layer and changes the forward pass. Let me know if you want to collaborate on this.
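A rough sketch of what wrapping a decoder layer could look like, written in NumPy for illustration only. `ControlledLayer` is a hypothetical name, and the wrapped layer is assumed to be any callable that maps hidden states to hidden states; real decoder layers often return tuples and take extra arguments, which this sketch does not handle beyond passing arguments through.

```python
import numpy as np

class ControlledLayer:
    """Wraps a layer (any callable: hidden -> hidden) and adds a scaled
    control vector to its output, leaving the layer itself untouched."""

    def __init__(self, layer, control_vector=None, coefficient=1.0):
        self.layer = layer
        self.control_vector = (
            None if control_vector is None
            else np.asarray(control_vector, dtype=np.float32)
        )
        self.coefficient = coefficient

    def __call__(self, hidden_states, *args, **kwargs):
        out = self.layer(hidden_states, *args, **kwargs)
        if self.control_vector is None:
            return out  # no steering configured: behave exactly like the layer
        # Broadcasts the (hidden,) vector over all leading dimensions.
        return out + self.coefficient * self.control_vector

# Toy stand-in for a decoder layer: the identity function.
wrapped = ControlledLayer(lambda h: h, control_vector=[1.0, 0.0, -1.0], coefficient=2.0)
steered = wrapped(np.zeros((2, 3), dtype=np.float32))
```

Because the change lives in the wrapper, the linear layers and their quantization methods stay as they are, which may help with the codebase-complexity concern raised earlier in the thread.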
@raywanb something worth looking into would also be the technique presented here, which might be superior in some regards:
https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction
It comes with a nice colab as well: https://colab.research.google.com/drive/1a-aQvKC9avdZpdyBn4jgRQFObTPy1JZw?usp=sharing&authuser=1
There's a discussion in the comments with the authors of the Representation Engineering paper.
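For readers who have not opened the post: as I understand it, the main intervention there removes a direction from the activations rather than adding a vector, by projecting out the component along a unit vector. A minimal NumPy sketch of that projection follows; the function name `ablate_direction` is hypothetical, and how the direction is found and which layers it is applied to are left out.

```python
import numpy as np

def ablate_direction(hidden_states, direction):
    """Remove the component of each hidden state along `direction`:
    x' = x - (x . r) r, where r is `direction` normalized to unit length."""
    r = np.asarray(direction, dtype=np.float64)
    r = r / np.linalg.norm(r)
    hidden_states = np.asarray(hidden_states, dtype=np.float64)
    projection = (hidden_states @ r)[..., None] * r
    return hidden_states - projection

h = np.array([[3.0, 4.0], [1.0, 2.0]])
ablated = ablate_direction(h, direction=[2.0, 0.0])
```

After ablation, every row has zero dot product with the chosen direction, while components orthogonal to it are unchanged.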
It seems that the colab link doesn't work.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
🚀 The feature, motivation and pitch
Add support for control vectors
See https://github.com/vgel/repeng and https://github.com/ggerganov/llama.cpp/pull/5970
Alternatives
No response
Additional context
No response