vgel / repeng

A library for making RepE control vectors
https://vgel.me/posts/representation-engineering/
MIT License
461 stars, 37 forks

Correct the positive and negative persona #22

Closed hahuyhoang411 closed 4 months ago

hahuyhoang411 commented 6 months ago

In make_dataset, the positive and negative personas look like they're reversed.
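
To illustrate the concern, here is a minimal, self-contained sketch. It is not the repository's actual code: `Entry`, this `make_dataset` signature, `toy_score`, and `mean_difference` are all hypothetical stand-ins. It only shows that swapping the two personas flips the sign of a difference-of-means direction, which is why the ordering matters.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical stand-in for one contrastive training pair.
    positive: str
    negative: str


def make_dataset(template, positive_personas, negative_personas, suffixes):
    """Pair each positive-persona prompt with its negative counterpart."""
    dataset = []
    for suffix in suffixes:
        for pos, neg in zip(positive_personas, negative_personas):
            dataset.append(
                Entry(
                    positive=template.format(persona=pos) + " " + suffix,
                    negative=template.format(persona=neg) + " " + suffix,
                )
            )
    return dataset


def toy_score(text):
    # Toy 1-D "activation" (an assumption for illustration only):
    # +1 if the text mentions "happy", -1 if it mentions "sad".
    return (1.0 if "happy" in text else 0.0) - (1.0 if "sad" in text else 0.0)


def mean_difference(dataset):
    # Average of (positive - negative) scores: a 1-D analogue of the
    # direction a contrastive vector would point in.
    diffs = [toy_score(e.positive) - toy_score(e.negative) for e in dataset]
    return sum(diffs) / len(diffs)


template = "Act as a {persona} person."
suffixes = ["I feel", "Today is"]

correct = make_dataset(template, ["happy"], ["sad"], suffixes)
flipped = make_dataset(template, ["sad"], ["happy"], suffixes)

print(mean_difference(correct), mean_difference(flipped))
```

With the personas swapped, the resulting direction points the opposite way, so a test that checks the sign of the steering effect should be able to catch the reversal.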

vgel commented 6 months ago

Thanks for this. I'll need to take a deeper look at the tests, since I'm not sure why they're passing with the flipped vector (it's possible I reduced the training examples too much in my attempt to speed up the test suite, and the vector isn't being trained right...)

vgel commented 4 months ago

Fixed by #34