vgel / repeng

A library for making RepE control vectors
https://vgel.me/posts/representation-engineering/
MIT License

truncated_output_suffixes & #32

Closed: thistleknot closed this issue 4 months ago

thistleknot commented 5 months ago
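import json

# Load the full list of output suffixes (the path is relative to the cwd).
with open("data/all_truncated_outputs.json") as f:
    output_suffixes = json.load(f)
# Build every proper token prefix of each suffix; `tokenizer` is assumed to be
# the tokenizer loaded earlier in the notebook.
truncated_output_suffixes = [
    tokenizer.convert_tokens_to_string(tokens[:i])
    for tokens in (tokenizer.tokenize(s) for s in output_suffixes)
    for i in range(1, len(tokens))
]
# Same as above, but using only the first 512 suffixes.
truncated_output_suffixes_512 = [
    tokenizer.convert_tokens_to_string(tokens[:i])
    for tokens in (tokenizer.tokenize(s) for s in output_suffixes[:512])
    for i in range(1, len(tokens))
]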

The code above references files that I can't find in the repo, and I need them for the MVE.

Another example is true_facts.json (I did not find an example in the paper that mentions facts or a .json file).
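
For anyone reading along without the data files: here is a minimal sketch of what the truncation comprehension produces, using a toy whitespace tokenizer in place of the real one. The `tokenize` and `convert_tokens_to_string` functions below are hypothetical stand-ins, not the Hugging Face tokenizer methods, and the two input strings are made up for illustration.

```python
def tokenize(s):
    # Hypothetical stand-in for tokenizer.tokenize: split on whitespace.
    return s.split()


def convert_tokens_to_string(tokens):
    # Hypothetical stand-in for tokenizer.convert_tokens_to_string.
    return " ".join(tokens)


# Made-up suffixes; the real ones come from data/all_truncated_outputs.json.
output_suffixes = ["the cat sat", "hello world"]

# Same shape as the comprehension in the notebook: for each suffix, emit
# every prefix of length 1 .. len(tokens) - 1 (the full string is excluded).
truncated = [
    convert_tokens_to_string(tokens[:i])
    for tokens in (tokenize(s) for s in output_suffixes)
    for i in range(1, len(tokens))
]

print(truncated)  # ['the', 'the cat', 'hello']
```

Because `range(1, len(tokens))` stops one short of the full length, a suffix with a single token contributes nothing, and no suffix ever appears untruncated.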

thistleknot commented 5 months ago

I created a script that I think mimics what you were showcasing:

https://gist.github.com/thistleknot/b936477ee82ce608b3c7f47381f6b15d

vgel commented 4 months ago

Make sure you're running the notebook with the cwd set to the notebooks folder; the data folder is notebooks/data. Alternatively, you can just copy the data folder to wherever you need it (you can find the current cwd with import os; print(os.getcwd()) and copy the data folder there). It's pretty small.
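
If you'd rather not depend on the cwd, something like the following might work. It is only a sketch: `find_data_file` and `load_json` are hypothetical helpers (not part of repeng), and the only layout assumption is the one above, that the data lives in notebooks/data.

```python
import json
import os
from pathlib import Path


def find_data_file(name, start=None):
    """Return the path to data/<name>, checking start/ and start/notebooks/.

    Hypothetical helper; `start` defaults to the current working directory.
    """
    start = Path(start) if start is not None else Path(os.getcwd())
    candidates = [
        start / "data" / name,                # cwd is the notebooks folder
        start / "notebooks" / "data" / name,  # cwd is the repo root
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"{name} not found; looked in: {[str(c) for c in candidates]}"
    )


def load_json(name, start=None):
    """Locate a data file with find_data_file and parse it as JSON."""
    with open(find_data_file(name, start)) as f:
        return json.load(f)
```

With that in place, `output_suffixes = load_json("all_truncated_outputs.json")` should work from either the repo root or the notebooks folder, and the error message lists the paths it tried if the file is missing.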