Sometimes we want to make dashboards with a prompt template (for example, a chat template that wraps the text in user/model turns). To achieve this we need two things: the ability to prepend / append tokens to the text, and the ability to ignore features firing by position, not just by feature id (e.g. the token `user` appears in the template, and activations at its position must be ignored so that "user"-related features aren't distorted).
Example changes to runner config:
```python
import os

# Imports assume the SAEDashboard package layout; adjust to your install.
from sae_dashboard.neuronpedia.neuronpedia_runner import (
    NeuronpediaRunner,
    NeuronpediaRunnerConfig,
)


def test_neuronpedia_runner_prefix_suffix_it_model():
    NP_OUTPUT_FOLDER = "neuronpedia_outputs/test_masking"
    ACT_CACHE_FOLDER = "cached_activations"
    SAE_SET = "gpt2-small-res-jb"
    SAE_PATH = "blocks.0.hook_resid_pre"
    NUM_FEATURES_PER_BATCH = 2
    NUM_BATCHES = 2

    # delete output files if present
    os.system(f"rm -rf {NP_OUTPUT_FOLDER}")
    os.system(f"rm -rf {ACT_CACHE_FOLDER}")

    # we make two batches of 2 features each
    cfg = NeuronpediaRunnerConfig(
        sae_set=SAE_SET,
        sae_path=SAE_PATH,
        np_set_name="res-jb",
        from_local_sae=False,
        outputs_dir=NP_OUTPUT_FOLDER,
        sparsity_threshold=1,
        n_prompts_total=5000,
        n_features_at_a_time=NUM_FEATURES_PER_BATCH,
        n_prompts_in_forward_pass=32,
        start_batch=0,
        end_batch=NUM_BATCHES - 1,
        use_wandb=True,
        shuffle_tokens=False,
        # new options: wrap each prompt in template tokens...
        prefix_tokens=[106, 1645, 108],
        suffix_tokens=[107, 108],
        # ...and ignore activations at the template's positions
        ignore_positions=[0, 1, 2],
    )

    runner = NeuronpediaRunner(cfg)
    runner.run()

    assert "run_settings.json" in os.listdir(runner.cfg.outputs_dir)
```
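The intent of `prefix_tokens` / `suffix_tokens` / `ignore_positions` can be sketched as follows. This is a minimal illustration with hypothetical helper names, not the runner's actual implementation (which operates on batched activation tensors):

```python
def apply_template(token_ids, prefix_tokens, suffix_tokens):
    """Wrap a tokenized prompt in template tokens (illustrative sketch)."""
    return prefix_tokens + token_ids + suffix_tokens


def mask_ignored_positions(acts, ignore_positions):
    """Zero feature activations at the given positions so template tokens
    (e.g. a literal 'user' token) can't dominate related features."""
    ignored = set(ignore_positions)
    return [0.0 if i in ignored else a for i, a in enumerate(acts)]


# A two-token prompt wrapped in a 3-token prefix and 2-token suffix:
ids = apply_template([10, 11], prefix_tokens=[106, 1645, 108], suffix_tokens=[107, 108])

# Activations at positions 0-2 (the prefix) are masked out of the dashboard.
acts = mask_ignored_positions([0.9, 0.9, 0.9, 0.2, 0.4, 0.1, 0.1], ignore_positions=[0, 1, 2])
```

Masking by position rather than by token id means the same vocabulary item is still counted when it appears in the user's own text, only the template occurrences are suppressed.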