jbloomAus / SAEDashboard

MIT License

feat: prepending/appending tokens for prompt template + feat mask via Position #30

Closed jbloomAus closed 2 weeks ago

jbloomAus commented 2 weeks ago

Sometimes we want to make dashboards with a prompt template such as:

<bos><start_of_turn>user
<OWT><end_of_turn>
<start_of_turn>model

To achieve this, we need two things: the ability to prepend and append tokens to the text, and the ability to ignore feature activations by position, not just by token id. For example, the token "user" is part of the template but needs to be ignored so that any "user"-related features are not distorted by it.
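The two operations described above can be sketched in plain Python. This is only an illustration of the idea, not the code in this PR: the helper names `wrap_with_template` and `mask_positions`, the `has_bos` flag, and the assumption that the prefix is inserted after a leading BOS token are all hypothetical.

```python
from typing import List, Sequence


def wrap_with_template(
    tokens: Sequence[int],
    prefix_tokens: Sequence[int],
    suffix_tokens: Sequence[int],
    has_bos: bool = True,
) -> List[int]:
    """Insert prefix tokens (after BOS, if present) and append suffix tokens."""
    if has_bos and tokens:
        # Keep BOS at position 0 and place the prefix right after it (assumption).
        return [tokens[0], *prefix_tokens, *tokens[1:], *suffix_tokens]
    return [*prefix_tokens, *tokens, *suffix_tokens]


def mask_positions(
    feature_acts: Sequence[Sequence[float]],
    ignore_positions: Sequence[int],
) -> List[List[float]]:
    """Zero all feature activations at the given sequence positions.

    feature_acts is indexed as [position][feature]; the input is not modified.
    """
    ignored = set(ignore_positions)
    return [
        [0.0] * len(row) if pos in ignored else list(row)
        for pos, row in enumerate(feature_acts)
    ]


# Hypothetical token ids: 2 stands in for BOS, 500 and 501 for the text.
wrapped = wrap_with_template([2, 500, 501], [106, 1645, 108], [107, 108])

# One row of activations per position in `wrapped`, two features each.
acts = [[1.0, 2.0] for _ in wrapped]
masked = mask_positions(acts, ignore_positions=[0, 1, 2])
```

Masking by position rather than by token id means a template token such as "user" is ignored only where the template puts it, while the same token appearing in the body text still contributes to the dashboard.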

Example changes to runner config:

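import os

# Import path assumed from the package layout; adjust if the module lives elsewhere.
from sae_dashboard.neuronpedia.neuronpedia_runner import (
    NeuronpediaRunner,
    NeuronpediaRunnerConfig,
)


def test_neuronpedia_runner_prefix_suffix_it_model():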

    NP_OUTPUT_FOLDER = "neuronpedia_outputs/test_masking"
    ACT_CACHE_FOLDER = "cached_activations"
    SAE_SET = "gpt2-small-res-jb"
    SAE_PATH = "blocks.0.hook_resid_pre"
    NUM_FEATURES_PER_BATCH = 2
    NUM_BATCHES = 2

    # delete output files if present
    os.system(f"rm -rf {NP_OUTPUT_FOLDER}")
    os.system(f"rm -rf {ACT_CACHE_FOLDER}")

    # we make two batches of 2 features each
    cfg = NeuronpediaRunnerConfig(
        sae_set=SAE_SET,
        sae_path=SAE_PATH,
        np_set_name="res-jb",
        from_local_sae=False,
        outputs_dir=NP_OUTPUT_FOLDER,
        sparsity_threshold=1,
        n_prompts_total=5000,
        n_features_at_a_time=NUM_FEATURES_PER_BATCH,
        n_prompts_in_forward_pass=32,
        start_batch=0,
        end_batch=NUM_BATCHES - 1,
        use_wandb=True,
        shuffle_tokens=False,
        prefix_tokens=[106, 1645, 108],
        suffix_tokens=[107, 108],
        ignore_positions=[0, 1, 2],
    )

    runner = NeuronpediaRunner(cfg)
    runner.run()

    assert "run_settings.json" in os.listdir(runner.cfg.outputs_dir)