stanfordnlp / pyvene

Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
http://pyvene.ai
Apache License 2.0

[Bug Fix] Topological order scoring is not accurate #75

Closed: frankaging closed this issue 6 months ago

frankaging commented 6 months ago

Description:

Currently, the topological-order sorting of intervened representations does not work correctly for path patching configs such as the following:

def path_patching_config(
    layer, stream="head_attention_value_output", num_layers=gpt.config.n_layer
):
    # Component to intervene on (group 0).
    intervening_component = [
        IntervenableRepresentationConfig(layer, stream, "h.pos", group_key=0)
    ]
    # Components to restore (group 1): the same layer's MLP output (unless the
    # stream is itself an MLP stream), plus the attention and MLP outputs of
    # every later layer.
    restoring_components = []
    if not stream.startswith("mlp_"):
        restoring_components += [
            IntervenableRepresentationConfig(layer, "mlp_output", group_key=1)
        ]
    for i in range(layer + 1, num_layers):
        restoring_components += [
            IntervenableRepresentationConfig(i, "attention_output", group_key=1),
            IntervenableRepresentationConfig(i, "mlp_output", group_key=1),
        ]
    intervenable_config = IntervenableConfig(
        intervenable_representations=intervening_component + restoring_components,
        intervenable_interventions_type=VanillaIntervention,
    )
    return intervenable_config

intervenable_config = path_patching_config(
    layer=4,
    stream="head_attention_value_output"
)
intervenable = IntervenableModel(intervenable_config, gpt)
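
For reference, here is a minimal sketch of the ordering I would expect for the config above. It does not use pyvene's internals: `COMPONENT_RANK`, `topological_key`, `sort_representations`, and `path_patching_representations` are hypothetical names for illustration only, and the within-layer ranking simply assumes the usual GPT-2 block order (per-head value output, then attention output, then MLP output). Representations are modeled as plain `(layer, component)` tuples.

```python
from typing import List, Tuple

Rep = Tuple[int, str]  # (layer index, component name)

# Hypothetical within-layer ranking, assuming GPT-2 forward-pass order.
# This is NOT pyvene's internal table, just an illustration.
COMPONENT_RANK = {
    "head_attention_value_output": 0,
    "attention_output": 1,
    "mlp_output": 2,
}


def topological_key(rep: Rep) -> Tuple[int, int]:
    """Order by layer first, then by position inside the layer."""
    layer, component = rep
    return (layer, COMPONENT_RANK[component])


def sort_representations(reps: List[Rep]) -> List[Rep]:
    """Return representations in the order the forward pass reaches them."""
    return sorted(reps, key=topological_key)


def path_patching_representations(layer: int, stream: str, num_layers: int) -> List[Rep]:
    """Mirror the structure of the config in this issue with plain tuples."""
    reps = [(layer, stream)]
    if not stream.startswith("mlp_"):
        reps.append((layer, "mlp_output"))
    for i in range(layer + 1, num_layers):
        reps.append((i, "attention_output"))
        reps.append((i, "mlp_output"))
    return reps


# Small example with 6 layers to keep the output short.
reps = path_patching_representations(4, "head_attention_value_output", 6)
shuffled = list(reversed(reps))
ordered = sort_representations(shuffled)
for rep in ordered:
    print(rep)
```

Under these assumptions, the intervened head value output at layer 4 comes first, followed by the restored components in layer order. A score based only on the layer index could not separate components inside the same layer, which is why the sketch uses a two-part key.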