stanfordnlp / pyvene

Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
http://pyvene.ai
Apache License 2.0
609 stars, 59 forks

[P2] Refactor the AlignableConfig to take in intervention not as type but actual instance #20

Closed: frankaging closed this issue 9 months ago

frankaging commented 10 months ago

Description: Currently, AlignableConfig is defined as:

class AlignableConfig(PretrainedConfig):
    def __init__(
        self,
        alignable_model_type="gpt2",
        alignable_representations=[
            # we do distributed search over elements in the sublist.
            AlignableRepresentationConfig()
        ],
        alignable_interventions_type=VanillaIntervention,
        alignable_low_rank_dimension=None,
        mode="parallel",
        **kwargs
    ):

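The intervention currently has to be passed as a class (alignable_interventions_type=VanillaIntervention), not as an instance. This causes trouble, since the config only knows which class to use and cannot carry any per-intervention settings. It would be better to accept the alignable interventions as a list of actual instances, e.g., alignable_interventions=[VanillaIntervention()]. This would allow us to specify customizable interventions in more detail.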
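
A rough sketch of what the proposed signature could look like. This is not the actual pyvene implementation: the classes below are dependency-free stand-ins so the snippet runs on its own, AlignableConfig does not subclass PretrainedConfig here, and the embed_dim argument on VanillaIntervention is a hypothetical example of a per-instance setting.

```python
# Sketch only: stand-in classes so the example runs without pyvene or transformers.
class VanillaIntervention:
    """Stand-in for VanillaIntervention; the embed_dim argument is hypothetical."""

    def __init__(self, embed_dim=None):
        self.embed_dim = embed_dim


class AlignableRepresentationConfig:
    """Stand-in for the representation config named in the issue."""


class AlignableConfig:
    # In the issue this subclasses PretrainedConfig; omitted here to stay dependency-free.
    def __init__(
        self,
        alignable_model_type="gpt2",
        alignable_representations=None,
        alignable_interventions=None,  # proposed: a list of instances, not a class
        alignable_low_rank_dimension=None,
        mode="parallel",
        **kwargs,
    ):
        self.alignable_model_type = alignable_model_type
        # Build the defaults per call so configs never share one mutable list.
        self.alignable_representations = (
            [AlignableRepresentationConfig()]
            if alignable_representations is None
            else list(alignable_representations)
        )
        self.alignable_interventions = (
            [VanillaIntervention()]
            if alignable_interventions is None
            else list(alignable_interventions)
        )
        self.alignable_low_rank_dimension = alignable_low_rank_dimension
        self.mode = mode


# Each intervention can now carry its own settings.
config = AlignableConfig(
    alignable_interventions=[VanillaIntervention(embed_dim=768)]
)
```

Using None as the default and building the list inside __init__ is a design choice of this sketch, not something the issue asks for; it avoids the shared-mutable-default pitfall that a list literal in the signature would have.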